[erlang-questions] How to debug "Kernel pid terminated"

David Mercer dmercer@REDACTED
Wed May 16 19:00:46 CEST 2012

A follow-up question, since I had a problem again overnight where the
failover took over for the main even though the main was still running: are
Erlang distributed applications not intended to be run on multiple nodes on
the same host?


Anyone have any success doing this in production?  I can get it to work, it
just doesn't seem to work long-term.


I guess I don't often see any posts on this list about the built-in
distributed application functionality of Erlang/OTP.  Does anyone actually
use it, or am I behind the times and I should be using some sort of custom
system developed by the RabbitMQ folks or something?  Just wondering,
because it makes a really good demo when I show people; it just doesn't seem
to be working for me long-term.





From: David Mercer [mailto:dmercer@REDACTED] 
Sent: Tuesday, May 15, 2012 3:48 PM
To: erlang-questions@REDACTED
Subject: How to debug "Kernel pid terminated"


I have a distributed application that I run on a couple of nodes.  I have
had various problems where one node spontaneously decides another node is
not available and starts up its own instance of the application, but this
one is a first for me: One of my failover nodes exited after printing the
following messages:


=ERROR REPORT==== 14-May-2012::19:43:24 ===
** Generic server dist_ac terminating 
** Last message in was {internal_restart_appl,cron}
** When Server state == {state,

** Reason for termination == 
** {{case_clause,

=ERROR REPORT==== 14-May-2012::19:43:24 ===
    server: clickon_backup_server
    error: enoent
    path: <<"\\\\ftp-corp2\\SFTP-MW\\70350\\Upload\\837">>

{"Kernel pid

Kernel pid terminated (application_controller)

Abnormal termination

I am guessing this node (cron_failover@REDACTED) somehow lost contact with the
main node (cron_main@REDACTED) on the same host.  I am not sure, however, why
this would cause the whole Erlang node to crash.  How would I go about
debugging this?  (1) What circumstances caused this node to lose contact
with the other node on the same host?  (2) What can I do to gracefully
handle this situation?
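
For context, the kernel configuration for a two-node setup like this looks
roughly like the sketch below.  The host name myhost and both timeout values
are placeholders, not my actual settings; distributed, sync_nodes_optional
and sync_nodes_timeout are the standard kernel parameters.

```erlang
%% sys.config for the failover node (sketch only).
%% 'myhost' and the timeout values are placeholders, not real settings.
[{kernel,
  [%% cron normally runs on cron_main; if that node goes down, wait
   %% 5000 ms for it to come back before restarting cron on cron_failover.
   {distributed, [{cron, 5000, ['cron_main@myhost', 'cron_failover@myhost']}]},
   %% At boot, wait up to 30 s for the other node before starting
   %% applications, so both nodes agree on where cron should run.
   {sync_nodes_optional, ['cron_main@myhost']},
   {sync_nodes_timeout, 30000}
  ]}].
```

The main node would carry the same distributed entry, with
sync_nodes_optional naming cron_failover instead.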


Here's my thought process so far, which doesn't really answer either of my
questions:


1. The error message seems to point me to the case statement on line 952 of
dist_ac.erl (restart_appl/2).  This is a call to start_appl/3, which expects
either {ok, _, _} or {error, _}, but not {'EXIT', _}, which is what it
received.


2. Looking at start_appl/3, I doubt it is the keysearch which is throwing
the EXIT, so I'm going to assume that it is the call to


3. I can continue down this rabbit hole, but I'm not sure how it will
answer either of my questions.
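

To illustrate point 1, here is a minimal sketch of how a case_clause like
this can arise.  It is not the dist_ac code: the module and function names
are hypothetical stand-ins, and I am assuming (without having confirmed it in
dist_ac.erl) that the {'EXIT', _} tuple comes from a catch wrapped around a
call that crashed.

```erlang
-module(case_clause_demo).
-export([start/1, restart/1]).

%% Hypothetical stand-in for start_appl/3.  The catch turns any crash
%% inside Fun into an {'EXIT', Reason} tuple instead of propagating it.
start(Fun) ->
    catch Fun().

%% Hypothetical stand-in for restart_appl/2.  Only the two expected
%% shapes are matched, so an {'EXIT', _} result matches no clause and
%% the case itself fails with {case_clause, {'EXIT', _}}.
restart(Fun) ->
    case start(Fun) of
        {ok, Name, Pid} -> {started, Name, Pid};
        {error, Reason} -> {failed, Reason}
    end.
```

Calling case_clause_demo:restart(fun() -> error(boom) end) fails with a
case_clause error that carries the {'EXIT', ...} tuple, which is the same
shape as the termination reason in the report above.  Adding a third clause
for {'EXIT', Reason} would at least turn the crash into an ordinary error
return.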


Can someone who perhaps knows the workings of distributed applications
better than I do please give me a few pointers?  Please advise.  Thank you.


David Mercer

