[erlang-questions] How to debug "Kernel pid terminated"

David Mercer <>
Wed May 16 19:00:46 CEST 2012


As a follow-up question: overnight I had the problem again, where the
failover node took over from the main node even though the main node was
still running.  Are Erlang distributed applications not intended to be run
on multiple nodes on the same host?

Has anyone had success doing this in production?  I can get it to work; it
just doesn't seem to keep working long-term.

I don't often see posts on this list about the built-in distributed
application functionality of Erlang/OTP.  Does anyone actually use it, or am
I behind the times and should I be using some sort of custom system developed
by the RabbitMQ folks or something?  Just wondering, because it makes a
really good demo when I show people; it just doesn't seem to keep working for
me long-term.
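For readers less familiar with the built-in mechanism: a distributed application such as the cron application in the error report below is normally declared through the kernel application's distributed parameter. The fragment below is a sketch of a sys.config for the main node, modeled on the example in the OTP "Distributed Applications" guide. The node names main@myhost and failover@myhost are hypothetical placeholders (the real names are scrubbed in this archive), the 5000 ms delay mirrors the 5000 in the dist_ac state shown below, and the sync_nodes_timeout value is an arbitrary assumption.

```erlang
%% sys.config for main@myhost (sketch; node names are hypothetical).
[{kernel,
  [%% cron may run on main@myhost (preferred) or failover@myhost.
   %% If the node running it goes down, wait 5000 ms before
   %% restarting cron on the next available node in the list.
   {distributed, [{cron, 5000, ['main@myhost', {'failover@myhost'}]}]},
   %% This node does not finish starting unless failover@myhost
   %% comes up within sync_nodes_timeout.
   {sync_nodes_mandatory, ['failover@myhost']},
   %% Milliseconds to wait for the mandatory nodes (arbitrary value).
   {sync_nodes_timeout, 30000}
  ]}
].
```

The failover node would carry the same distributed entry, with main@myhost in its own sync_nodes_mandatory list.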

 

Cheers,

DBM

From: David Mercer [mailto:] 
Sent: Tuesday, May 15, 2012 3:48 PM
To: 
Subject: How to debug "Kernel pid terminated"

I have a distributed application that I run on a couple of nodes.  I have
had various problems where one node spontaneously decides another node is
not available and starts up its own instance of the application, but this
one is a first for me: one of my failover nodes exited after printing the
following messages:

=ERROR REPORT==== 14-May-2012::19:43:24 ===
** Generic server dist_ac terminating
** Last message in was {internal_restart_appl,cron}
** When Server state == {state,
                            [{appl,cron,
                                 {},
                                 5000,
                                 [,
                                  {}],
                                 [{,true}]}],
                            [],[],
                            [],
                            [cron],
                            [],[],[],[],[]}
** Reason for termination ==
** {{case_clause,
        {'EXIT',
            {timeout,
                {gen_server,call,
                    [application_controller,which_applications]}}}},
    [{dist_ac,restart_appl,2,[{file,"dist_ac.erl"},{line,952}]},
     {dist_ac,handle_info,2,[{file,"dist_ac.erl"},{line,697}]},
     {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,597}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}

=ERROR REPORT==== 14-May-2012::19:43:24 ===
    server: clickon_backup_server
    error: enoent
    path: <<"\\\\ftp-corp2\\SFTP-MW\\70350\\Upload\\837">>

{error_logger,{{2012,5,14},{19,43,25}},std_info,[{application,kernel},{exited,shutdown},{type,permanent}]}

{"Kernel pid terminated",application_controller,"{application_terminated,kernel,shutdown}"}

Kernel pid terminated (application_controller) ({application_terminated,kernel,shutdown})

Abnormal termination

I am guessing this node () somehow lost contact with the
main node () on the same host.  I am not sure, however, why
this would cause the whole Erlang node to crash.  How would I go about
debugging this?  (1) What circumstances caused this node to lose contact
with the other node on the same host?  (2) What can I do to gracefully
handle this situation?
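As a possible starting point for question (1), each node can log when, and why, it sees the other node go down. The module below is a minimal sketch: the module name node_watch is hypothetical, and it assumes net_kernel:monitor_nodes/2 with the nodedown_reason option is available in your release, in which case the nodedown message carries a reason such as net_tick_timeout or connection_closed. Running it on both nodes and lining the logged times up against the crash would at least show whether, and how, the nodes actually disconnected.

```erlang
%% node_watch.erl - hypothetical helper that logs node up/down events.
-module(node_watch).
-export([start/0, init/0]).

%% Spawn an unlinked process that subscribes to node status messages.
start() ->
    spawn(?MODULE, init, []).

init() ->
    %% With a non-empty option list, messages take the form
    %% {nodeup, Node, InfoList} and {nodedown, Node, InfoList}; the
    %% nodedown_reason option adds {nodedown_reason, Reason} to the latter.
    ok = net_kernel:monitor_nodes(true, [nodedown_reason]),
    loop().

loop() ->
    receive
        {nodeup, Node, _Info} ->
            error_logger:info_msg("nodeup: ~p~n", [Node]);
        {nodedown, Node, Info} ->
            error_logger:warning_msg("nodedown: ~p, info: ~p~n", [Node, Info]);
        _Other ->
            ok    % ignore anything else so the mailbox does not grow
    end,
    loop().
```

On each node, calling node_watch:start() once after startup is enough; the reports then appear in the normal error_logger output alongside the dist_ac crash report.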

 

Here's my thought process so far, which doesn't really answer any of my
questions:

1. The error message seems to point me to the case expression on line 952
of dist_ac.erl (restart_appl/2).  It matches on the result of a call to
start_appl/3 and expects either {ok, _, _} or {error, _}, but not
{'EXIT', ...}, which is what it received.

2. Looking at start_appl/3, I doubt it is the keysearch that is throwing the
EXIT, so I'm going to assume that it is the call to start_distributed/6.

3. I can continue down this rabbit hole, but I'm not sure how it will answer
either of my questions.
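The value {'EXIT', {timeout, {gen_server, call, [application_controller, which_applications]}}} in the report has the shape that gen_server:call/2 produces when the server does not reply within the default 5000 ms and the call is wrapped in catch. That suggests application_controller was busy or blocked at that moment, although the log alone does not say why. The hypothetical module below reproduces the same chain in isolation: a gen_server slower than its caller's timeout (100 ms here), a caught call, and a case expression that only accepts {ok, _, _} or {error, _}, as item 1 describes for restart_appl/2. It is an illustration, not the dist_ac code.

```erlang
%% slow_call_demo.erl - hypothetical module showing how a caught
%% gen_server call timeout turns into a case_clause error.
-module(slow_call_demo).
-behaviour(gen_server).
-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Start an unregistered server whose call handler is slower than the
%% caller's timeout, then match on the caught result using only the
%% {ok, _, _} and {error, _} patterns.
run() ->
    {ok, Pid} = gen_server:start(?MODULE, [], []),
    Result = (catch gen_server:call(Pid, slow_request, 100)),
    case Result of
        {ok, _, _} -> ok;
        {error, _} -> error
    end.

init([]) -> {ok, nostate}.

handle_call(slow_request, _From, State) ->
    timer:sleep(500),   % longer than the 100 ms call timeout
    {reply, done, State}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Msg, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

In the shell, after c(slow_call_demo), evaluating catch slow_call_demo:run() gives {'EXIT', {{case_clause, {'EXIT', {timeout, {gen_server, call, [Pid, slow_request, 100]}}}}, Stack}}, the same shape as the Reason for termination above (the original has no third list element because call/2 uses the default timeout). The slow server keeps running afterwards, and on older releases its late reply may also land in the shell's mailbox.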

 

Can someone who knows the workings of distributed applications better than I
do please give me a few pointers?  Please advise.  Thank you.

David Mercer