<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:Cambria;
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
span.EmailStyle18
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:344284520;
mso-list-type:hybrid;
mso-list-template-ids:44584196 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level2
{mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level3
{mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level4
{mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level5
{mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level6
{mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level7
{mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level8
{mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level9
{mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='color:#1F497D'>As a follow-up question, since I had a problem again overnight where the failover took over for the main, even though the main was still running: Are Erlang distributed applications not intended to be run on multiple nodes on the same host?<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Anyone have any success doing this <i>in production</i>? I can get it to work, it just doesn’t seem to work long-term.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>I guess I don’t often see any posts on this list about the built-in distributed application functionality of Erlang/OTP. Does anyone actually use it, or am I behind the times and I should be using some sort of custom system developed by the RabbitMQ folks or something? 
Just wondering, because it makes a really good demo when I show people; it just doesn’t seem to be working for me long-term.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Cheers,<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><br>DBM<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><div style='border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt'><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> David Mercer [mailto:dmercer@gmail.com] <br><b>Sent:</b> Tuesday, May 15, 2012 3:48 PM<br><b>To:</b> erlang-questions@erlang.org<br><b>Subject:</b> How to debug "Kernel pid terminated"<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I have a distributed application that I run on a couple of nodes. 
I have had various problems where one node spontaneously decides another node is not available and starts up its own instance of the application, but this one is a first for me: One of my failover nodes exited after printing the following messages:<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>=ERROR REPORT==== 14-May-2012::19:43:24 ===<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>** Generic server dist_ac terminating <o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>** Last message in was {internal_restart_appl,cron}<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>** When Server state == {state,<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> [{appl,cron,<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> {failover,cron_main@MWRD},<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> 5000,<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> [cron_main@MWRD,<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> {cron_failover@MWRD,cron_failover@merced}],<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> [{cron_failover@MWRD,true}]}],<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> [],[],<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span 
style='font-family:Consolas'> [cron_failover@MWRD],<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> [cron],<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> [],[],[],[],[]}<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>** Reason for termination == <o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>** {{case_clause,<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> {'EXIT',<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> {timeout,<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> {gen_server,call,<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> [application_controller,which_applications]}}}},<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> [{dist_ac,restart_appl,2,[{file,"dist_ac.erl"},{line,952}]},<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> {dist_ac,handle_info,2,[{file,"dist_ac.erl"},{line,697}]},<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,597}]},<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span 
style='font-family:Consolas'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>=ERROR REPORT==== 14-May-2012::19:43:24 ===<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> server: clickon_backup_server<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> error: enoent<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'> path: &lt;&lt;"\\\\ftp-corp2\\SFTP-MW\\70350\\Upload\\837"&gt;&gt;<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>{error_logger,{{2012,5,14},{19,43,25}},std_info,[{application,kernel},{exited,shutdown},{type,permanent}]}<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>{"Kernel pid terminated",application_controller,"{application_terminated,kernel,shutdown}"}<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>Kernel pid terminated (application_controller) ({application_terminated,kernel,shutdown})<o:p></o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='margin-left:.5in;background:#F2F2F2'><span style='font-family:Consolas'>Abnormal termination<o:p></o:p></span></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal>I am guessing this node (cron_failover@MWRD) somehow lost contact with the main node (cron_main@MWRD) on the same host. 
I am not sure, however, why this would cause the whole Erlang node to crash. How would I go about debugging this? (1) What circumstances caused this node to lose contact with the other node on the same host? (2) What can I do to gracefully handle this situation?<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Here’s my thought process so far, which doesn’t really answer any of my questions:<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo2'><![if !supportLists]><span style='mso-list:Ignore'>1.<span style='font:7.0pt "Times New Roman"'> </span></span><![endif]>The error message seems to point me to the case statement on line 952 of <i>dist_ac.erl</i> (<i>restart_appl</i>/2). This is a call to <i>start_appl</i>/3, which expects either <span style='font-family:Consolas;background:#F2F2F2'>{ok, _, _}</span> or <span style='font-family:Consolas;background:#F2F2F2'>{error, _}</span>, but not <span style='font-family:Consolas;background:#F2F2F2'>{'EXIT', …}</span>, which is what it received.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo2'><![if !supportLists]><span style='mso-list:Ignore'>2.<span style='font:7.0pt "Times New Roman"'> </span></span><![endif]>Looking at <i>start_appl</i>/3, I doubt it is the <i>keysearch</i> which is throwing the EXIT, so I’m going to assume that it is the call to <i>start_distributed</i>/6.<o:p></o:p></p><p class=MsoListParagraph><o:p> </o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo2'><![if !supportLists]><span style='mso-list:Ignore'>3.<span style='font:7.0pt "Times New Roman"'> </span></span><![endif]>I can continue down this rabbit hole, but I’m not sure how it will answer either of my questions.<o:p></o:p></p><p class=MsoListParagraph><o:p> </o:p></p><p class=MsoNormal>Can someone who perhaps knows the workings of distributed 
applications better than I please give me a few pointers? Please advise. Thank you.<o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal>David Mercer<o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p></div></div></body></html>