[erlang-questions] Process surviving disconnect

Anchise Inzaghi <>
Thu Aug 18 19:36:37 CEST 2011


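 By default, a process sends its output (io:format/2 without a device argument) to its group leader, i.e. the shell which started it, even when the process itself runs on another node.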
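
 The mechanism, briefly: io:format/2 sends its output to the calling process's group leader, and a process spawned with spawn/4 or spawn_link/4 on another node inherits the spawner's group leader, which stays on the spawner's node. A minimal sketch for checking where a process's output actually goes; the module gl_check and its functions are hypothetical, written only for this check:

```erlang
%% Hypothetical helper module: report on which node a process's
%% group leader (the target of io:format/2 without a device) lives.
-module(gl_check).
-export([output_node/0, output_node/1]).

%% Node hosting the calling process's group leader.
output_node() ->
    node(group_leader()).

%% Same check for another live process on the local node.
%% erlang:process_info/2 only accepts local pids, hence the guard;
%% for a dead process it returns undefined and the match fails.
output_node(Pid) when is_pid(Pid), node(Pid) =:= node() ->
    {group_leader, GL} = erlang:process_info(Pid, group_leader),
    node(GL).
```

 Assuming Bob was spawned as in alice.erl, calling gl_check:output_node(whereis(bob)) on Bob's node should return Alice's node name rather than node(), which is why Bob's plain io:format/2 calls depend on the connection to Alice.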

 Everything works as intended and expected if I change bob.erl as follows:

 DELETE:  io:format ("Alice died of ~p.~n", [Reason] ),
 REPLACE: io:format (user, "Alice died of ~p.~n", [Reason] ),

 This way, Bob writes to the console of his own node and all is well.
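
 For reference, a sketch of bob.erl with the change applied. It also sends the catch-all "Bob received" output to user, on the assumption that that write would fail the same way while Alice's node is unreachable; the explicit -export list in place of export_all is a style choice, not part of the fix.

```erlang
-module(bob).
-export([start/1]).

start(Alice) ->
    process_flag(trap_exit, true),
    register(bob, self()),
    loop(Alice).

loop(Alice) ->
    receive
        stop ->
            ok;
        {'EXIT', Alice, Reason} ->
            %% 'user' is the I/O server of Bob's own node, so this write
            %% does not go through the (possibly lost) connection to Alice.
            io:format(user, "Alice died of ~p.~n", [Reason]),
            loop(Alice);
        Msg ->
            io:format(user, "Bob received ~p.~n", [Msg]),
            loop(Alice)
    end.
```

 Another option, untested in this thread, would be to call group_leader(whereis(user), self()) at the top of start/1, so that every plain io:format/2 call from Bob stays on his own node.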

 On Thu, 18 Aug 2011 15:19:05 +0200, Vincenzo Maggio wrote:
> Hmm, sorry, but I don't get it: shouldn't Bob write 'Alice died of
> noconnection' on Bob's node, which is in fact still alive?
>
>> Date: Wed, 17 Aug 2011 21:25:37 -0600
>> From: 
>> To: 
>> CC: 
>> Subject: RE: [erlang-questions] Process surviving disconnect
>>
>>  Mystery solved: no Erlang bug, but a brain bug of mine.
>>
>>  When I cut the network connection, Bob tries to write "Alice died of
>>  noconnection" to Alice's console, which is obviously unreachable. The
>>  io:format call fails, so loop crashes and Bob dies.
>>
>>  On Wed, 17 Aug 2011 22:56:20 +0200, Vincenzo Maggio wrote:
>> > Absolutely. Maybe it's us, but I notice that you, like me, have tried
>> > all the possibilities, and in every case except link disconnection
>> > Bob (rightly, we think, BTW!!!) remains alive and in good health.
>> > I hope someone shows up with a clever explanation; if not, it's
>> > perfectly understandable to file a bug.
>> >
>> >>  I disconnect Alice from the network -> Bob joins his ancestors.
>> > Ahahah, may he rest in peace with the ones he loved!!!
>> >
>> >
>> >> Date: Wed, 17 Aug 2011 14:49:29 -0600
>> >> From: 
>> >> To: 
>> >> CC: 
>> >> Subject: RE: [erlang-questions] Process surviving disconnect
>> >>
>> >>  OK, I changed the spawn_link to spawn followed by monitor, as you
>> >>  suggested: exactly the same behaviour.
>> >>
>> >>  Alice terminates gracefully -> Bob gets informed and is alive.
>> >>  Alice terminates ungracefully -> Bob gets informed and is alive.
>> >>  I kill Alice's node -> Bob gets informed and is alive.
>> >>  I disconnect Alice from the network -> Bob joins his ancestors.
>> >>
>> >>  So I will wait a few more days here on erlang-questions, and if
>> >>  nothing shows up, I will file a bug and hopefully be proven wrong.
>> >>
>> >>  On Wed, 17 Aug 2011 22:23:06 +0200, Vincenzo Maggio wrote:
>> >> > Use monitor(process, PID_TO_BE_MONITORED) after spawn.
>> >> > BTW, not having two PCs, I simply tested your code by killing Alice,
>> >> > and guess what? Bob survives! :)
>> >> >
>> >> >> Date: Wed, 17 Aug 2011 14:13:37 -0600
>> >> >> From: 
>> >> >> To: 
>> >> >> CC: 
>> >> >> Subject: RE: [erlang-questions] Process surviving disconnect
>> >> >>
>> >> >>  Thank you for your quick answer.
>> >> >>
>> >> >> > BTW, before filing a bug, could you please substitute spawn_link
>> >> >> > with spawn_monitor and remove the process_flag lines? It would be
>> >> >> > interesting to understand whether Bob dies on his own or is killed
>> >> >> > by no longer being able to communicate with Alice.
>> >> >>
>> >> >>  I am not sure how to replace spawn_link with spawn_monitor, as
>> >> >>  neither spawn_monitor/1 nor spawn_monitor/3 takes a node parameter.
>> >> >>  How do I do that, or how else can I get more detailed information
>> >> >>  about Bob's sudden crossing of the Styx?
>> >> >>
>> >> >>
>> >> >>
>> >> >>  On Wed, 17 Aug 2011 21:48:30 +0200, Vincenzo Maggio wrote:
>> >> >> > Hello,
>> >> >> > I think something is wrong here.
>> >> >> >>  Bob die of noconnection.
>> >> >> >
>> >> >> > This is printed by
>> >> >> >>  		{'EXIT', Bob, Reason} ->
>> >> >> >>  			io:format ("Bob died of ~p.~n", [Reason] ),
>> >> >> >
>> >> >> > So Alice is in fact receiving Bob's last death cry :D, and
>> >> >> > process_flag translates the exit signal into a message instead of
>> >> >> > letting it kill Alice; I think this is OK from Alice's point of
>> >> >> > view, so the real problem is that Bob is dying (I know it's
>> >> >> > mundane, but I learned not to make assumptions).
>> >> >> > Hmm, well, I don't know whether losing the connection between the
>> >> >> > processes makes the Erlang VM assume some kind of virtual master
>> >> >> > node. If you want my opinion, I think you should file a bug on the
>> >> >> > Erlang bugs mailing list if no one comes up with a proper
>> >> >> > explanation.
>> >> >> > Even if what we're thinking is wrong and this is not a bug, in the
>> >> >> > past I had a problem with node lookup and they resolved it.
>> >> >> > These are my two cents, but if you can, please let me know if
>> >> >> > there are further updates, 'cause it's a really interesting
>> >> >> > problem.
>> >> >> >
>> >> >> > BTW, before filing a bug, could you please substitute spawn_link
>> >> >> > with spawn_monitor and remove the process_flag lines? It would be
>> >> >> > interesting to understand whether Bob dies on his own or is killed
>> >> >> > by no longer being able to communicate with Alice.
>> >> >> >
>> >> >> > Vincenzo
>> >> >> >
>> >> >> >> Date: Wed, 17 Aug 2011 10:57:35 -0600
>> >> >> >> From: 
>> >> >> >> To: 
>> >> >> >> CC: 
>> >> >> >> Subject: RE: [erlang-questions] Process surviving disconnect
>> >> >> >>
>> >> >> >>  Thank you very much, Vincenzo. You confirmed my assertion that
>> >> >> >>  Bob should survive the disconnect.
>> >> >> >>  Nevertheless, he dies.
>> >> >> >>  I will point out exactly what I do, and maybe someone can spot
>> >> >> >>  the error in my code, my setup or my thinking and tell me what
>> >> >> >>  I am doing wrong.
>> >> >> >>
>> >> >> >>  I start a node on gca.local:
>> >> >> >>  :~$ erl -name '' -setcookie 123
>> >> >> >>
>> >> >> >>  I start a node on usa.local:
>> >> >> >>  :~$ erl -name '' -setcookie 123
>> >> >> >>
>> >> >> >>  I start sasl on :
>> >> >> >>  ()1> application:start (sasl).
>> >> >> >>
>> >> >> >>  I run alice:start/0 on :
>> >> >> >>  ()1> alice:start ().
>> >> >> >>  true
>> >> >> >>
>> >> >> >>  I look for bob on  and save its pid:
>> >> >> >>  ()2> whereis (bob).
>> >> >> >>  <0.65.0>
>> >> >> >>  ()3> Pid = whereis (bob).
>> >> >> >>  <0.65.0>
>> >> >> >>
>> >> >> >>  I cut the network cable and wait a minute for the timeout.
>> >> >> >>
>> >> >> >>  On  I get the following output:
>> >> >> >>  =ERROR REPORT==== 17-Aug-2011::10:53:21 ===
>> >> >> >>  ** Node '' not responding **
>> >> >> >>  ** Removing (timedout) connection **
>> >> >> >>  Bob die of noconnection.
>> >> >> >>
>> >> >> >>  Nice, Alice trapped Bob's death and reported it. I check for Alice:
>> >> >> >>  ()2> whereis (alice).
>> >> >> >>  <0.42.0>
>> >> >> >>
>> >> >> >>  Alice is up and running.
>> >> >> >>
>> >> >> >>  On  I get the following output:
>> >> >> >>  =ERROR REPORT==== 17-Aug-2011::10:53:10 ===
>> >> >> >>  ** Node '' not responding **
>> >> >> >>  ** Removing (timedout) connection **
>> >> >> >>
>> >> >> >>  But Bob is dead:
>> >> >> >>  ()4> whereis (bob).
>> >> >> >>  undefined
>> >> >> >>  ()5> is_process_alive (Pid).
>> >> >> >>  false
>> >> >> >>
>> >> >> >>  I really do not understand what is happening.
>> >> >> >>
>> >> >> >>
>> >> >> >>  Thank you in advance
>> >> >> >>
>> >> >> >>  Anchise
>> >> >> >>
>> >> >> >>  Here goes the code I used:
>> >> >> >>
>> >> >> >>  -module (alice).
>> >> >> >>  -compile (export_all).
>> >> >> >>
>> >> >> >>  start () -> register (alice, spawn (fun init/0) ).
>> >> >> >>
>> >> >> >>  stop () -> whereis (alice) ! stop.
>> >> >> >>
>> >> >> >>  init () ->
>> >> >> >>  	process_flag (trap_exit, true),
>> >> >> >>  	Bob = spawn_link ('', bob, start, [self () ] ),
>> >> >> >>  	loop (Bob).
>> >> >> >>
>> >> >> >>  loop (Bob) ->
>> >> >> >>  	receive
>> >> >> >>  		stop -> ok;
>> >> >> >>  		{'EXIT', Bob, Reason} ->
>> >> >> >>  			io:format ("Bob died of ~p.~n", [Reason] ),
>> >> >> >>  			loop (Bob);
>> >> >> >>  		Msg ->
>> >> >> >>  			io:format ("Alice received ~p.~n", [Msg] ),
>> >> >> >>  			loop (Bob)
>> >> >> >>  	end.
>> >> >> >>
>> >> >> >>
>> >> >> >>  -module (bob).
>> >> >> >>  -compile (export_all).
>> >> >> >>
>> >> >> >>  start (Alice) ->
>> >> >> >>  	process_flag (trap_exit, true),
>> >> >> >>  	register (bob, self () ),
>> >> >> >>  	loop (Alice).
>> >> >> >>
>> >> >> >>  loop (Alice) ->
>> >> >> >>  	receive
>> >> >> >>  		stop -> ok;
>> >> >> >>  		{'EXIT', Alice, Reason} ->
>> >> >> >>  			io:format ("Alice died of ~p.~n", [Reason] ),
>> >> >> >>  			loop (Alice);
>> >> >> >>  		Msg ->
>> >> >> >>  			io:format ("Bob received ~p.~n", [Msg] ),
>> >> >> >>  			loop (Alice)
>> >> >> >>  	end.
>> >> >> >>
>> >> >> >>
>> >> >> >>  On Wed, 17 Aug 2011 12:47:02 +0200, Vincenzo Maggio wrote:
>> >> >> >> > Hello,
>> >> >> >> > without further info a debug is rather difficult.
>> >> >> >> > But let's try to at least start analysis of the problem:
>> >> >> >> >
>> >> >> >> >>   - Has this something to do with the fact that I initially spawn
>> >> >> >> >>     Bob from the Alice node?
>> >> >> >> >
>> >> >> >> > Absolutely not: this would hit the very foundation of Erlang,
>> >> >> >> > process referential transparency. When a process is started, it
>> >> >> >> > is a brand new, clean entity (indeed, the default process heap
>> >> >> >> > size is always the same!).
>> >> >> >> >
>> >> >> >> >>   - How can I make Bob survive a connection loss?
>> >> >> >> >
>> >> >> >> > Look above: it SHOULD survive.
>> >> >> >> >
>> >> >> >> > Can you please start SASL (application:start(sasl) from the
>> >> >> >> > shell) and see whether the shell log shows any further
>> >> >> >> > information?
>> >> >> >> >
>> >> >> >> > Vincenzo
>> >> >> >>
>> >> >>
>> >>
>>



