[erlang-questions] Process surviving disconnect

Anchise Inzaghi <>
Thu Aug 18 05:25:37 CEST 2011


 Mystery solved: no Erlang bug, but a brain bug of mine.

 When I cut the network connection, Bob tries to write "Alice died of
 noconnection" to Alice's console, which is obviously unreachable. So
 loop/1 crashes and Bob dies.
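
 A minimal sketch of one way around this, assuming the cause is what I
 describe above: Bob is spawned from Alice's node, so he inherits Alice's
 group leader, and io:format/2 sends its output there. Passing the local
 `user` I/O server explicitly (io:format/3) should keep Bob's output on
 his own node. The explicit export list is my own tidy-up, and I have not
 tried this variant against a real network cut:

```erlang
-module(bob).
-export([start/1, loop/1]).

%% Started remotely by Alice via spawn_link(Node, bob, start, [self()]).
start(Alice) ->
    process_flag(trap_exit, true),
    register(bob, self()),
    loop(Alice).

loop(Alice) ->
    receive
        stop -> ok;
        {'EXIT', Alice, Reason} ->
            %% Write to the local `user` I/O server rather than the
            %% inherited group leader, which lives on Alice's node.
            io:format(user, "Alice died of ~p.~n", [Reason]),
            loop(Alice);
        Msg ->
            io:format(user, "Bob received ~p.~n", [Msg]),
            loop(Alice)
    end.
```

 Another option might be to call group_leader(whereis(user), self()) at
 the top of start/1, so that plain io:format/2 calls go to the local
 console as well.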

 On Wed, 17 Aug 2011 22:56:20 +0200, Vincenzo Maggio wrote:
> Absolutely. Maybe it's just us, but I notice that you, like me, have
> tried all the possibilities, and in every case except link disconnection
> bob remains alive and in good health (which we think is right, BTW!!!).
> I hope someone shows up with a clever explanation; if not, it's
> perfectly understandable to file a bug.
>
>>  I disconnect Alice from the network -> Bob joins his ancestors.
> Ahahah, may it rest in peace with the ones he loved!!!
>
>
>> Date: Wed, 17 Aug 2011 14:49:29 -0600
>> Subject: RE: [erlang-questions] Process surviving disconnect
>>
>>  OK, I changed the spawn_link to spawn followed by monitor, as you
>>  suggest: exactly the same behaviour.
>>
>>  Alice terminates gracefully -> Bob gets informed and is alive.
>>  Alice terminates ungracefully -> Bob gets informed and is alive.
>>  I kill Alice's node -> Bob gets informed and is alive.
>>  I disconnect Alice from the network -> Bob joins his ancestors.
>>
>>  So I will wait a few more days here on erlang-questions, and if
>>  nothing shows up, I will file a bug and hopefully be proven wrong.
>>
>>  On Wed, 17 Aug 2011 22:23:06 +0200, Vincenzo Maggio wrote:
>> > Use monitor(process, PID_TO_BE_MONITORED) after spawn.
>> > BTW not having two PCs I simply tested your code killing alice and
>> > guess what? bob survives! :)
>> >
>> >> Date: Wed, 17 Aug 2011 14:13:37 -0600
>> >> Subject: RE: [erlang-questions] Process surviving disconnect
>> >>
>> >>  Thank you for your quick answer.
>> >>
>> >> > BTW, before filing a bug, could you please substitute spawn_link
>> >> > with spawn_monitor and remove the process_flag lines? It would be
>> >> > interesting to understand whether bob dies on its own or is
>> >> > killed because it can no longer communicate with alice.
>> >>
>> >>  I am not sure how to replace spawn_link with spawn_monitor, as
>> >>  neither spawn_monitor/1 nor spawn_monitor/3 takes a node parameter.
>> >>  How do I do that, or how else can I get more detailed information
>> >>  about Bob's sudden crossing of the Styx?
>> >>
>> >>
>> >>
>> >>  On Wed, 17 Aug 2011 21:48:30 +0200, Vincenzo Maggio wrote:
>> >> > Hello,
>> >> > I think something is wrong here.
>> >> >>  Bob died of noconnection.
>> >> >
>> >> > This is printed by
>> >> >>  		{'EXIT', Bob, Reason} ->
>> >> >>  			io:format ("Bob died of ~p.~n", [Reason] ),
>> >> >
>> >> > So alice is in fact receiving bob's last death cry :D and
>> >> > process_flag translates it into a message instead of transmitting
>> >> > the exit signal to alice. I think this is OK from Alice's point of
>> >> > view, so the real problem is that bob is dying (I know it's
>> >> > mundane, but I have learned not to make assumptions).
>> >> > Mmm, well, I don't know whether losing the connection between the
>> >> > processes makes the Erlang VM assume some kind of virtual master
>> >> > node.
>> >> > Well, if you want my opinion, I think you should file a bug on
>> >> > the Erlang bugs mailing list if no one comes up with a proper
>> >> > explanation.
>> >> > Even if what we're thinking is wrong and this is not a bug: in the
>> >> > past I had a problem with node lookup and they resolved it.
>> >> > These are my two cents, but if you can, please let me know if there
>> >> > are further updates, 'cause it's a really interesting problem.
>> >> >
>> >> > BTW, before filing a bug, could you please substitute spawn_link
>> >> > with spawn_monitor and remove the process_flag lines? It would be
>> >> > interesting to understand whether bob dies on its own or is
>> >> > killed because it can no longer communicate with alice.
>> >> >
>> >> > Vincenzo
>> >> >
>> >> >> Date: Wed, 17 Aug 2011 10:57:35 -0600
>> >> >> Subject: RE: [erlang-questions] Process surviving disconnect
>> >> >>
>> >> >>  Thank you very much, Vincenzo. You confirmed my assumption that
>> >> >>  Bob should survive the disconnect.
>> >> >>  Nevertheless he dies.
>> >> >>  I will point out exactly what I do, and maybe someone can spot
>> >> >>  the error in my code, my setup or my thinking and tell me what
>> >> >>  I am doing wrong.
>> >> >>
>> >> >>  I start a node on gca.local:
>> >> >>  :~$ erl -name '' -setcookie 123
>> >> >>
>> >> >>  I start a node on usa.local:
>> >> >>  :~$ erl -name '' -setcookie 123
>> >> >>
>> >> >>  I start sasl on :
>> >> >>  ()1> application:start (sasl).
>> >> >>
>> >> >>  I run alice:start/0 on :
>> >> >>  ()1> alice:start ().
>> >> >>  true
>> >> >>
>> >> >>  I look for bob on  and save its pid:
>> >> >>  ()2> whereis (bob).
>> >> >>  <0.65.0>
>> >> >>  ()3> Pid = whereis (bob).
>> >> >>  <0.65.0>
>> >> >>
>> >> >>  I cut the network cable and wait a minute for the timeout.
>> >> >>
>> >> >>  On  I get the following output:
>> >> >>  =ERROR REPORT==== 17-Aug-2011::10:53:21 ===
>> >> >>  ** Node '' not responding **
>> >> >>  ** Removing (timedout) connection **
>> >> >>  Bob died of noconnection.
>> >> >>
>> >> >>  Nice, Alice trapped Bob's death and reported it. I check for
>> >> >>  Alice:
>> >> >>  ()2> whereis (alice).
>> >> >>  <0.42.0>
>> >> >>
>> >> >>  Alice is up and running.
>> >> >>
>> >> >>  On  I get the following output:
>> >> >>  =ERROR REPORT==== 17-Aug-2011::10:53:10 ===
>> >> >>  ** Node '' not responding **
>> >> >>  ** Removing (timedout) connection **
>> >> >>
>> >> >>  But Bob is dead:
>> >> >>  ()4> whereis (bob).
>> >> >>  undefined
>> >> >>  ()5> is_process_alive (Pid).
>> >> >>  false
>> >> >>
>> >> >>  I really do not understand what is happening.
>> >> >>
>> >> >>
>> >> >>  Thank you in advance
>> >> >>
>> >> >>  Anchise
>> >> >>
>> >> >>  Here goes the code I used:
>> >> >>
>> >> >>  -module (alice).
>> >> >>  -compile (export_all).
>> >> >>
>> >> >>  start () -> register (alice, spawn (fun init/0) ).
>> >> >>
>> >> >>  stop () -> whereis (alice) ! stop.
>> >> >>
>> >> >>  init () ->
>> >> >>  	process_flag (trap_exit, true),
>> >> >>  	Bob = spawn_link ('', bob, start, [self () ] ),
>> >> >>  	loop (Bob).
>> >> >>
>> >> >>  loop (Bob) ->
>> >> >>  	receive
>> >> >>  		stop -> ok;
>> >> >>  		{'EXIT', Bob, Reason} ->
>> >> >>  			io:format ("Bob died of ~p.~n", [Reason] ),
>> >> >>  			loop (Bob);
>> >> >>  		Msg ->
>> >> >>  			io:format ("Alice received ~p.~n", [Msg] ),
>> >> >>  			loop (Bob)
>> >> >>  	end.
>> >> >>
>> >> >>
>> >> >>  -module (bob).
>> >> >>  -compile (export_all).
>> >> >>
>> >> >>  start (Alice) ->
>> >> >>  	process_flag (trap_exit, true),
>> >> >>  	register (bob, self () ),
>> >> >>  	loop (Alice).
>> >> >>
>> >> >>  loop (Alice) ->
>> >> >>  	receive
>> >> >>  		stop -> ok;
>> >> >>  		{'EXIT', Alice, Reason} ->
>> >> >>  			io:format ("Alice died of ~p.~n", [Reason] ),
>> >> >>  			loop (Alice);
>> >> >>  		Msg ->
>> >> >>  			io:format ("Bob received ~p.~n", [Msg] ),
>> >> >>  			loop (Alice)
>> >> >>  	end.
>> >> >>
>> >> >>
>> >> >>  On Wed, 17 Aug 2011 12:47:02 +0200, Vincenzo Maggio wrote:
>> >> >> > Hello,
>> >> >> > without further info, debugging is rather difficult.
>> >> >> > But let's at least try to start analysing the problem:
>> >> >> >
>> >> >> >>   - Does this have something to do with the fact that I
>> >> >> >>     initially spawn Bob from the Alice node?
>> >> >> >
>> >> >> > Absolutely not: this would hit the very foundation of Erlang,
>> >> >> > process referential transparency. When a process is started, it
>> >> >> > is a brand new, clean entity (indeed, the default process heap
>> >> >> > space is always the same size!).
>> >> >> >
>> >> >> >>   - How can I make Bob survive a connection loss?
>> >> >> >
>> >> >> > Look above: it SHOULD survive.
>> >> >> >
>> >> >> > Can you please start SASL (application:start(sasl) from the
>> >> >> > shell) and see if the shell log gives any further information?
>> >> >> >
>> >> >> > Vincenzo
>> >> >>
>> >>
>>



