[erlang-questions] Process surviving disconnect

Wed Aug 17 22:49:29 CEST 2011

 OK, I changed the spawn_link to a spawn followed by a monitor, as you
 suggested: exactly the same behaviour.

 Alice terminates gracefully -> Bob gets informed and is alive.
 Alice terminates ungracefully -> Bob gets informed and is alive.
 I kill Alice's node -> Bob gets informed and is alive.
 I disconnect Alice from the network -> Bob joins his ancestors.
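The spawn-then-monitor setup described above can be sketched roughly as follows. This is a minimal illustration, not the code actually used in the thread. The module name `remote_monitor` and the stand-in `bob/0` function are hypothetical, and for a real remote node the module must be loaded there too. Because `spawn_monitor/1,3` take no node argument, the sketch uses `spawn/4` followed by `erlang:monitor/2`. The resulting `'DOWN'` message carries Bob's exit reason, which is `noconnection` when the connection to his node is lost.

```erlang
-module(remote_monitor).
-export([spawn_monitored/1, await_down/2, bob/0]).

%% Spawn a process on Node and then monitor it, instead of using
%% spawn_link/4 together with process_flag(trap_exit, true).
%% Node may also be node() itself, so the sketch can be tried on a
%% single machine.
spawn_monitored(Node) ->
    Pid = spawn(Node, ?MODULE, bob, []),
    %% If Pid has already exited when the monitor is set up, a 'DOWN'
    %% message with reason noproc (or noconnection, if its node is
    %% unreachable) is delivered immediately.
    Ref = erlang:monitor(process, Pid),
    {Pid, Ref}.

%% Wait for the 'DOWN' message belonging to Ref. If the connection to
%% the remote node is lost, Reason is noconnection.
await_down(Ref, Timeout) ->
    receive
        {'DOWN', Ref, process, _Pid, Reason} -> {down, Reason}
    after Timeout ->
        timeout
    end.

%% Hypothetical stand-in for the thread's bob module: it just waits
%% for a stop message.
bob() ->
    receive
        stop -> ok
    end.
```

On Alice's node this would be called as `remote_monitor:spawn_monitored(BobNode)`, where `BobNode` is the (redacted) bob node name used in the setup later in the thread. Unlike a link, a monitor is one-directional, so Bob receives nothing when Alice goes away.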

 So I will wait a few more days here on erlang-questions, and if nothing
 turns up, I will file a bug report and hopefully be proven wrong.

 On Wed, 17 Aug 2011 22:23:06 +0200, Vincenzo Maggio wrote:
> Use monitor(process, PID_TO_BE_MONITORED) after spawn.
> BTW, not having two PCs, I simply tested your code by killing alice,
> and guess what? bob survives! :)
>
>> Date: Wed, 17 Aug 2011 14:13:37 -0600
>> From: erlang@REDACTED
>> To: maggio.vincenzo@REDACTED
>> CC: erlang-questions@REDACTED
>> Subject: RE: [erlang-questions] Process surviving disconnect
>>
>>  Thank you for your quick answer.
>>
>> > BTW, before filing a bug, could you please substitute spawn_link with
>> > spawn_monitor and remove the process_flag lines? It would be
>> > interesting to understand whether bob dies on its own or is killed
>> > because he can no longer communicate with alice.
>>
>>  I am not sure how to replace spawn_link with spawn_monitor, as
>>  neither spawn_monitor/1 nor spawn_monitor/3 takes a node parameter.
>>  How do I do that, or how else can I get more detailed information
>>  about Bob's sudden crossing of the Styx?
>>
>>
>>
>>  On Wed, 17 Aug 2011 21:48:30 +0200, Vincenzo Maggio wrote:
>> > Hello,
>> > I think something is wrong here.
>> >>  Bob died of noconnection.
>> >
>> > This is printed by
>> >>  		{'EXIT', Bob, Reason} ->
>> >>  			io:format ("Bob died of ~p.~n", [Reason] ),
>> >
>> > So alice is in fact receiving bob's last death cry :D, and process_flag
>> > translates it into a message instead of delivering the exit signal to
>> > alice; I think this is OK from Alice's point of view, so the real
>> > problem is that bob is dying (I know it's mundane, but I have learned
>> > not to make assumptions).
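The trap_exit behaviour described here can be checked on a single node. In this minimal sketch (hypothetical module `trap_demo`), the caller traps exits, links to a process that exits with reason `boom`, and receives the exit signal as an ordinary `{'EXIT', Pid, Reason}` message instead of dying. When the connection to another node is lost, the runtime sends linked processes such a signal with reason `noconnection`.

```erlang
-module(trap_demo).
-export([run/0]).

%% With trap_exit set, an exit signal from a linked process arrives as
%% an {'EXIT', Pid, Reason} message instead of killing the receiver.
run() ->
    %% process_flag/2 returns the previous value, so it can be restored.
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(boom) end),
    Result =
        receive
            {'EXIT', Pid, Reason} -> {trapped, Reason}
        after 1000 ->
            timeout
        end,
    process_flag(trap_exit, Old),
    Result.
```

Without the `process_flag(trap_exit, true)` call, the same `boom` signal would terminate the caller, which is why Alice in the thread survives Bob's death and reports it.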
>> > Mmm, well, I don't know whether losing the connection between the
>> > processes makes the Erlang VM assume some kind of virtual master node.
>> > Well, if you want my opinion, I think you should file a bug on the
>> > Erlang bugs mailing list if no one comes up with a proper explanation.
>> > Even if our reasoning is wrong and this is not a bug, it is worth
>> > reporting: in the past I had a problem with node lookup and they
>> > resolved it.
>> > These are my two cents; if you can, please let me know about any
>> > further updates, because it's a really interesting problem.
>> >
>> > BTW, before filing a bug, could you please substitute spawn_link with
>> > spawn_monitor and remove the process_flag lines? It would be
>> > interesting to understand whether bob dies on its own or is killed
>> > because he can no longer communicate with alice.
>> >
>> > Vincenzo
>> >
>> >> Date: Wed, 17 Aug 2011 10:57:35 -0600
>> >> From: erlang@REDACTED
>> >> To: maggio.vincenzo@REDACTED
>> >> CC: erlang-questions@REDACTED
>> >> Subject: RE: [erlang-questions] Process surviving disconnect
>> >>
>> >>  Thank you very much, Vincenzo. You confirmed my assumption that Bob
>> >>  should survive the disconnect.
>> >>  Nevertheless, he dies.
>> >>  I will describe exactly what I do; maybe someone can spot the error
>> >>  in my code, my setup or my thinking and tell me what I am doing
>> >>  wrong.
>> >>
>> >>  I start a node on gca.local:
>> >>  unroot@REDACTED:~$ erl -name 'bob@REDACTED' -setcookie 123
>> >>
>> >>  I start a node on usa.local:
>> >>  unroot@REDACTED:~$ erl -name 'alice@REDACTED' -setcookie 123
>> >>
>> >>  I start sasl on bob@REDACTED:
>> >>  (bob@REDACTED)1> application:start (sasl).
>> >>
>> >>  I run alice:start/0 on alice@REDACTED:
>> >>  (alice@REDACTED)1> alice:start ().
>> >>  true
>> >>
>> >>  I look for bob on bob@REDACTED and save its pid:
>> >>  (bob@REDACTED)2> whereis (bob).
>> >>  <0.65.0>
>> >>  (bob@REDACTED)3> Pid = whereis (bob).
>> >>  <0.65.0>
>> >>
>> >>  I cut the network cable and wait a minute for the timeout.
>> >>
>> >>  On alice@REDACTED I get the following output:
>> >>  =ERROR REPORT==== 17-Aug-2011::10:53:21 ===
>> >>  ** Node 'bob@REDACTED' not responding **
>> >>  ** Removing (timedout) connection **
>> >>  Bob died of noconnection.
>> >>
>> >>  Nice, Alice trapped Bob's death and reported it. I check for Alice:
>> >>  (alice@REDACTED)2> whereis (alice).
>> >>  <0.42.0>
>> >>
>> >>  Alice is up and running.
>> >>
>> >>  On bob@REDACTED I get the following output:
>> >>  =ERROR REPORT==== 17-Aug-2011::10:53:10 ===
>> >>  ** Node 'alice@REDACTED' not responding **
>> >>  ** Removing (timedout) connection **
>> >>
>> >>  But Bob is dead:
>> >>  (bob@REDACTED)4> whereis (bob).
>> >>  undefined
>> >>  (bob@REDACTED)5> is_process_alive (Pid).
>> >>  false
>> >>
>> >>  I really do not understand what is happening.
>> >>
>> >>
>> >>  Thank you in advance
>> >>
>> >>  Anchise
>> >>
>> >>  Here goes the code I used:
>> >>
>> >>  -module (alice).
>> >>  -compile (export_all).
>> >>
>> >>  start () -> register (alice, spawn (fun init/0) ).
>> >>
>> >>  stop () -> whereis (alice) ! stop.
>> >>
>> >>  init () ->
>> >>  	process_flag (trap_exit, true),
>> >>  	Bob = spawn_link ('bob@REDACTED', bob, start, [self () ] ),
>> >>  	loop (Bob).
>> >>
>> >>  loop (Bob) ->
>> >>  	receive
>> >>  		stop -> ok;
>> >>  		{'EXIT', Bob, Reason} ->
>> >>  			io:format ("Bob died of ~p.~n", [Reason] ),
>> >>  			loop (Bob);
>> >>  		Msg ->
>> >>  			io:format ("Alice received ~p.~n", [Msg] ),
>> >>  			loop (Bob)
>> >>  	end.
>> >>
>> >>
>> >>  -module (bob).
>> >>  -compile (export_all).
>> >>
>> >>  start (Alice) ->
>> >>  	process_flag (trap_exit, true),
>> >>  	register (bob, self () ),
>> >>  	loop (Alice).
>> >>
>> >>  loop (Alice) ->
>> >>  	receive
>> >>  		stop -> ok;
>> >>  		{'EXIT', Alice, Reason} ->
>> >>  			io:format ("Alice died of ~p.~n", [Reason] ),
>> >>  			loop (Alice);
>> >>  		Msg ->
>> >>  			io:format ("Bob received ~p.~n", [Msg] ),
>> >>  			loop (Alice)
>> >>  	end.
>> >>
>> >>
>> >>  On Wed, 17 Aug 2011 12:47:02 +0200, Vincenzo Maggio wrote:
>> >> > Hello,
>> >> > without further info, debugging is rather difficult.
>> >> > But let's at least try to start analysing the problem:
>> >> >
>> >> >>   - Does this have something to do with the fact that I initially
>> >> >>  spawn Bob from the Alice node?
>> >> >
>> >> > Absolutely not: that would undermine the very foundation of Erlang,
>> >> > process referential transparency. When a process is started, it is
>> >> > a brand new, clean entity (indeed, the default process heap size is
>> >> > always the same!).
>> >> >
>> >> >>   - How can I make Bob survive a connection loss?
>> >> >
>> >> > Look above: it SHOULD survive.
>> >> >
>> >> > Can you please start SASL (application:start(sasl) from the shell)
>> >> > and see whether the shell log shows any further information?
>> >> >
>> >> > Vincenzo
>> >>
>>