[erlang-questions] Process surviving disconnect

Robert Virding robert.virding@REDACTED
Thu Aug 18 23:35:17 CEST 2011


Not quite.

By default, when no explicit io device is given, a process does its io, including io:format, to/from its group_leader. A spawned process inherits its group_leader from the spawning process, irrespective of which nodes the group_leader and the process are on. So a process spawned from a shell uses that shell's group_leader, which in this case was on the same node as the shell. Given the io device 'user', io:format instead uses the default io server on the node where the calling process runs.

That it was started from a shell is irrelevant here.
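
A minimal sketch of the above, in case it helps. The module gl_demo and its function names are made up for this illustration; only group_leader/0, spawn/1, whereis/1 and io:format/2,3 are standard calls. It runs on a single node, so it shows the inheritance and the two io paths, not the cross-node case itself.

```erlang
-module(gl_demo).
-export([leader_of_child/0, run/0]).

%% Spawn a child and have it report the group leader it inherited.
leader_of_child() ->
    Parent = self(),
    Child = spawn(fun() -> Parent ! {self(), group_leader()} end),
    receive
        {Child, GL} -> GL
    after 1000 ->
        timeout
    end.

run() ->
    %% The child's group leader is the spawner's group leader.
    %% The match fails with badmatch if they differ.
    GL = group_leader(),
    GL = leader_of_child(),
    %% No io device given: output goes to the caller's group leader,
    %% which may live on another node.
    io:format("via group_leader ~p~n", [GL]),
    %% Explicit device 'user': output goes to the io server registered
    %% as 'user' on the caller's own node (assumed to be running, as it
    %% is in a normal erl session).
    io:format(user, "via user ~p~n", [whereis(user)]),
    ok.
```

Called from a shell, gl_demo:run() prints both lines in that shell. Had the calling process been spawned from a remote node, as Bob was, only the second call would still go to a local io server.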

Robert

----- Original Message -----
> By default a process writes to the shell which started it.
> 
>  Everything works as intended and expected if I change bob.erl
> 
>  DELETE:  io:format ("Alice died of ~p.~n", [Reason] ),
>  REPLACE: io:format (user, "Alice died of ~p.~n", [Reason] ),
> 
>  This way, Bob writes to his own shell and all is well.
> 
>  On Thu, 18 Aug 2011 15:19:05 +0200, Vincenzo Maggio wrote:
> > Mmm, sorry but I don't grasp it: shouldn't bob write 'Alice died of
> > no connection' on bob's node, which is in fact well alive?
> >
> >> Date: Wed, 17 Aug 2011 21:25:37 -0600
> >> From: erlang@REDACTED
> >> To: maggio.vincenzo@REDACTED
> >> CC: erlang-questions@REDACTED
> >> Subject: RE: [erlang-questions] Process surviving disconnect
> >>
> >>  Mystery solved: no Erlang bug but a brain bug of mine.
> >>
> >>  When I cut the network connection, Bob tries to write "Alice died of
> >>  noconnection" to Alice's console, which obviously is unreachable.
> >>  So loop crashes and Bob dies.
> >>
> >>  On Wed, 17 Aug 2011 22:56:20 +0200, Vincenzo Maggio wrote:
> >> > Absolutely, maybe it's us, but I notice that you, like me, have
> >> > tried all the possibilities, and in every case except link
> >> > disconnection bob (and we think that is right, BTW!!!) remains
> >> > alive and in good health.
> >> > I hope someone shows up with a clever explanation; if not, it's
> >> > perfectly understandable to file a bug.
> >> >
> >> >>  I disconnect Alice from the network -> Bob joins his ancestors.
> >> > Ahahah, may he rest in peace with the ones he loved!!!
> >> >
> >> >
> >> >> Date: Wed, 17 Aug 2011 14:49:29 -0600
> >> >> From: erlang@REDACTED
> >> >> To: maggio.vincenzo@REDACTED
> >> >> CC: erlang-questions@REDACTED
> >> >> Subject: RE: [erlang-questions] Process surviving disconnect
> >> >>
> >> >>  OK, I changed the spawn_link to spawn and a subsequent monitor, as
> >> >>  you suggest: exactly the same behaviour.
> >> >>
> >> >>  Alice terminates gracefully -> Bob gets informed and is alive.
> >> >>  Alice terminates ungracefully -> Bob gets informed and is alive.
> >> >>  I kill Alice's node -> Bob gets informed and is alive.
> >> >>  I disconnect Alice from the network -> Bob joins his ancestors.
> >> >>
> >> >>  So I will wait a few more days here on erlang-questions, and if
> >> >>  nothing shows up, I will file a bug and hopefully be proven wrong.
> >> >>
> >> >>  On Wed, 17 Aug 2011 22:23:06 +0200, Vincenzo Maggio wrote:
> >> >> > Use monitor(process, PID_TO_BE_MONITORED) after spawn.
> >> >> > BTW, not having two PCs, I simply tested your code by killing
> >> >> > alice, and guess what? bob survives! :)
> >> >> >
> >> >> >> Date: Wed, 17 Aug 2011 14:13:37 -0600
> >> >> >> From: erlang@REDACTED
> >> >> >> To: maggio.vincenzo@REDACTED
> >> >> >> CC: erlang-questions@REDACTED
> >> >> >> Subject: RE: [erlang-questions] Process surviving disconnect
> >> >> >>
> >> >> >>  Thank you for your quick answer.
> >> >> >>
> >> >> >> > BTW, before filing a bug, could you please substitute
> >> >> >> > spawn_link with spawn_monitor and remove the process_flag
> >> >> >> > lines? It would be interesting to understand whether bob dies
> >> >> >> > on its own or is killed by no longer being able to
> >> >> >> > communicate with alice.
> >> >> >>
> >> >> >>  I am not sure how to replace spawn_link with spawn_monitor, as
> >> >> >>  neither spawn_monitor/1 nor spawn_monitor/3 takes a node
> >> >> >>  parameter. How do I do that, or how else can I get more detailed
> >> >> >>  information about Bob's sudden crossing of the Styx?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>  On Wed, 17 Aug 2011 21:48:30 +0200, Vincenzo Maggio wrote:
> >> >> >> > Hello,
> >> >> >> > I think something is wrong here.
> >> >> >> >>  Bob die of noconnection.
> >> >> >> >
> >> >> >> > This is printed by
> >> >> >> >>  		{'EXIT', Bob, Reason} ->
> >> >> >> >>  			io:format ("Bob died of ~p.~n", [Reason] ),
> >> >> >> >
> >> >> >> > So alice is in fact receiving bob's last death cry :D and
> >> >> >> > process_flag translates it into a message instead of
> >> >> >> > delivering the exit signal to alice. I think this is OK from
> >> >> >> > Alice's point of view, so the real problem is that bob is
> >> >> >> > dying (I know it's mundane, but I have learned not to make
> >> >> >> > assumptions).
> >> >> >> > Mmm, well, I don't know whether losing the connection between
> >> >> >> > the processes makes the Erlang VM assume some kind of virtual
> >> >> >> > master node.
> >> >> >> > Well, if you want my opinion, I think you should file a bug
> >> >> >> > on the Erlang bugs mailing list if no one comes up with a
> >> >> >> > proper explanation.
> >> >> >> > Even if what we're thinking is wrong and this is not a bug:
> >> >> >> > in the past I had a problem with node lookup and they
> >> >> >> > resolved it.
> >> >> >> > These are my two cents, but if you can, please let me know if
> >> >> >> > there are further updates, 'cause it's a really interesting
> >> >> >> > problem.
> >> >> >> >
> >> >> >> > BTW, before filing a bug, could you please substitute
> >> >> >> > spawn_link with spawn_monitor and remove the process_flag
> >> >> >> > lines? It would be interesting to understand whether bob dies
> >> >> >> > on its own or is killed by no longer being able to
> >> >> >> > communicate with alice.
> >> >> >> >
> >> >> >> > Vincenzo
> >> >> >> >
> >> >> >> >> Date: Wed, 17 Aug 2011 10:57:35 -0600
> >> >> >> >> From: erlang@REDACTED
> >> >> >> >> To: maggio.vincenzo@REDACTED
> >> >> >> >> CC: erlang-questions@REDACTED
> >> >> >> >> Subject: RE: [erlang-questions] Process surviving disconnect
> >> >> >> >>
> >> >> >> >>  Thank you very much Vincenzo. You confirmed my assertion that
> >> >> >> >>  Bob should survive the disconnect.
> >> >> >> >>  Nevertheless he dies.
> >> >> >> >>  I will describe exactly what I do, and maybe someone can spot
> >> >> >> >>  the error in my code, my setup or my thinking and tell me
> >> >> >> >>  what I am doing wrong.
> >> >> >> >>
> >> >> >> >>  I start a node on gca.local:
> >> >> >> >>  unroot@REDACTED:~$ erl -name 'bob@REDACTED' -setcookie 123
> >> >> >> >>
> >> >> >> >>  I start a node on usa.local:
> >> >> >> >>  unroot@REDACTED:~$ erl -name 'alice@REDACTED' -setcookie 123
> >> >> >> >>
> >> >> >> >>  I start sasl on bob@REDACTED:
> >> >> >> >>  (bob@REDACTED)1> application:start (sasl).
> >> >> >> >>
> >> >> >> >>  I run alice:start/0 on alice@REDACTED:
> >> >> >> >>  (alice@REDACTED)1> alice:start ().
> >> >> >> >>  true
> >> >> >> >>
> >> >> >> >>  I look for bob on bob@REDACTED and save its pid:
> >> >> >> >>  (bob@REDACTED)2> whereis (bob).
> >> >> >> >>  <0.65.0>
> >> >> >> >>  (bob@REDACTED)3> Pid = whereis (bob).
> >> >> >> >>  <0.65.0>
> >> >> >> >>
> >> >> >> >>  I cut the network cable and wait a minute for the timeout.
> >> >> >> >>
> >> >> >> >>  On alice@REDACTED I get the following output:
> >> >> >> >>  =ERROR REPORT==== 17-Aug-2011::10:53:21 ===
> >> >> >> >>  ** Node 'bob@REDACTED' not responding **
> >> >> >> >>  ** Removing (timedout) connection **
> >> >> >> >>  Bob die of noconnection.
> >> >> >> >>
> >> >> >> >>  Nice, Alice trapped Bob's death and reported it. I check for
> >> >> >> >>  Alice:
> >> >> >> >>  (alice@REDACTED)2> whereis (alice).
> >> >> >> >>  <0.42.0>
> >> >> >> >>
> >> >> >> >>  Alice is up and running.
> >> >> >> >>
> >> >> >> >>  On bob@REDACTED I get the following output:
> >> >> >> >>  =ERROR REPORT==== 17-Aug-2011::10:53:10 ===
> >> >> >> >>  ** Node 'alice@REDACTED' not responding **
> >> >> >> >>  ** Removing (timedout) connection **
> >> >> >> >>
> >> >> >> >>  But Bob is dead:
> >> >> >> >>  (bob@REDACTED)4> whereis (bob).
> >> >> >> >>  undefined
> >> >> >> >>  (bob@REDACTED)5> is_process_alive (Pid).
> >> >> >> >>  false
> >> >> >> >>
> >> >> >> >>  I really do not understand what is happening.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>  Thank you in advance
> >> >> >> >>
> >> >> >> >>  Anchise
> >> >> >> >>
> >> >> >> >>  Here goes the code I used:
> >> >> >> >>
> >> >> >> >>  -module (alice).
> >> >> >> >>  -compile (export_all).
> >> >> >> >>
> >> >> >> >>  start () -> register (alice, spawn (fun init/0) ).
> >> >> >> >>
> >> >> >> >>  stop () -> whereis (alice) ! stop.
> >> >> >> >>
> >> >> >> >>  init () ->
> >> >> >> >>  	process_flag (trap_exit, true),
> >> >> >> >>  	Bob = spawn_link ('bob@REDACTED', bob, start, [self ()] ),
> >> >> >> >>  	loop (Bob).
> >> >> >> >>
> >> >> >> >>  loop (Bob) ->
> >> >> >> >>  	receive
> >> >> >> >>  		stop -> ok;
> >> >> >> >>  		{'EXIT', Bob, Reason} ->
> >> >> >> >>  			io:format ("Bob died of ~p.~n", [Reason] ),
> >> >> >> >>  			loop (Bob);
> >> >> >> >>  		Msg ->
> >> >> >> >>  			io:format ("Alice received ~p.~n", [Msg] ),
> >> >> >> >>  			loop (Bob)
> >> >> >> >>  	end.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>  -module (bob).
> >> >> >> >>  -compile (export_all).
> >> >> >> >>
> >> >> >> >>  start (Alice) ->
> >> >> >> >>  	process_flag (trap_exit, true),
> >> >> >> >>  	register (bob, self () ),
> >> >> >> >>  	loop (Alice).
> >> >> >> >>
> >> >> >> >>  loop (Alice) ->
> >> >> >> >>  	receive
> >> >> >> >>  		stop -> ok;
> >> >> >> >>  		{'EXIT', Alice, Reason} ->
> >> >> >> >>  			io:format ("Alice died of ~p.~n", [Reason] ),
> >> >> >> >>  			loop (Alice);
> >> >> >> >>  		Msg ->
> >> >> >> >>  			io:format ("Bob received ~p.~n", [Msg] ),
> >> >> >> >>  			loop (Alice)
> >> >> >> >>  	end.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>  On Wed, 17 Aug 2011 12:47:02 +0200, Vincenzo Maggio wrote:
> >> >> >> >> > Hello,
> >> >> >> >> > without further info, debugging is rather difficult.
> >> >> >> >> > But let's at least try to start analysing the problem:
> >> >> >> >> >
> >> >> >> >> >>   - Has this something to do with the fact that I
> >> >> >> >> >>     initially spawn Bob from the Alice node?
> >> >> >> >> >
> >> >> >> >> > Absolutely not: this would hit the very foundation of
> >> >> >> >> > Erlang, process referential transparency. When a process
> >> >> >> >> > is started, it is a brand new, clean entity (indeed, the
> >> >> >> >> > default process heap space is always the same size!).
> >> >> >> >> >
> >> >> >> >> >>   - How can I make Bob to survive a connection loss?
> >> >> >> >> >
> >> >> >> >> > Look above: it SHOULD survive.
> >> >> >> >> >
> >> >> >> >> > Can you please start SASL (application:start(sasl) from
> >> >> >> >> > the shell) and see if the shell log gives some further
> >> >> >> >> > information?
> >> >> >> >> >
> >> >> >> >> > Vincenzo


