[erlang-questions] Accessing sibling processes in a supervisor.

Fri Dec 21 03:19:43 CET 2012

Hi, author of LYSE (Learn You Some Erlang) here.

The strategy I describe works because the direct supervisor of both
processes uses a 'one_for_all' restart strategy: if either the
gen_server or the supervisor it relies on crashes, both are killed and
then restarted. The names are unregistered automatically when the
processes die, so everything should come back up error-free.
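A minimal sketch of that layout, assuming a hypothetical single-file module `pair_sup` (not the book's code). To keep it in one file, `init/1` dispatches on its argument so the same module acts as the top supervisor, the gen_server, and the worker supervisor; real code would split these into three modules. Two details carry the restart story, and both are choices made for this sketch. First, the worker supervisor is added as a `temporary` child, so the top supervisor drops its spec instead of restarting it, and the server's next `start_child/2` succeeds. Second, the server links to it, since a `temporary` child's death does not by itself trigger a `one_for_all` restart.

```erlang
%% pair_sup.erl: hypothetical sketch; one module plays three roles,
%% selected by the argument passed to init/1.
-module(pair_sup).
-behaviour(supervisor).

-export([start_link/0, worker_sup/1]).
%% init/1 is shared by supervisor and gen_server; handle_call/3,
%% handle_cast/2 and handle_info/2 are gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Start the top supervisor. Only the server is started up front.
start_link() ->
    supervisor:start_link(?MODULE, top).

%% Ask a server which worker supervisor pid it currently holds.
worker_sup(Serv) ->
    gen_server:call(Serv, worker_sup).

init(top) ->
    %% one_for_all: a crash of the server takes its sibling down with it,
    %% and the server restarts from a fresh state. More than one restart
    %% per hour makes this supervisor give up and exit.
    SupFlags = #{strategy => one_for_all, intensity => 1, period => 3600},
    Serv = #{id => serv,
             %% self() is evaluated inside the supervisor, so the server
             %% receives its parent's pid directly; no name is needed.
             start => {gen_server, start_link, [?MODULE, {serv, self()}, []]},
             restart => permanent,
             type => worker},
    {ok, {SupFlags, [Serv]}};
init(workers) ->
    %% Worker supervisor; it has no children in this sketch.
    {ok, {#{strategy => one_for_one}, []}};
init({serv, Sup}) ->
    %% gen_server init. The parent supervisor waits for this to return,
    %% so calling supervisor:start_child/2 here would wait on a process
    %% that is waiting on us; send ourselves a message instead.
    self() ! {start_worker_sup, Sup},
    {ok, #{worker_sup => undefined}}.

handle_call(worker_sup, _From, S = #{worker_sup := W}) ->
    {reply, W, S}.

handle_cast(_Msg, S) ->
    {noreply, S}.

handle_info({start_worker_sup, Sup}, S) ->
    Spec = #{id => worker_sup,
             start => {supervisor, start_link, [?MODULE, workers]},
             %% temporary: never restarted by the top supervisor, and its
             %% spec is dropped when it terminates, so this call succeeds
             %% again after a one_for_all restart.
             restart => temporary,
             type => supervisor},
    {ok, Pid} = supervisor:start_child(Sup, Spec),
    %% A temporary child's death does not trigger one_for_all, so link:
    %% if the worker supervisor crashes, the server crashes too, and the
    %% top supervisor then restarts the pair.
    link(Pid),
    {noreply, S#{worker_sup := Pid}};
handle_info(_Msg, S) ->
    {noreply, S}.
```

Because the server gets the worker supervisor's pid back from `start_child/2` every time it starts, it never has to look its sibling up by name.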

This was a deliberate decision: the gen_server relies heavily on the
supervisor to handle its children, and if the server crashes, there's no
easy way for a new server to pick up where the old one left off regarding
messages, references, pending tasks, etc. If the supervisor dies, it
means its children crashed too often (beyond its allowed restart
intensity), and you had some kind of problem there anyway.
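A small sketch of that restart limit, using a hypothetical `intensity_demo` module whose two children are trivial placeholder processes (nothing here comes from the book). With `intensity => 1` and `period => 3600`, the first crash restarts both children (`one_for_all`); a second crash within the hour makes the supervisor terminate its children and exit with reason `shutdown`.

```erlang
%% intensity_demo.erl: hypothetical sketch of supervisor restart intensity.
-module(intensity_demo).
-behaviour(supervisor).

-export([start_link/0, start_placeholder/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

%% Placeholder child: a proc_lib process, linked to the supervisor
%% (this function runs inside it), that just sleeps forever.
start_placeholder() ->
    {ok, proc_lib:spawn_link(fun() -> timer:sleep(infinity) end)}.

init([]) ->
    %% At most 1 restart within 3600 seconds. A second crash in that
    %% window exceeds the intensity: the supervisor terminates all its
    %% children and then exits itself with reason shutdown.
    SupFlags = #{strategy => one_for_all, intensity => 1, period => 3600},
    Child = fun(Id) ->
                    #{id => Id,
                      start => {?MODULE, start_placeholder, []},
                      restart => permanent,
                      shutdown => brutal_kill}
            end,
    {ok, {SupFlags, [Child(a), Child(b)]}}.
```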

In both cases, it is *a lot* simpler (at least, I think it is) to crash
and restart everything from a fresh state than to have a new server
register under a new name and try to figure out what the hell was going
on before it came into existence.

I think a lot of people want to limit what crashes in their systems,
hoping to keep errors to a minimum, but in this case I believe crashing
more makes things much simpler, and more likely to avoid weird
heisenbugs that take weeks to fix down the line, when you wonder why you
seem to be missing data or have rogue workers hanging around when they
shouldn't be. There's a very direct dependency between the two
processes, and neither really makes sense without the other. They were
spawned together, and they should die together.

Given this design decision, registering the names just for the sake of
it is somewhat useless, and passing the pid directly is entirely fine.

Regards,
Fred.

On 12/21, Karolis Petrauskas wrote:
> Hi,
> 
> I have a question regarding an example [1] in LYSE. The example
> proposes a supervision scheme for server-worker-like processes. I
> have used this scheme a lot (I learnt Erlang mainly from this book),
> but now I'm in doubt. Is the proposed way of accessing a sibling
> process (accessing worker_sup from ppool_serv, see [1]) the correct
> one? How will ppool_serv get the PID of worker_sup after a crash
> and restart? As I understand it, if one of the processes crashes, both
> processes will be restarted, and the server should get an error when
> starting worker_sup again (the corresponding child already
> exists):
> 
>     handle_info({start_worker_supervisor, Sup, MFA}, S = #state{}) ->
>         %% Will this work after restart?
>         {ok, Pid} = supervisor:start_child(Sup, ?SPEC(MFA)),
> 
> Or maybe I missed the point? I am aware of some other ways of getting
> processes to know each other [2], but I would like to get your
> comments on this example. Other schemes for implementing communication
> between anonymous sibling processes (in the supervision tree) would be
> interesting as well.
> 
> [1] http://learnyousomeerlang.com/building-applications-with-otp#implementing-the-supervisors
> [2] http://erlang.2086793.n4.nabble.com/supervisor-children-s-pid-td3530959.html#a3531973
> 
> Best regards,
> Karolis Petrauskas