[erlang-questions] Accessing sibling processes in a supervisor.

Karolis Petrauskas k.petrauskas@REDACTED
Fri Dec 21 11:00:00 CET 2012

Thank you, Fred, for a great book!

I changed your example (ppool-1.0/src/ppool_serv.erl) a bit to
illustrate my concern. I just added a function, die/1, that does
nothing apart from stopping the gen_server with a reason other than
normal. Below is a diff showing my changes:

learn-you-some-erlang$ git diff
diff --git a/ppool-1.0/src/ppool_serv.erl b/ppool-1.0/src/ppool_serv.erl
index bf901dc..ff11dfb 100644
--- a/ppool-1.0/src/ppool_serv.erl
+++ b/ppool-1.0/src/ppool_serv.erl
@@ -1,6 +1,6 @@
--export([start/4, start_link/4, run/2, sync_queue/2, async_queue/2, stop/1]).
+-export([start/4, start_link/4, run/2, sync_queue/2, async_queue/2, stop/1, die/1]).
 -export([init/1, handle_call/3, handle_cast/2, handle_info/2,
          code_change/3, terminate/2]).

@@ -36,6 +36,9 @@ async_queue(Name, Args) ->
 stop(Name) ->
     gen_server:call(Name, stop).

+die(Name) ->
+    gen_server:call(Name, die).
 %% Gen server
 init({Limit, MFA, Sup}) ->
     %% We need to find the Pid of the worker supervisor from here,
@@ -59,6 +62,8 @@ handle_call({sync, Args},  From, S = #state{queue=Q}) ->

 handle_call(stop, _From, State) ->
     {stop, normal, ok, State};
+handle_call(die, _From, State) ->
+    {stop, error, dying, State};
 handle_call(_Msg, _From, State) ->
     {noreply, State}.

I have compiled it and then called the following functions:

    ppool:start_pool(nagger, 2, {ppool_nagger, start_link, []}).

and got the following errors:

=ERROR REPORT==== 21-Dec-2012::11:39:44 ===
** Generic server nagger terminating
** Last message in was die
** When Server state == {state,2,<0.43.0>,{0,nil},{[],[]}}
** Reason for termination ==
** error

=ERROR REPORT==== 21-Dec-2012::11:39:44 ===
** Generic server nagger terminating
** Last message in was {start_worker_supervisor,<0.41.0>,
** When Server state == {state,2,undefined,{0,nil},{[],[]}}
** Reason for termination ==
** {{badmatch,{error,{already_started,<0.46.0>}}},

The second error is the one I was talking about. On the other hand,
the entire application has not crashed, but the ppool_sup was
terminated due to reached_max_restart_intensity and then restarted.
Was that the intended behaviour?
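
One way the second crash could be sidestepped is to treat
{already_started, Pid} as a success and keep using that pid. Below is a
minimal sketch only: the module sibling_sketch and the helper
worker_sup_pid/1 are hypothetical names of mine, not part of ppool, and
whether reusing the already running worker supervisor is the right
choice for ppool is an assumption I have not verified.

```erlang
-module(sibling_sketch).
-export([worker_sup_pid/1]).

%% Hypothetical helper (not part of ppool): normalises the result of
%% supervisor:start_child/2 so that a child which the supervisor has
%% already restarted by itself is accepted instead of causing a badmatch.
worker_sup_pid({ok, Pid}) when is_pid(Pid) ->
    {ok, Pid};
worker_sup_pid({ok, Pid, _Info}) when is_pid(Pid) ->
    {ok, Pid};
worker_sup_pid({error, {already_started, Pid}}) when is_pid(Pid) ->
    %% Assumption: the running child is the sibling we wanted anyway.
    {ok, Pid};
worker_sup_pid({error, Reason}) ->
    {error, Reason};
worker_sup_pid(Other) ->
    {error, {unexpected, Other}}.
```

In handle_info/2 this would be used as
{ok, Pid} = sibling_sketch:worker_sup_pid(supervisor:start_child(Sup, ?SPEC(MFA))),
so any other error still crashes the server as before.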

Best regards,
Karolis Petrauskas

On Fri, Dec 21, 2012 at 4:19 AM, Fred Hebert <mononcqc@REDACTED> wrote:
> Hi, Author of LYSE here.
> The strategy I mention will work because the direct supervisor of both
> processes uses a 'one_for_all' restart strategy, meaning that if the
> gen_server crashes, or the supervisor it relies on crashes, both are
> killed and then restarted. The names are unregistered automatically upon
> their death, and things should come back up error free.
> This is a decision made because the gen_server strongly relies on the
> supervisor to handle its children, and if it crashes, there's no easy
> way for a new server to pick up from where the other left regarding
> messages, references, tasks to do, etc. If the supervisor dies, then it
> means the children crashed a lot and you had some kind of problem there
> anyway.
> In both cases, it is *a lot* simpler (at least, I think it is) to crash
> and restart everything from a fresh state rather than have a new server
> register under a new name and try to figure out what the hell was going
> on before it came into existence.
> I think a lot of people want to limit what crashes in their systems in a
> way to eliminate errors as much as possible, but in this case I believe
> crashing more stuff makes the case much simpler, and more likely to
> avoid weird heisenbugs that take weeks to fix down the line when you
> wonder why you seem to be missing data or have rogue workers hanging
> around when they shouldn't be. There's a very direct dependency between
> the two processes, and they don't necessarily make sense without the
> other being there. They spawned together, and they should die together.
> Given this design decision, it becomes somewhat useless to register the
> names for the sake of it, and just passing the pid directly is entirely
> fine.
> Regards,
> Fred.
> On 12/21, Karolis Petrauskas wrote:
>> Hi,
>> I have a question regarding an example [1] in LYSE. The example
>> proposes a supervision scheme for server-worker-like processes. I
>> have used this scheme a lot (I learnt Erlang mainly from this book),
>> but now I'm in doubt. Is the proposed way of accessing a sibling
>> process (accessing worker_sup from ppool_serv, see [1]) the correct
>> one? How will the ppool_serv get the PID of the worker_sup after a
>> crash and restart? As I understand it, if one of the processes
>> crashes, both processes will be restarted and the server should get
>> an error while starting the worker_sup again (the corresponding child
>> already exists):
>>     handle_info({start_worker_supervisor, Sup, MFA}, S = #state{}) ->
>>         %% Will this work after restart?
>>         {ok, Pid} = supervisor:start_child(Sup, ?SPEC(MFA)),
>> Or maybe I missed the point? I am aware of some other ways of getting
>> processes to know each other [2], but I would like to get your
>> comments on this example. Other schemes for implementing communication
>> between anonymous sibling processes (in the supervision tree) would
>> also be interesting.
>> [1] http://learnyousomeerlang.com/building-applications-with-otp#implementing-the-supervisors
>> [2] http://erlang.2086793.n4.nabble.com/supervisor-children-s-pid-td3530959.html#a3531973
>> Best regards,
>> Karolis Petrauskas
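
On the question of other ways for anonymous siblings to find each other:
one option is to ask the common supervisor for its children with
supervisor:which_children/1 and pick the sibling out by its child id.
The following is only a sketch under assumptions: the module
sibling_lookup and its functions are hypothetical, and the child id
passed in must match whatever id the child spec actually uses.

```erlang
-module(sibling_lookup).
-export([child_pid/2, find_pid/2]).

%% Asks supervisor Sup for its children and looks up the one with id Id.
%% Must not be called from a child's init/1: the supervisor is still
%% blocked starting that child, so the call would deadlock.
child_pid(Sup, Id) ->
    find_pid(Id, supervisor:which_children(Sup)).

%% Pure lookup over the list returned by supervisor:which_children/1,
%% whose entries have the form {Id, Child, Type, Modules}.
find_pid(Id, Children) ->
    case lists:keyfind(Id, 1, Children) of
        {Id, Pid, _Type, _Modules} when is_pid(Pid) ->
            {ok, Pid};
        {Id, NotRunning, _Type, _Modules} ->
            %% NotRunning is 'undefined' or 'restarting'.
            {error, NotRunning};
        false ->
            {error, not_found}
    end.
```

Because of the deadlock noted in the comment, the lookup would have to
be deferred, for example by having the server send itself a message in
init/1 and doing the lookup when that message is handled.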