[erlang-questions] supervisor process not responding to messages ('EXIT', which_children, etc)

Scott Lystig Fritchie
Wed Apr 28 20:12:25 CEST 2010

gs> The results of 'erlang:process_info(Pid, backtrace)' below as you
gs> suggested.  It seems that the supervisor was trying to restart a
gs> child, the child took too long to start so it was killed, but then
gs> the supervisor hung.  At this point, I can have the child start
gs> faster, but why is the supervisor hung?

The supervisor's behavior must be deterministic, so it starts children
synchronously.  (More on that in a little bit.)

From the supervisor:start_link() manual:

    The created supervisor process calls Module:init/1 to find out about
    restart strategy, maximum restart frequency and child processes. To
    ensure a synchronized start-up procedure, start_link/2,3 does not
    return until Module:init/1 has returned and all child processes have
    been started.

You have to read between the lines to see that the above paragraph
applies to you.  A child's init function is handled synchronously.
During the supervisor's start, supervisor:start_link/2,3 won't return
until all the children are started.  All restart strategies require
that children be started in the order that they are specified.

App developers rely on this child start order to preserve
inter-process/service dependencies.  If child processes were started in
random order, application dependencies could be broken, and the app can
run incorrectly or, perhaps worse yet, even fail to start at all.
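
To make the ordering concrete, here is a minimal sketch of a supervisor
whose children are listed in dependency order.  The module name
ordered_sup and the child names database, cache and web are made up for
illustration, and the map-style child specs assume OTP 18 or later.

```erlang
-module(ordered_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).
-export([start_worker/1, worker_init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% init/1 only returns the specs.  The supervisor then starts the
%% children one at a time, in list order, waiting for each start
%% function to return before moving on to the next.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 10},
    %% Hypothetical dependency chain: web needs cache, cache needs database.
    Children = [worker_spec(database), worker_spec(cache), worker_spec(web)],
    {ok, {SupFlags, Children}}.

worker_spec(Name) ->
    #{id => Name,
      start => {?MODULE, start_worker, [Name]},
      restart => permanent,
      shutdown => 5000,
      type => worker}.

%% proc_lib:start_link/3 blocks the caller (here, the supervisor)
%% until the new process calls proc_lib:init_ack/1.
start_worker(Name) ->
    proc_lib:start_link(?MODULE, worker_init, [Name]).

worker_init(Name) ->
    io:format("starting ~p~n", [Name]),
    proc_lib:init_ack({ok, self()}),
    worker_loop().

%% Placeholder worker body: idle until told to stop.
worker_loop() ->
    receive
        stop -> ok
    end.
```

Calling ordered_sup:start_link() should print the three "starting"
lines in list order, and it does not return until the last child has
acknowledged its start.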

In your case, when a single worker has died and requires restarting, the
supervisor is using the same synchronous method of restarting the
child.  While it waits for the child's init function to return, the
supervisor cannot process other messages, such as 'EXIT' or
which_children requests.  If a child can't start in a predictable (and
hopefully very short) amount of time, then the variable-time work needs
to be done after the child's init function returns.  The strategies mentioned a few
days ago in the "Subject: testing asynchronous code" thread can be very
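
One common way to move variable-time work out of init is sketched
below.  It is not necessarily what the thread mentioned above
recommends.  The module name slow_worker, the function get_data/0 and
the simulated half-second load are assumptions made for illustration,
and the {continue, ...} return value requires OTP 21 or later.  On
older releases, returning {ok, State, 0} from init and doing the work in
handle_info(timeout, State) has a similar effect.

```erlang
-module(slow_worker).
-behaviour(gen_server).

-export([start_link/0, get_data/0]).
-export([init/1, handle_continue/2, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

get_data() ->
    gen_server:call(?MODULE, get_data).

%% init/1 returns immediately, so the supervisor is released right
%% away.  The slow part is deferred to handle_continue/2.
init([]) ->
    {ok, undefined, {continue, load}}.

%% Runs in the worker process right after init/1 has returned and
%% before any other message is handled.
handle_continue(load, undefined) ->
    {noreply, slow_load()}.

%% Callers that arrive during the load wait in the worker's mailbox,
%% not in the supervisor's.
handle_call(get_data, _From, Data) ->
    {reply, Data, Data}.

handle_cast(_Msg, Data) ->
    {noreply, Data}.

%% Stand-in for the real variable-time work (hypothetical).
slow_load() ->
    timer:sleep(500),
    #{loaded => true}.
```

With this shape, a slow load delays only the worker's own clients.  The
supervisor gets {ok, Pid} back at once and stays free to handle 'EXIT'
messages and which_children calls.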

