[erlang-questions] Supervision trees and child startup ordering

Wed Apr 30 21:12:14 CEST 2014

Answers inline

On 04/30, Youngkin, Rich wrote:
> Hi,
> 
> I'm using a supervision tree to manage multiple sets of gen_servers.  I've
> got a situation where gen_servers in one supervision tree (e.g., Tree-B)
> are clients of gen_servers in another supervision tree (e.g., Tree-A).  I'd
> like Tree-A and all of its children to complete their initialization
> processing before starting any of the children in Tree-B.
> 
> [...]
> 
> So given this diagram, A and its children 1,2 and 3 start and complete
> initialization before B and children 4 and 5 are started and complete
> initialization. Furthermore, initialization in this case means that in
> module:init/1 the child casts itself a message to complete initialization.

I recommend in this case that Root Supervisor adopt a 'rest_for_one'
strategy. That ensures that if Supervisor-A fails at any point during
run-time, Supervisor-B and its children are killed before both are
restarted in order. Otherwise, you may run into the issue of having
Supervisor-B's children run without A being present.
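
As a rough sketch of that layout (the module names root_sup, sup_a and
sup_b and the restart intensity of 5 restarts in 10 seconds are
placeholders of mine, not taken from your code), the root supervisor
could look something like this:

```erlang
-module(root_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% rest_for_one: when a child dies, every child listed after it is
    %% terminated as well, then all of them are restarted in order.
    %% The intensity (5 restarts in 10 seconds) is an arbitrary choice.
    SupFlags = {rest_for_one, 5, 10},
    Children = [
        %% sup_a and sup_b are hypothetical supervisor modules.
        %% sup_a is listed first, so it is started first.
        {sup_a, {sup_a, start_link, []},
         permanent, infinity, supervisor, [sup_a]},
        {sup_b, {sup_b, start_link, []},
         permanent, infinity, supervisor, [sup_b]}
    ],
    {ok, {SupFlags, Children}}.
```

The order of the list matters: children are started front to back, and
with rest_for_one a crash of sup_a also takes down sup_b, while a crash
of sup_b leaves sup_a alone.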

> During Child 4's initialization it calls a function on Child 1 (which has
> an init/1 implementation as described above).
> 
> In reading http://www.erlang.org/doc/man/supervisor.html#start_link-2 I
> understand that Supervisor-A and all its children will be started and
> available before Supervisor-B and its children are started.  It's also my
> understanding that when a child casts itself a message in its init/1
> function that that message is guaranteed to be the first message in its
> mailbox. Is my understanding of this correct and if so, is this a
> reasonable way to do what I'm trying to accomplish?
> 

Two things.

1. The message is guaranteed to be the first one in the mailbox *iff*
   the process hasn't been registered before, or given its pid to anyone
   who could send it a message. Usually, that's fine when using OTP.
2. The init scheme you're going for will turn your boot sequence into an
   asynchronous one -- the supervisor moves on as soon as init/1
   returns, so it's possible children of B will be spawned while
   children of A are still handling the initial message in their
   handle_cast/2 callback.
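
To make both points concrete, here is a minimal sketch of the pattern
as I understand it from your description. The module name child_1, the
finish_init message and the initializing/ready states are made up for
illustration. The process is deliberately started without a registered
name, so nobody else can reach it before the cast lands in the mailbox:

```erlang
-module(child_1).
-behaviour(gen_server).

-export([start_link/0, status/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% No registered name: only the caller of start_link/0 learns the pid,
%% and only after init/1 has returned.
start_link() ->
    gen_server:start_link(?MODULE, [], []).

status(Pid) ->
    gen_server:call(Pid, status).

init([]) ->
    %% Queue the real initialization and return right away. The
    %% supervisor is free to start the next child from this point on.
    gen_server:cast(self(), finish_init),
    {ok, initializing}.

handle_call(status, _From, State) ->
    {reply, State, State}.

handle_cast(finish_init, initializing) ->
    %% The slow part of the initialization would go here.
    {noreply, ready}.

handle_info(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

A call made through the pid returned by start_link/0 is queued behind
finish_init, so it waits for the initialization rather than seeing a
half-built state. With a registered name that ordering is no longer
guaranteed, for instance when the process is restarted while old
clients are still sending to the name.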

The latter may be a valid choice if the resource started asynchronously
isn't always guaranteed to be available. Meaning that in this case,
you're guaranteeing that the client will be up, but not the connection.

This is actually a very sane model for external resources, but you just
have to be aware you're implementing it :)

See http://ferd.ca/it-s-about-the-guarantees.html for more details on
this.
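
A small sketch of what "the client is up, but not the connection" can
look like in code follows. Everything in it is an assumption for the
sake of the example: the module name resource_client, the request/2
API, the 1 second retry delay, and try_connect/0, which is a stub that
always fails and stands in for whatever opens your real connection.

```erlang
-module(resource_client).
-behaviour(gen_server).

-export([start_link/0, request/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

request(Pid, Req) ->
    gen_server:call(Pid, {request, Req}).

init([]) ->
    %% Start disconnected and try to connect after init/1 returns.
    %% A plain message is used here so that retries scheduled with
    %% erlang:send_after/3 go through the same handle_info/2 clause.
    self() ! connect,
    {ok, disconnected}.

%% Callers always get an answer; it may be an error.
handle_call({request, _Req}, _From, disconnected) ->
    {reply, {error, disconnected}, disconnected};
handle_call({request, Req}, _From, {connected, Conn}) ->
    {reply, {ok, {Conn, Req}}, {connected, Conn}}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(connect, disconnected) ->
    case try_connect() of
        {ok, Conn} ->
            {noreply, {connected, Conn}};
        {error, _Reason} ->
            %% Retry later instead of crashing the boot sequence.
            erlang:send_after(1000, self(), connect),
            {noreply, disconnected}
    end;
handle_info(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

%% Hypothetical stub: replace with the real connection attempt.
try_connect() ->
    {error, unavailable}.
```

The point of the design is that callers have to handle
{error, disconnected} at any time, not only during boot, which is the
guarantee you are actually offering them.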

Regards,
Fred.