[erlang-questions] supervisors & slow init's

Fri Dec 6 12:07:43 CET 2013

Hi, and thanks to everyone for the great responses so far,

On Thu, Dec 5, 2013 at 6:57 PM, Fred Hebert <mononcqc@REDACTED> wrote:

> On 12/05, Sean Cribbs wrote:
>> I'd echo Jesper's comments in saying that it is most important to make sure
>> the supervisor tree starts up quickly. There are several options I see:
>>
>
> Outside of the current case's context, I don't necessarily agree with
> that. The supervision tree can take as long as necessary to start as
> long as it's in a stable state. The requirement for speed is
> application-specific. If you need to do data syncing for 10 minutes
> before starting to boot, I would rather lock up the supervision tree than
> have to implement 12 child applications that need to synchronize on
> things -- if I can afford it, of course.

So this would be a reason to put a blocking receive in the init and
just wait for it...  I like this because it is simple and reflects how
things work.  A must start correctly for B and then C to start, and if
A crashes, one_for_all will bring down B and C too, and try and
restart A and then everything else.  So far so good!
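
A minimal sketch of that arrangement, assuming plain proc_lib processes
stand in for A, B and C. The module name, the 200 ms timer that plays the
part of the hardware, and the restart limits are all made up for
illustration. proc_lib:start_link/4 does not return until the child calls
proc_lib:init_ack/2, which is the same blocking behaviour
gen_server:start_link gives you while init/1 is running:

```erlang
-module(slow_init_sketch).
-behaviour(supervisor).

-export([start_link/0, init/1]).
-export([start_a/0, a_init/1, start_dep/0, idle/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% one_for_all: if any child dies, the others are terminated and then
%% all of them are restarted in start order (A, then B, then C).
init([]) ->
    Restart = {one_for_all, 3, 60},   % hypothetical limit: 3 restarts per 60 s
    A = {a, {?MODULE, start_a, []}, permanent, 5000, worker, [?MODULE]},
    B = {b, {?MODULE, start_dep, []}, permanent, 5000, worker, [?MODULE]},
    C = {c, {?MODULE, start_dep, []}, permanent, 5000, worker, [?MODULE]},
    {ok, {Restart, [A, B, C]}}.

%% Runs in the supervisor process, which blocks here until a_init/1
%% acknowledges, or gives up after 10 minutes (an arbitrary figure).
start_a() ->
    proc_lib:start_link(?MODULE, a_init, [self()], 600000).

a_init(Parent) ->
    %% Stand-in for the real hardware: a timer delivers the "ready" message.
    erlang:send_after(200, self(), hardware_ready),
    receive
        hardware_ready -> ok          % the blocking receive
    end,
    proc_lib:init_ack(Parent, {ok, self()}),
    idle().

%% B and C are trivial placeholders that start instantly.
start_dep() ->
    {ok, proc_lib:spawn_link(?MODULE, idle, [])}.

idle() ->
    receive _ -> idle() end.
```

With this, slow_init_sketch:start_link() only returns once A has seen its
"ready" message, and killing A makes the supervisor take down B and C
before starting all three again. A real child would also need to handle
system messages, or simply be a gen_server.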

> To me the most important thing is figuring out what you can or can't do,
> and picking the boot and supervision strategy best suited to that.
> If booting fast is counter-intuitive to the results you want, don't do
> it.

The system (based on some hardware that takes its time) does not come
up fast, but is basically useless without the hardware.

That said, some part of the system, the part that interacts with the end
user, does need to come up fast, which is my next question.

Clearly the UI should start before the hardware and be responsive in
telling any users that "the system is not ready yet", and maybe for
exceptional situations, have some diagnostics and tools to poke at a
mostly dead system.  And even if the hardware blows up, the UI should
stick around and serve up some diagnostics.
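
One way to lay that out, as a rough sketch with hypothetical names
throughout: a top-level one_for_one supervisor starts the UI first and the
slow hardware subtree second, so the UI is already answering while the
hardware is still blocking in its init. Here the UI decides readiness by
checking whether the hardware process has registered its name, which is
just one possible convention:

```erlang
-module(ui_first_sketch).
-behaviour(supervisor).

-export([start_link/0, init/1, status/0]).
-export([start_ui/0, ui_init/1, start_hw_sup/0, start_hw/0, hw_init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, top).

%% Top level: children start in list order, so the UI comes up first.
%% one_for_one keeps a hardware restart from touching the UI.
init(top) ->
    Ui = {ui, {?MODULE, start_ui, []}, permanent, 5000, worker, [?MODULE]},
    Hw = {hw_sup, {?MODULE, start_hw_sup, []}, permanent, infinity,
          supervisor, [?MODULE]},
    {ok, {{one_for_one, 10, 60}, [Ui, Hw]}};
%% Hardware subtree: the dependent processes B and C would sit here too.
init(hw) ->
    Hw = {hw, {?MODULE, start_hw, []}, permanent, 5000, worker, [?MODULE]},
    {ok, {{one_for_all, 3, 60}, [Hw]}}.

start_hw_sup() ->
    supervisor:start_link(?MODULE, hw).

start_ui() ->
    proc_lib:start_link(?MODULE, ui_init, [self()]).

ui_init(Parent) ->
    register(ui_sketch, self()),
    proc_lib:init_ack(Parent, {ok, self()}),
    ui_loop().

ui_loop() ->
    receive
        {status, From, Ref} ->
            From ! {Ref, hw_state()},
            ui_loop();
        _Other ->
            ui_loop()
    end.

hw_state() ->
    case whereis(hw_sketch) of
        undefined -> not_ready;
        _Pid -> ready
    end.

%% Ask the UI process what it would tell a user right now.
status() ->
    Ref = make_ref(),
    ui_sketch ! {status, self(), Ref},
    receive {Ref, State} -> State after 1000 -> timeout end.

start_hw() ->
    proc_lib:start_link(?MODULE, hw_init, [self()], 600000).

hw_init(Parent) ->
    erlang:send_after(300, self(), hardware_ready),  % fake hardware
    receive hardware_ready -> ok end,
    register(hw_sketch, self()),
    proc_lib:init_ack(Parent, {ok, self()}),
    hw_loop().

hw_loop() ->
    receive _ -> hw_loop() end.
```

While start_link() is still blocked on the hardware, ui_first_sketch:status()
called from another process answers not_ready, and it switches to ready once
the hardware child has registered. Note that if the hardware keeps failing,
the restart limits eventually propagate upward and take the UI with them, so
a UI that must survive that case would need something more, for example
making the hardware subtree a temporary child.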

>> I think the moral of the story is that starting up your system and
>> implementing a protocol between processes should not be conflated. If
>> there's a sequence of steps to be done with potential exit points or
>> branches at each step, FSMs plus messages feels the most sane to me.
>>
>
> Agreed.
>
> The opposite rule is that if potential exit points are unforeseeable
> and should (according to spec) not happen (say not being able to open a
> UDP port to localhost, for example), then you may want to skip the
> protocol design step entirely (all code is a risk of bugs!). This means
> you *may* suffer unexpected failures, in which case your choice will be
> to take the necessary means to make the preconditions to your system's
> functionality be respected, or relax them and go with the protocols as
> you start needing them while you grow and groom your system.

Yes - for the time being, if this stuff blows up, I don't want to send
messages around or make most other bits of the system aware that bits
of the system might be compromised. I want to restart it to a known
stable state or register it as FUBAR, and maintain a small system to
let the user know this and maybe attempt some diagnostics.

Thank you,
-- 
David N. Welton

http://www.welton.it/davidw/

http://www.dedasys.com/