[erlang-questions] Stopping a master process and all its workers

Torben Hoffmann torben.lehoff@REDACTED
Thu Apr 12 22:38:03 CEST 2018


I omitted a detail: all of the processes are proxies for the external
resources they manage, i.e., they simply have to start and stop those
resources and respond to monitoring events from them.
So no real work is actually being done in the processes themselves.
This simplifies things, and I should have mentioned it in the first place.
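
A stripped-down sketch of one of these proxies - ext_resource and the shape
of its down message are stand-ins for the real resource API:

-module(resource_proxy).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

start_link(ResourceId) ->
    gen_server:start_link(?MODULE, ResourceId, []).

init(ResourceId) ->
    %% Trap exits so terminate/2 runs when the supervisor shuts us down.
    process_flag(trap_exit, true),
    {ok, Handle} = ext_resource:start(ResourceId),   %% stand-in API
    ok = ext_resource:monitor(Handle, self()),       %% stand-in API
    {ok, #{id => ResourceId, handle => Handle}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The external resource reports that it is gone: stop the proxy and let
%% the supervisor's restart policy decide what happens next.
handle_info({ext_resource_down, Handle, Reason}, #{handle := Handle} = State) ->
    {stop, {resource_down, Reason}, State};
handle_info(_Other, State) ->
    {noreply, State}.

%% Supervisor-initiated shutdown: take the external resource down with us.
terminate(_Reason, #{handle := Handle}) ->
    ext_resource:stop(Handle),
    ok.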

I have a supervisor above top_sup, and that is indeed the one that will
kill top_sup - I framed the question to focus on what happens from top_sup
and down.
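
Concretely, the shutdown path I have in mind is roughly this, with parent_sup
as a working name for the supervisor above top_sup:

-module(parent_sup).
-behaviour(supervisor).

-export([start_link/0, stop_tree/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Terminates top_sup, which in turn shuts down all of its workers,
%% while the rest of the application stays up.
stop_tree() ->
    supervisor:terminate_child(?MODULE, top_sup).

init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 1, period => 5},
    TopSup = #{id       => top_sup,
               start    => {top_sup, start_link, []},
               restart  => permanent,
               shutdown => infinity,   %% give the subtree time to stop cleanly
               type     => supervisor},
    {ok, {SupFlags, [TopSup]}}.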

Given that my "workers" monitor external resources, they are all transient -
if my program crashes, the external resources may still be around after I
restart, so I am currently building persistence to handle this.
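
The worker specs under top_sup then look something like this, with
resource_proxy being the proxy sketched above:

worker_spec(ResourceId) ->
    #{id       => {resource_proxy, ResourceId},
      start    => {resource_proxy, start_link, [ResourceId]},
      restart  => transient,   %% restart on a crash, not on a normal stop
      shutdown => 5000,
      type     => worker}.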

All of this will be tested quite heavily. Keeping in step with the external
resources will be tricky, e.g., what if an external resource died while my
program was doing a reset? Fun times ahead.

Cheers,
Torben

p.s. sorry about the top reply, but Gmail's Inbox has removed that feature
or I'm too stupid to figure it out.

On Thu, Apr 12, 2018 at 5:20 PM Jesper Louis Andersen <
jesper.louis.andersen@REDACTED> wrote:

> On Thu, Apr 12, 2018 at 4:47 PM Torben Hoffmann <torben.lehoff@REDACTED>
> wrote:
>
>> Are there any subtleties that I need to cater for? Have I given enough
>> information for this question to make sense?
>>
>>
> Yes:
>
> * What is the API accessing this tree? If we start stopping the tree, how
> are those API calls going to behave while the tree is being closed down?
>
> * Many such trees need some kind of "connection draining phase" where
> they finish their current work, but don't start new work while they are
> being drained.
>
> * If you dynamically start/stop workers, then you might be able to set the
> number of workers to the special case of 0 and then stop the tree.
>
> * Surely, there is a supervisor on top of `top_sup`, and it is the one that
> needs to terminate its child. Consider that some supervisor in your
> application has to be "permanent/persistent" over the lifetime of the
> application, so you always have a point on which you can "hang" your
> workers. This allows you to use supervisor:terminate_child/2, but do note
> its documentation about restarting: your child is likely to be temporary,
> which means you need to have some kind of management for this if restarts
> happen in the system.
>
> * Dynamic alteration of the state should be logged: "worker count was
> changed from 8 to 0", but such an event shouldn't be reported as an ERROR
> in the syslog sense. This is INFO/NOTICE level.
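>
> Rough shape of the "set the workers to 0" step, assuming top_sup is the
> registered name of your tree and you track the worker ids:
>
> drain_pool(WorkerIds) ->
>     %% Not an error, just a notable state change: log it at info level.
>     error_logger:info_report([{event, worker_count_changed},
>                               {old, length(WorkerIds)},
>                               {new, 0}]),
>     %% For restart => temporary children the child spec is deleted as soon
>     %% as the process terminates, so keep the specs around yourself if you
>     %% ever want to bring the workers back with supervisor:start_child/2.
>     [ok = supervisor:terminate_child(top_sup, Id) || Id <- WorkerIds],
>     ok.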
>
> Final important comment:
>
> Do extensive tests of the failure scenario! Graceful recovery is nice, but
> if you don't test it somewhat, you are essentially sacrificing a goat on
> the altar of the god of your choice and you pray to said god that things
> end up being nice for you.
>
>
> --
https://www.linkedin.com/in/torbenhoffmann/