<div dir="ltr"><div>I omitted a detail: all of the processes are proxies for external
resources that they manage, i.e., they simply have to start and stop them
and respond to monitoring events from the resources. <br></div><div>So no real work is actually being done in the processes.<br></div><div>This simplifies things and I should have added that in the first place.<br><br></div><div>I have a supervisor above the top_sup and that is indeed the one that will kill top_sup - I framed the question to focus on what happens from top_sup and down.<br><br></div><div>Given that my "workers" monitor external resources, they are all transient - if my program crashes, the external resources may still be around after I restart, so I am currently building persistence to handle this.<br><br></div><div>All of this will be tested quite heavily. The correspondence to the external resources will be funny to deal with, e.g., what if an external resource has died while my program was doing a reset? Fun times ahead.<br><br></div><div>Cheers,<br></div><div>Torben<br><br></div><div>p.s. sorry about the top reply, but Gmail's Inbox has removed that feature or I'm too stupid to figure it out.<br></div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, Apr 12, 2018 at 5:20 PM Jesper Louis Andersen &lt;<a href="mailto:jesper.louis.andersen@gmail.com">jesper.louis.andersen@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">On Thu, Apr 12, 2018 at 4:47 PM Torben Hoffmann &lt;<a href="mailto:torben.lehoff@gmail.com" target="_blank">torben.lehoff@gmail.com</a>&gt; wrote:<br></div><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><font face="monospace"><font face="sans-serif">Are there any subtleties that I need to cater for? 
Have I given enough information for this question to make sense?<br></font></font><div><font face="monospace"><font face="sans-serif"><br></font></font></div></div></blockquote><div><br></div></div></div><div dir="ltr"><div class="gmail_quote"><div>Yes:<br><br></div><div>* What is the API accessing this tree? If we start stopping the tree, how are those API calls going to behave while the tree is being closed down?<br><br></div><div>* Many such trees need some kind of "connection draining phase" where they finish their current work, but don't start up new work while they are being drained.<br><br></div><div>* If you dynamically start/stop workers, then you might be able to set the number of workers to the special case of 0 and then stop the tree.<br><br></div><div>* Surely, there is a supervisor on top of `top_sup` and it is the one that needs to terminate its child. Consider that some supervisor in your application has to be "permanent/persistent" over the lifetime of the application, so you always have a point to which you can "hang" your workers. This allows you to use supervisor:terminate_child/2, but do note its documentation about restarting: your child is likely to be temporary, which means you need to have some kind of management for this if restarts happen in the system.<br><br></div><div>* Dynamic alteration of the state should be logged: "worker state was changed from 8 workers to 0", but it shouldn't report such an event as an ERROR in the syslog sense. This is INFO/NOTICE level.<br><br></div><div>Final important comment:<br><br></div><div>Do extensive tests of the failure scenario! Graceful recovery is nice, but if you don't test it somewhat, you are essentially sacrificing a goat on the altar of the god of your choice and you pray to said god that things end up being nice for you.</div><div><br><br></div></div></div>
</blockquote></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><a href="https://www.linkedin.com/in/torbenhoffmann/">https://www.linkedin.com/in/torbenhoffmann/</a><br></div></div>