[erlang-questions] Why have a supervisor behaviour?

Fred Hebert mononcqc@REDACTED
Thu May 21 22:32:40 CEST 2015


On 05/21, Roger Lipscombe wrote:
>I need delayed _restart_. Is this what Jesper refers to when he talks
>about "a delay_manager"? Such that init queries that and then
>might/might not delay?

That's a classic question, and one I have started answering differently.
Needing a delay in your supervisor's restart logic means that you are
letting things crash or restart for the wrong reason.

The thing is, it's all about the guarantees[1]. In a nutshell, a
supervised process should exit on any error, and its supervisor should
ideally bring you back to a known, stable state.

So of course all expected or unexpected errors should be able to bring 
you back to that state properly, specifically transient errors.

But the distinction is that because supervisors boot synchronously for 
all apps, they also represent a chain of dependencies of what should be 
available to all processes started *after* them.

That's why restart strategies such as 'one_for_all' or 'rest_for_one'
exist. They allow you to specify that the processes there are related to
each other in such a way that the death of one violates some guarantee
or invariant in the system, and that the only good way to restart is by
refreshing all of them.
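
As a rough illustration of that dependency chain, here is a minimal
sketch of a 'rest_for_one' supervisor. The module names db_sup, db_conn
and db_workers are hypothetical and not from this thread; the sketch
assumes db_workers relies on whatever db_conn guarantees once it has
started.

```erlang
-module(db_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Children are started synchronously, in list order.
    %% rest_for_one: if db_conn dies, db_workers (started after it) is
    %% restarted as well; if db_workers dies, db_conn is left alone.
    SupFlags = #{strategy => rest_for_one, intensity => 5, period => 10},
    Children = [
        %% db_conn and db_workers are hypothetical callback modules.
        #{id => db_conn, start => {db_conn, start_link, []}},
        #{id => db_workers, start => {db_workers, start_link, []}}
    ],
    {ok, {SupFlags, Children}}.
```

The order of the child list is what encodes "available to all
processes started after them": anything db_workers needs from db_conn
has to be true by the time db_conn's init returns.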

In a nutshell, if you expect disconnections or events that require a
backoff to happen frequently enough that they are to be expected by the
processes depending on yours, then that connection or that event is not
a thing that should take place in your process' init function. Otherwise
you're indirectly stating that without this thing working, the system
should just not boot.

See the example in [2] for an idea of how to respect this. This does not 
change the code in any major way, but moves function calls around to 
properly respect these semantics.
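
What follows is only a minimal sketch of that idea, not the code from
[2]. It assumes a hypothetical conn_worker module whose start_link/1
takes a zero-arity ConnectFun returning {ok, Conn} or {error, Reason};
the backoff values are arbitrary. init/1 guarantees only that the
process exists; the connection is attempted afterwards, and the backoff
lives in the process rather than in the supervisor.

```erlang
-module(conn_worker).
-behaviour(gen_server).

-export([start_link/1, request/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(MIN_BACKOFF, 1000).   %% milliseconds, arbitrary
-define(MAX_BACKOFF, 30000).  %% milliseconds, arbitrary

%% ConnectFun is a hypothetical fun() -> {ok, Conn} | {error, Reason}.
start_link(ConnectFun) ->
    gen_server:start_link(?MODULE, [ConnectFun], []).

request(Pid) ->
    gen_server:call(Pid, request).

init([ConnectFun]) ->
    %% Do not connect here: a failed connection must not fail the boot.
    %% The first attempt is handled right after init/1 returns.
    self() ! reconnect,
    {ok, #{connect => ConnectFun, conn => undefined,
           backoff => ?MIN_BACKOFF}}.

%% Callers are told the truth instead of the process crashing.
handle_call(request, _From, S = #{conn := undefined}) ->
    {reply, {error, disconnected}, S};
handle_call(request, _From, S = #{conn := Conn}) ->
    {reply, {ok, Conn}, S}.  %% placeholder for real work on Conn

handle_cast(_Msg, S) ->
    {noreply, S}.

handle_info(reconnect, S = #{connect := ConnectFun, backoff := B}) ->
    case ConnectFun() of
        {ok, Conn} ->
            {noreply, S#{conn := Conn, backoff := ?MIN_BACKOFF}};
        {error, _Reason} ->
            %% Retry later, doubling the delay up to a ceiling.
            erlang:send_after(B, self(), reconnect),
            {noreply, S#{backoff := min(B * 2, ?MAX_BACKOFF)}}
    end;
handle_info(_Msg, S) ->
    {noreply, S}.
```

With this shape the supervisor never needs a delayed restart: a
disconnected worker is a normal, advertised state, and dependents see
{error, disconnected} rather than a crash. Detecting a connection that
is lost later is left out of the sketch.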

My position is that this isn't a problem with the supervisors'
interface, but with how they are being used and what they mean for your
system. I know this is not the most helpful response, but oh well.


[1]: http://ferd.ca/it-s-about-the-guarantees.html
[2]: http://www.erlang-in-anger.com, section 2.2.3


