[erlang-questions] Why have a supervisor behaviour?
Fred Hebert
mononcqc@REDACTED
Thu May 21 22:32:40 CEST 2015
On 05/21, Roger Lipscombe wrote:
>I need delayed _restart_. Is this what Jesper refers to when he talks
>about "a delay_manager"? Such that init queries that and then
>might/might not delay?
That's a classic question, and one I have started answering differently.
Requiring a delay in your supervisor's restart logic means that you
are letting things crash or restart for the wrong reason.
The thing is, it's all about the guarantees[1]. In a nutshell, a
supervisor should exit on any error, and ideally bring you back to a
known, stable state.
So of course all expected or unexpected errors should be able to bring
you back to that state properly, specifically transient errors.
But the distinction is that because supervisors boot synchronously for
all apps, they also represent a chain of dependencies of what should be
available to all processes started *after* them.
That's why failure modes such as 'one for all' or 'rest for one' exist.
They allow you to specify that the processes there are related to each
other in such a way that the death of one violates some guarantee or
invariant in the system, and that the only good way to restart is by
refreshing all of them.
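To make the dependency chain concrete, here is a minimal sketch of a
supervisor using 'rest for one'. The module and child names (conn_sup,
config_store, conn_worker) are hypothetical, picked only for
illustration, and the map-based flags and child specs assume OTP 18 or
later:

```erlang
-module(conn_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% rest_for_one: if a child dies, every child started *after* it
    %% is terminated and restarted too; children before it are left alone.
    SupFlags = #{strategy => rest_for_one,
                 intensity => 5,    % at most 5 restarts...
                 period => 10},     % ...within 10 seconds
    %% Children boot synchronously, in list order. conn_worker can
    %% therefore assume config_store (a hypothetical module) is up.
    Children = [
        #{id => config_store, start => {config_store, start_link, []}},
        #{id => conn_worker,  start => {conn_worker, start_link, []}}
    ],
    {ok, {SupFlags, Children}}.
```

With this ordering, a crash in config_store also restarts conn_worker,
while a crash in conn_worker leaves config_store untouched.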
In a nutshell, if you expect disconnections or events that require a
backoff to happen frequently enough that they are to be expected by the
processes depending on yours, then that connection or that event is not
a thing that should take place in your process' init function. Otherwise
you're indirectly stating that without this thing working, the system
should just not boot.
See the example in [2] for an idea of how to respect this. This does not
change the code in any major way, but moves function calls around to
properly respect these semantics.
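As a rough sketch of that kind of rearrangement (this is my own
illustration, not the code from the book), the worker below returns from
init right away and only guarantees that it exists, not that it is
connected. The module name, host, port, and delay values are all
assumptions, and the map syntax needs OTP 17 or later:

```erlang
-module(conn_worker).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(MIN_DELAY, 500).     % hypothetical initial backoff, in ms
-define(MAX_DELAY, 30000).   % hypothetical backoff cap, in ms

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Do not connect here: the supervisor is blocked until init returns.
    %% Ask ourselves to connect once we are up and running.
    self() ! connect,
    {ok, #{socket => undefined, delay => ?MIN_DELAY}}.

%% Callers are told the truth: the connection may not be there.
handle_call(_Req, _From, State = #{socket := undefined}) ->
    {reply, {error, disconnected}, State};
handle_call(Req, _From, State = #{socket := Sock}) ->
    {reply, gen_tcp:send(Sock, Req), State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(connect, State = #{delay := Delay}) ->
    %% "localhost" and 5555 are placeholders for the real endpoint.
    case gen_tcp:connect("localhost", 5555, [binary, {active, true}], 5000) of
        {ok, Sock} ->
            {noreply, State#{socket := Sock, delay := ?MIN_DELAY}};
        {error, _Reason} ->
            %% Retry later with a doubled delay, without crashing.
            erlang:send_after(Delay, self(), connect),
            {noreply, State#{delay := min(Delay * 2, ?MAX_DELAY)}}
    end;
handle_info({tcp_closed, _Sock}, State) ->
    %% A disconnection is an expected event here, not a reason to die.
    self() ! connect,
    {noreply, State#{socket := undefined}};
handle_info(_Other, State) ->
    {noreply, State}.
```

The backoff lives in the worker, so the supervisor never needs a delay:
anything that does crash this process is a genuinely unexpected error,
and restarting it immediately is the right thing to do.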
My position is that this isn't a problem with supervisors' interface,
but with how they are being used and what they mean for your system. I
know this is not the most helpful response, but oh well.
[1]: http://ferd.ca/it-s-about-the-guarantees.html
[2]: http://www.erlang-in-anger.com, section 2.2.3