[erlang-questions] About behavior of OTP's supervisor-worker architecture
Scott Lystig Fritchie
Tue Sep 14 18:45:43 CEST 2010
Hi Torben & Tushar & fellow hackers.
Torben Hoffmann <torben.lehoff@REDACTED> wrote:
th> The OTP supervisor module does not support your desired behaviour,
th> but you can use the supervisor and the monitor facility of Erlang to
th> implement it.
Torben's description is certainly one way to do it. It looks simpler
than what I was going to describe ... and it'll probably work, depending
on the nature of the application.
You would probably put Torben's gen_server-based monitoring proc
elsewhere in the supervisor hierarchy, e.g.
                top_sup
               /       \
             /        dynamic_sup
           /          /    |    \
          |          |     |     |
   th_monitor_proc proc1 proc2 proc3
... where 'th_monitor_proc' is the monitoring proc that Torben
described.
The first problem is that 'th_monitor_proc' will start before
'dynamic_sup' starts, because supervisors start their children from
left-to-right, and a static supervisor won't finish its init until all
of its children have initialized. One (usually unstated) reason
supervisors are so useful is that they start everything in a
deterministic, well-ordered manner ... that's very important when
controlling hardware (which can be very fussy about the order of
operations) or software with strict must-run-before dependencies.
So, it would be better to flip the tree like this:
Then we don't have who-started-first problems at startup. Then the
'th_monitor_proc' can start the dynamic children:
               top_sup
              /       \
     dynamic_sup        \
     /    |    \          \
    |     |     |          |
  proc1 proc2 proc3 th_monitor_proc
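A minimal sketch of what such a 'th_monitor_proc' might look like, assuming 'dynamic_sup' is a locally registered supervisor that is already running. The worker module 'my_worker' and its start_link/1 and restart_link/1 functions are hypothetical names used only for illustration, and what to do on 'DOWN' is application-specific.

```erlang
-module(th_monitor_proc).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% 'dynamic_sup' is an earlier sibling, so it is already running here.
%% State is a map of monitor reference -> child id.
init([]) ->
    Ids = [proc1, proc2, proc3],
    {ok, maps:from_list([start_and_monitor(Id, start_link) || Id <- Ids])}.

handle_info({'DOWN', Ref, process, _Pid, _Reason}, State) ->
    case maps:take(Ref, State) of
        {Id, Rest} ->
            %% Application-specific policy goes here. This sketch starts a
            %% replacement through a different (hypothetical) entry point.
            {NewRef, Id} = start_and_monitor(Id, restart_link),
            {noreply, Rest#{NewRef => Id}};
        error ->
            %% Not one of our monitors: ignore.
            {noreply, State}
    end;
handle_info(_Other, State) ->
    {noreply, State}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

%% Children are 'temporary', so dynamic_sup never restarts them itself.
start_and_monitor(Id, Fun) ->
    Spec = #{id => Id,
             start => {my_worker, Fun, [Id]},   % my_worker is hypothetical
             restart => temporary},
    {ok, Pid} = supervisor:start_child(dynamic_sup, Spec),
    {erlang:monitor(process, Pid), Id}.
```

If start_child/2 fails, the badmatch crashes 'th_monitor_proc' and leaves the decision to its own supervisor, which is usually what you want.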
The 'top_sup' supervisor should use a one_for_all restart strategy.
After all, the 'dynamic_sup' might fail ... unlikely ... but one of my
favorite testing gimmicks is to run the 'appmon' application, get the
tree view of supervisor & worker processes, then start clicking 'Kill'
on random processes in the tree. :-) Try it. Watching the restarts is
entertaining *and* instructive. And it helps demonstrate how the
various restart strategies behave.
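A sketch of a 'top_sup' along these lines, assuming modules named 'dynamic_sup' and 'th_monitor_proc' each export start_link/0. The restart intensity and shutdown values are placeholders, not recommendations. It uses the map-style child specs of newer OTP releases.

```erlang
-module(top_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_all: if either child dies, both are restarted, so
    %% th_monitor_proc never keeps running without dynamic_sup.
    SupFlags = #{strategy => one_for_all,
                 intensity => 5,     % assumed: at most 5 restarts ...
                 period => 10},      % ... within 10 seconds
    %% Children are started in list order: dynamic_sup first.
    Children = [
        #{id => dynamic_sup,
          start => {dynamic_sup, start_link, []},
          type => supervisor,
          shutdown => infinity},
        #{id => th_monitor_proc,
          start => {th_monitor_proc, start_link, []},
          type => worker,
          shutdown => 5000}
    ],
    {ok, {SupFlags, Children}}.
```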
You probably do not want to do this:
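[tree diagram not preserved: 'th_monitor_proc' and 'dynamic_sup' (with
proc1, proc2, proc3) under different supervisors, involving 'another_sup']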
... especially if 'another_sup' has a different restart strategy. If
it's possible for 'dynamic_sup' to be killed while 'th_monitor_proc' is
alive, then you get races where 'th_monitor_proc' tries to restart a child
but fails because 'dynamic_sup' isn't alive ... that situation is best
avoided.
P.S. An astute reader might have a question about the tree below. The
question is, "What if 'top_sup' wants to shut down? Won't
'dynamic_sup' be killed first, and won't I still have the same problem,
where 'th_monitor_proc' is alive but 'dynamic_sup' is dead?"
The answer is "No". The Design Principles guide says:
    Since the supervisor is part of a supervision tree, it will
    automatically be terminated by its supervisor. When asked to shutdown,
    it will terminate all child processes in reversed start order
    according to the respective shutdown specifications, and then
    terminate itself.
So in the flipped tree, 'th_monitor_proc' (started last) is shut down
before 'dynamic_sup'.
P.P.S. Another solution is to write a module R that calls module P's
callbacks the first time that it's run and calls module Q's callbacks
when restarted. The "problem" then shifts to making R able to figure
out whether a child is running for the first time or has been restarted.
You could write a file in the file system, or keep something in Mnesia
or a system-wide ETS table, or other stateful thing.
Managing such state usually costs more than it's worth, but it can be
useful in some situations, especially if you already have to manage
state like that for other parts of the app.
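A rough sketch of the ETS variant, under the assumption that some long-lived process creates the table (an ETS table dies with its owner). The module name 'r', the table name and callback_module/3 are all made up for illustration. The caller would then invoke whatever callback it needs on the returned module.

```erlang
-module(r).
-export([ensure_table/0, callback_module/3]).

-define(TAB, r_seen_children).   % hypothetical table name

%% Call once from a long-lived process, since the table dies with its owner.
ensure_table() ->
    case ets:info(?TAB) of
        undefined ->
            ?TAB = ets:new(?TAB, [named_table, public, set]),
            ok;
        _ ->
            ok
    end.

%% Returns P the first time Id is seen and Q on every later call.
%% "First time" here means first time since the table was created;
%% the state is lost when the node (or the table owner) goes down.
callback_module(Id, P, Q) ->
    case ets:insert_new(?TAB, {Id, started}) of
        true  -> P;    % first start
        false -> Q     % restart
    end.
```

ets:insert_new/2 does the check and the insert in one atomic step, so two racing starts of the same child can't both look like the first one.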