[erlang-questions] simple_one_for_one supervisor - what happens at restart? (also: gen_tcp)
Tue May 17 01:49:18 CEST 2016
On 05/16, Chandru wrote:
>No, it's not. The reason a terminate callback is provided in a gen_server
>is so that a process can clean up when it terminates, not to delegate it to
I'm gonna side with Loïc here. The terminate callback is good for any
process-local cleanup or optimistic work, but is by no means a safe way
to terminate anything.
For example, if you have many children to terminate and through some
interleaving brutal_kill is triggered (or anyone calls exit(Pid,
kill)), whatever work you wanted to do in terminate will be skipped by a
non-trappable exit signal.
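
To make that concrete, here is a minimal sketch. The module name
terminate_demo and the {terminate_called, Reason} message are made up for
illustration; it assumes an OTP release that has gen_server:stop/1 (18 or
later). The server traps exits, yet terminate/2 only reports back on the
controlled stop:

```erlang
-module(terminate_demo).
-behaviour(gen_server).
-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% The state is simply the pid we report to (hypothetical protocol).
init(Parent) ->
    process_flag(trap_exit, true),
    {ok, Parent}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Msg, State) -> {noreply, State}.

%% Tell the parent that cleanup ran.
terminate(Reason, Parent) ->
    Parent ! {terminate_called, Reason},
    ok.

run() ->
    %% Controlled stop: terminate/2 runs before stop/1 returns.
    {ok, A} = gen_server:start(?MODULE, self(), []),
    ok = gen_server:stop(A),
    receive {terminate_called, normal} -> ok
    after 1000 -> error(terminate_not_called)
    end,
    %% Untrappable kill: the process dies without running terminate/2,
    %% even though it traps exits.
    {ok, B} = gen_server:start(?MODULE, self(), []),
    Ref = erlang:monitor(process, B),
    exit(B, kill),
    receive {'DOWN', Ref, process, B, killed} -> ok end,
    receive {terminate_called, _} -> called
    after 100 -> skipped
    end.
```

Calling terminate_demo:run() from a shell should return skipped: the
first server reports its cleanup, the killed one never does.
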
Using terminate as your sole termination cleanup is risky. It is better
to assume that it will not be called every time, only in controlled
terminations and some accidental ones. This is especially true of
resources that are not collected for you -- not ports or ETS tables, but
live dependencies such as other processes mid-discussion.
The other side has to be able to cope with the termination of its peer;
this can be done through monitors, sometimes through link+trap_exit. If
recovery is not possible, just dying is appropriate.
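
As a rough sketch of the monitor approach: the module name peer_watch and
the {request, ...} / {reply, ...} message shapes below are invented for
the example, and Peer is assumed to be a pid rather than a registered
name. The caller learns about the peer's death from the 'DOWN' message
instead of counting on the peer's terminate to say goodbye:

```erlang
-module(peer_watch).
-export([call_peer/3]).

%% Send Request to Peer and wait for a reply, giving up cleanly if the
%% peer dies first. Works regardless of how the peer terminated.
call_peer(Peer, Request, Timeout) when is_pid(Peer) ->
    Ref = erlang:monitor(process, Peer),
    Peer ! {request, self(), Ref, Request},
    receive
        {reply, Ref, Reply} ->
            %% Drop the monitor and any 'DOWN' already in the mailbox.
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, Peer, Reason} ->
            %% The peer is gone (crash, shutdown, or kill).
            {error, {peer_down, Reason}}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

If the caller cannot do anything useful with {error, {peer_down, _}},
letting it crash on a failed match is a perfectly fine way to handle it.
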
>No, it's not. From the manual:
>The supervisor is responsible for starting, stopping and monitoring its
>child processes. The basic idea of a supervisor is that it shall keep its
>child processes alive by restarting them when necessary.
In practice, the release handling mechanisms will make use of that
supervision structure to walk the tree: that's why you declare whether a
supervisor's children are workers or supervisors (leaf or inner node!)
The tree is being walked the entire way through.
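
For illustration only, a walk of that kind can be sketched with
supervisor:which_children/1. This is not how the release handler itself
is written; the module name tree_walk and the shape of the returned
term are my own choices:

```erlang
-module(tree_walk).
-export([walk/1]).

%% Return a nested list describing the supervision tree under Sup.
%% Inner nodes (Type =:= supervisor) are descended into; leaves are
%% reported as {Id, Child}, where Child is a pid, or the atoms
%% 'restarting' / 'undefined' for children that are not running.
walk(Sup) ->
    [describe(Id, Child, Type)
     || {Id, Child, Type, _Modules} <- supervisor:which_children(Sup)].

describe(Id, Pid, supervisor) when is_pid(Pid) ->
    {Id, Pid, walk(Pid)};
describe(Id, Child, _Type) ->
    {Id, Child}.
```

For example, tree_walk:walk(kernel_sup) on a running node lists the
kernel application's tree. The walk only knows where to descend because
each child spec declares its type.
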
That being said, I personally try to avoid calling the supervisor to
know who its children are and prefer named nodes. For me the supervisor
is first and foremost a definition of a unit of failure, of dependencies
between workers or subtrees.
>Look carefully at the example I provided in the gist and Oliver's use case.
>It is perfectly sound advice. If you are ever walking your supervisor
>hierarchy to do something with your application, you are doing it wrong.
See release upgrades; if you need to walk your entire system at once,
doing it through supervisors is not a bad idea.
Funnily enough, the supervision structure isn't all that is being
trusted though. When an app is shut down, the application controller (or
is it the master?) also runs through all of the processes on the node
and looks for those for which it is the group leader and then force
kills them -- preventing the terminate function from being called.