[erlang-questions] Questions about supervision

Tue May 3 12:51:47 CEST 2016

Hi Oliver,

On Tue, May 3, 2016 at 12:54 AM, Oliver Korpilla <Oliver.Korpilla@REDACTED> wrote:
> Hello.
>
> I ran into some unexpected results with supervision and transient workers and this gives me a few questions about process shutdown and resource cleanup.
>
> I have a structure like this:
>
> * main supervisor with several permanent children, one of them the (permanent) supervisor for transient children
> * the supervisor for transient children creates processes grouped as follows:
>   - a "group" supervisor
>   - a worker maintaining a TCP connection (transient) linked with the group supervisor
>   - a worker running all the required procedures that determine one client linked with the group supervisor
>   - all these processes are registered with global under the same ID, like this: {sup, <id>}, {conn, <id>}, {proc, <id>}
>
> I read that transient children are only restarted on abnormal exit which is exactly what I want.
>
> When trying to figure out how best to clean up the group listed last, I ran into some snags, though...
>
> I sometimes rely on the supervisor:which_children/1 call to find the IDs of all active groups. When inspecting all three layers of supervision I realized that processes are not cleaned up from the list of children like I expected.

Relying on a supervisor's internal state to keep track of your
processes is not well supported - and it's not a great idea. You're
better off using something like gproc [1]: register each process
under a type of interest and enumerate the processes by that type.
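
A rough sketch of what I mean, assuming gproc is running on the node.
The module name and the 'conn' property are made up for illustration,
and this uses gproc's local scope ('l') rather than global
registration:

```erlang
-module(group_registry).
-export([register_conn/1, all_conns/0]).

%% Call this from the connection worker's init/1, so that the worker
%% itself is the registered process. 'conn' is a hypothetical tag.
register_conn(Id) ->
    %% Unique local name, one per group.
    true = gproc:reg({n, l, {conn, Id}}),
    %% Shared (non-unique) local property, with the group ID as its value.
    true = gproc:reg({p, l, conn}, Id),
    ok.

%% Returns [{Pid, Id}] for every live process holding the property.
all_conns() ->
    gproc:lookup_values({p, l, conn}).
```

gproc monitors the processes it registers and drops their entries
when they die, so there is no stale list to clean up.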

> Q1) So - let's say I have a transient child that exits with :normal, will its supervisor clean it out of its child lists eventually?

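Not with one_for_one, one_for_all or rest_for_one: there a transient
child that exits normally stays in the child list, with its pid
shown as undefined, until you call supervisor:delete_child/2. You're
looking for the behavior of the 'simple_one_for_one' supervisor,
which removes a child as soon as it terminates and is not restarted.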
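
Something along these lines, as a minimal sketch. The module name
and the throwaway worker are placeholders; in your case the child
would be your group supervisor or worker module:

```erlang
-module(transient_pool).
-behaviour(supervisor).
-export([start_link/0, start_worker/1, init/1, worker_start_link/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% [Id] is appended to the argument list in the child spec, so this
%% ends up calling worker_start_link(Id).
start_worker(Id) ->
    supervisor:start_child(?MODULE, [Id]).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 5,
                 period => 10},
    ChildSpec = #{id => worker,
                  start => {?MODULE, worker_start_link, []},
                  restart => transient,   % restart only on abnormal exit
                  shutdown => 5000,
                  type => worker},
    {ok, {SupFlags, [ChildSpec]}}.

%% Placeholder worker: waits for 'stop', then exits with reason normal.
worker_start_link(_Id) ->
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    {ok, Pid}.
```

Once a worker started this way exits normally, it no longer shows
up in supervisor:which_children/1.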

> Furthermore, when reading through the documentation of gen_server and supervisor I came to this understanding:
>
> - If shutdown is brutal_kill, no cleanup in the children can take place.
> - If shutdown is infinity or <integer>, terminate in OTP worker children is called only if the child is trapping exits and the exit is not normal (like shutdown).
>
> Q2) Is this understanding correct?

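Yes, as far as shutdown by the supervisor goes. Note that
terminate/2 is also called when the gen_server stops itself, e.g. by
returning {stop, Reason, State} from a callback, whether or not it
traps exits.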
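
For the supervisor-initiated case, a worker that wants its cleanup
to run looks roughly like this. It is only a sketch: conn_worker is a
made-up name, and the child spec is assumed to use a shutdown value
of infinity or a timeout rather than brutal_kill:

```erlang
-module(conn_worker).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Without this flag the 'shutdown' exit signal from the
    %% supervisor kills the process before terminate/2 can run.
    process_flag(trap_exit, true),
    {ok, #{}}.

handle_call(_Request, _From, State) -> {reply, ok, State}.

handle_cast(_Msg, State) -> {noreply, State}.

%% With trap_exit set, exits of other linked processes arrive here
%% as {'EXIT', Pid, Reason} messages.
handle_info(_Info, State) -> {noreply, State}.

terminate(_Reason, _State) ->
    %% Cleanup goes here, e.g. closing the TCP socket.
    ok.
```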

> Q3) What is the pattern for properly and safely shutting down transient children? supervisor:terminate_child followed by supervisor:delete_child?

You can do this, but it's not a general practice. Ideally you start
your processes (either indirectly at init in your supervisor tree, or
by explicitly starting via a supervisor at various points at runtime)
and then completely forget about them. They terminate in response to
some message or by crashing. If you want to explicitly stop a process,
tell it to stop (i.e. implement the stop behavior in the process
module). Don't try to control it via the supervisor.
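
By "stop behavior" I mean something like the following sketch, where
proc_worker and stop/1 are hypothetical names. The process exits with
reason normal, so a transient child is not restarted:

```erlang
-module(proc_worker).
-behaviour(gen_server).
-export([start_link/0, stop/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Ask the process to stop itself; the supervisor is not involved.
stop(Pid) ->
    gen_server:cast(Pid, stop).

init([]) ->
    {ok, #{}}.

handle_call(_Request, _From, State) -> {reply, ok, State}.

%% Returning {stop, normal, State} makes gen_server call terminate/2
%% and then exit with reason normal.
handle_cast(stop, State) -> {stop, normal, State};
handle_cast(_Msg, State) -> {noreply, State}.

handle_info(_Info, State) -> {noreply, State}.

terminate(_Reason, _State) ->
    ok.
```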

> Q4) What is the pattern for keeping supervisors' inner state clean so that they don't just accumulate transient children forever?

simple_one_for_one_ftw!

[1] https://github.com/uwiger/gproc