Author: Maria Scott <maria-12648430(at)hnc-agency(dot)org>,
        Jan Uhlig <juhlig(at)hnc-agency(dot)org>
Status: Draft
Type: Standards Track
Created: 04-Mar-2021
Erlang-Version: 24.0
Post-History: 08-Mar-2021, 17-Mar-2021, 23-Mar-2021, 31-Mar-2021
Replaces:

EEP 56: Automatic supervisor shutdown triggered by termination of significant children

Abstract

This EEP introduces a way of automatically terminating supervisors based on the termination of specifically marked significant children.

This document is based on the discussion in OTP-PR 4521.

Motivation

Children under a supervisor often represent a work unit, that means, a group of cooperating processes, as opposed to just a single process. Such work unit supervisors (called group supervisors in the context of this document) are themselves typically hosted by a simple_one_for_one supervisor, via which they are started as needed.

At the time of this writing, however, there is no good, canonical way of stopping such group supervisors once the work unit they represent has finished its work and the respective child processes have terminated. As a result, the group supervisors will hang around idle forever unless they are stopped manually one way or another.

This has been addressed in applications in a variety of ways, none of which can be called truly good, straightforward, or canonical:

Both of the above approaches suffer from the fact that the children responsible for the shutdown have to know things about their surroundings, namely...:

This may be tackled by having a dedicated overseer child that watches the other children and acts according to their behavior. However, this requires considerable boilerplate code for tasks that would be better handled by the supervisor. There is also the problem that the overseer process must keep the list of children it watches up to date should any of them be restarted, either by enabling the children to register with it on start (for which they in turn must know the overseer process' pid), or by asking the supervisor for it.

Another approach that is often used is to make the children responsible for the shutdown of the group supervisor permanent and to set the supervisor's restart intensity to 0. This has the downside that such a child is never restarted: if it exits abnormally, the supervisor shuts down even though the child could have been restarted. Another downside to this approach is that it produces error messages (crash reports), even if the shutdown is intended.

Last but not least, some people have resorted to cloning the OTP supervisor and customizing it to their needs, for the reasons outlined here and others.

Rationale

This EEP provides a means to alleviate the problems outlined in the motivation by introducing a way to mark specific children as significant via a new child spec flag, and a way to configure supervisors to shut down automatically depending on the exit of significant children via a new supervisor flag.

In order to keep backwards compatibility, the new flags will only be usable in the map forms of child specs and supervisor flags, and for the same reason the default values for the new flags are chosen such that, in their absence, the supervisor behaves the same as it does to date.

The new child spec flag is named significant with possible values true and false, with false being the default.

The new supervisor flag is named auto_shutdown with possible values never, any_significant and all_significant, with never being the default.
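As an illustration of how the two flags fit together, the following is a minimal sketch of a group supervisor using the map forms of the supervisor flags and child specs. The module names group_sup, group_worker and group_helper are hypothetical and only serve the example; the flag names and values are the ones proposed in this document.

```erlang
-module(group_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    %% Map form of the supervisor flags, including the proposed
    %% auto_shutdown flag (the default would be never).
    SupFlags = #{strategy => one_for_all,
                 intensity => 1,
                 period => 5,
                 auto_shutdown => any_significant},
    ChildSpecs = [
        %% Marked as significant via the proposed child spec flag.
        %% group_worker is a hypothetical callback module.
        #{id => worker,
          start => {group_worker, start_link, []},
          restart => transient,
          significant => true},
        %% significant defaults to false when omitted.
        %% group_helper is a hypothetical callback module.
        #{id => helper,
          start => {group_helper, start_link, []},
          restart => transient}
    ],
    {ok, {SupFlags, ChildSpecs}}.
```

Omitting both auto_shutdown and significant from these maps leaves the supervisor behaving as it does today.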

With the supervisor auto_shutdown flag set to never, the child spec flag significant is not allowed to be true. The never value and the restriction on the significant value are intended as a safety measure to defend against unintended automatic shutdowns, for example by the exit of a significant child which was added later via supervisor:start_child/2. As the spec for such a child would not be present in the supervisor:init/1 callback code but somewhere else, debugging such unexplained supervisor shutdowns might be difficult.

Otherwise, the following rules apply when a significant child exits on its own:

If the restart type is permanent, the significant flag is not allowed to be true, as this combination does not make sense.

To be clear, the above rules only apply when significant children exit by themselves, that is, not when being terminated manually via supervisor:terminate_child/2, not when other non-significant children exit, and not when being terminated as a consequence of a sibling's death in the one_for_all or rest_for_one strategies.

The approach proposed here could also be used to the effect of "shutdown when empty" by marking all children as significant and setting the supervisor auto_shutdown flag to all_significant.
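A "shutdown when empty" configuration along these lines might look like the sketch below. The module names batch_sup and job_worker and the job identifiers are hypothetical; the children are temporary here so that none of them is restarted once it has exited.

```erlang
-module(batch_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    SupFlags = #{strategy => one_for_one,
                 %% Intended effect: shut down once every significant
                 %% child has exited on its own.
                 auto_shutdown => all_significant},
    %% Every child is marked significant, so "all significant children"
    %% is the same as "all children". job_worker is hypothetical.
    ChildSpecs = [#{id => JobId,
                    start => {job_worker, start_link, [JobId]},
                    restart => temporary,
                    significant => true}
                  || JobId <- [job_a, job_b, job_c]],
    {ok, {SupFlags, ChildSpecs}}.
```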

It is worth mentioning that the simple_one_for_one strategy poses a special case, as it can have only a single child spec that applies to all children. That means that either all children are significant ones, or none is.
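The special case can be seen in a sketch such as the following, where the single child spec carries the significant flag and therefore applies to every dynamically started child. The module names conn_pool_sup and conn_handler are hypothetical.

```erlang
-module(conn_pool_sup).
-behaviour(supervisor).

-export([start_link/0, start_conn/2, init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

%% Starts one more child from the single child spec; Socket is
%% appended to the argument list given in the spec's start tuple.
start_conn(Sup, Socket) ->
    supervisor:start_child(Sup, [Socket]).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 auto_shutdown => all_significant},
    %% There is only this one spec, so either all children are
    %% significant or none is. conn_handler is hypothetical.
    ChildSpec = #{id => conn,
                  start => {conn_handler, start_link, []},
                  restart => temporary,
                  significant => true},
    {ok, {SupFlags, [ChildSpec]}}.
```

With any_significant in place of all_significant, the exit of any one child would be expected to take the supervisor and all remaining children down, which is rarely what a simple_one_for_one supervisor is used for.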

Considerations

Using temporary significant children in one_for_all and rest_for_one supervisors may lead to an edge case scenario in which an intended automatic shutdown will not happen. Temporary children will not be restarted, not even when their termination was caused by a sibling's death. On the other hand, the automatic shutdown of a supervisor is not triggered when a significant child is terminated as a consequence of a sibling's death. Thus, a temporary significant child intended to automatically shut down its supervisor will be lost if it is terminated as a consequence of a sibling's death.

Backwards Compatibility

The changes proposed in this document introduce no incompatible changes, as the new child spec and supervisor flags are optional and default to values that result in the current behavior. Also, all the current workarounds outlined in the Motivation will still work.

Although the proposed changes are backwards compatible, applications using this enhancement may not be compatible when compiled with previous OTP versions unless proper care is taken. Such an application compiled with older OTP versions will leak processes, as the automatic supervisor shutdowns it relies on to remove unused parts of its supervision tree will not happen. Taking care of this issue is at the discretion of implementors if they expect an application which uses the significant child behavior to be compiled with an OTP version that predates its appearance.
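One possible way of taking such care is to check the OTP release before relying on automatic shutdown, as in the sketch below. It is only one option among several and rests on two assumptions: that the feature first appears in OTP 24, as the Erlang-Version header of this document suggests, and that a check at runtime (rather than at compile time) is acceptable. The module name auto_shutdown_compat and its functions are hypothetical.

```erlang
-module(auto_shutdown_compat).

-export([supported/0, sup_flags/1]).

%% Returns true if the running OTP release is assumed to support the
%% auto_shutdown and significant flags (assumption: OTP 24 or later).
supported() ->
    try list_to_integer(erlang:system_info(otp_release)) of
        Release -> Release >= 24
    catch
        %% Very old releases report names such as "R16B03", which do
        %% not parse as an integer; treat them as unsupported.
        error:badarg -> false
    end.

%% Adds auto_shutdown to the given supervisor flags map only when it
%% is supported; otherwise the flags are returned unchanged and the
%% caller has to stop the supervisor by other means.
sup_flags(BaseFlags) when is_map(BaseFlags) ->
    case supported() of
        true  -> BaseFlags#{auto_shutdown => any_significant};
        false -> BaseFlags
    end.
```

An init/1 callback could then call auto_shutdown_compat:sup_flags(#{strategy => one_for_all}) and fall back to one of the workarounds from the Motivation when supported/0 returns false.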

Implementation

A reference implementation which will be updated to reflect the state of this document can be found in OTP-PR 4638.

Copyright

This document has been placed in the public domain.