A supervisor is a process that supervises child processes. A child can be another supervisor or a worker process. A supervisor is always linked to its children. This structure is used to build a supervision tree, which is a nice way to structure an application for fault tolerance.
The basic idea of a supervisor is that it keeps its children alive. If a child terminates abnormally, it is restarted. There are three basic restart strategies for supervisors: one-for-one, one-for-all, and rest-for-one.
There is also a fourth restart strategy, a variant of the ordinary one-for-one called simple-one-for-one. It should be used for dynamic processes of the same type, for example processes which each represent a call. Compared to one-for-one, it has lower overhead when starting dynamic children.
Each child has one of three restart types: permanent, transient, or temporary. A permanent child is always restarted when it dies. A transient child is restarted only if it dies abnormally, and a temporary child is never restarted.
Supervisors have a built-in mechanism to prevent situations where a child dies, is restarted by the supervisor, only to die again for the same reason, is restarted again, and so on. The mechanism limits the number of restarts which can occur in a given time interval. This is determined by the values of two parameters, MaxR and MaxT. If more than MaxR restarts are performed in the last MaxT seconds, then the supervisor shuts down all the children which it supervises and then dies.
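The restart limit described above can be sketched as a sliding-window check. This is only an illustration of the rule, not the supervisor's actual code; the module name restart_window and the function add_restart/4 are hypothetical, and timestamps are assumed to be plain integers in seconds.

```erlang
-module(restart_window).
-export([add_restart/4]).

%% Hypothetical helper illustrating the MaxR/MaxT rule.
%% Now      - time of the new restart, in seconds
%% Restarts - timestamps (seconds) of earlier restarts
%% Returns {ok, Recent} if the restart is allowed, or
%% {shutdown, Recent} if more than MaxR restarts fall within
%% the last MaxT seconds.
add_restart(Now, Restarts, MaxR, MaxT) ->
    %% Keep only the restarts that are still inside the window.
    Recent = [T || T <- [Now | Restarts], Now - T =< MaxT],
    case length(Recent) > MaxR of
        true  -> {shutdown, Recent};
        false -> {ok, Recent}
    end.
```

With MaxR = 1 and MaxT = 5, a second restart two seconds after the first exceeds the limit, while a second restart ten seconds later does not.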
An instance of the supervisor behaviour can be debugged using the module sys.
start_link(Module, StartArgs) -> SupRet
start_link(SupName, Module, StartArgs) -> SupRet
SupName = {local, atom()} | {global, atom()}
Module = atom()
StartArgs = term()
SupRet = {ok, Pid} | ignore | {error, Reason}
Pid = pid()
Reason = {already_started, Pid} | term()
Starts a new instance of the supervisor behaviour. The function Module:init(StartArgs) is called in order to create a start specification (see below).
If the supervisor is started without SupName, it can only be called using the returned Pid identifier. If it is started with SupName, the name is registered locally or globally.
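As a minimal sketch of how start_link/3 and the init/1 callback fit together, the module below starts a locally registered supervisor with one permanent worker. The names demo_sup, demo_worker, start_worker/0 and worker_loop/0 are hypothetical, and the worker is a bare process started with proc_lib, chosen only to keep the example self-contained.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_loop/0]).

%% Start the supervisor and register it locally as demo_sup.
start_link() ->
    supervisor:start_link({local, demo_sup}, ?MODULE, []).

%% Callback: return the supervisor flags and the child specifications.
init([]) ->
    SupFlags = {one_for_one, 3, 10},   % at most 3 restarts in 10 seconds
    Child = {demo_worker,                       % Name
             {?MODULE, start_worker, []},       % Start = {M, F, A}
             permanent,                         % Restart
             2000,                              % Shutdown (milliseconds)
             worker,                            % Type
             [?MODULE]},                        % Modules
    {ok, {SupFlags, [Child]}}.

%% Start function: must return a pid linked to the caller (the supervisor).
start_worker() ->
    {ok, proc_lib:spawn_link(?MODULE, worker_loop, [])}.

%% Hypothetical worker body: waits until told to stop.
worker_loop() ->
    receive
        stop -> ok
    end.
```

Because the supervisor is registered, later calls can refer to it by the atom demo_sup instead of the returned pid.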
start_child(Supervisor, ChildSpec) -> {ok, Child} | {ok, Child, Info} | {error, Reason}
Supervisor = pid() | SupName | {global, SupName}
ChildSpec = child_spec()
ExtraStartArgs = [term()]
child_spec() = {Name, Start, Restart, Shutdown, Type, Modules}
SupName = atom()
Name = term()
Start = {M, F, A}
Restart = permanent | transient | temporary
Shutdown = int() >= 0 | brutal_kill | infinity
Type = worker | supervisor
Modules = [atom()] | dynamic
Child = pid() | undefined
Info = term()
Use this function to dynamically add a child to a supervisor. The start function Start is supposed to return {ok, Pid} | {ok, Pid, Info} | ignore | {error, Reason}. If ignore is returned, the supervisor ignores the child and returns {ok, undefined}. The start function is executed by the supervisor process. It must return a Pid that is linked to the caller (i.e. the supervisor). The supervisor uses this link to monitor and control the child.
If {ok, Pid, Info} is returned from the start function, the same is returned from this function. The Info is not interpreted in any way by the supervisor.
Name is an internal name, which is used by the supervisor to identify its children.
Modules is used for the code change procedure. It should be dynamic if the modules that the child uses can change dynamically at runtime, for example a gen_event process. (Note that this refers to the names of the modules rather than the implementation of the module.) Otherwise, it should be a list of the modules with which the child is implemented. This information is used by the release handler to find all processes which execute a module. For example, if the child is a gen_server, Modules is a list with the name of the callback module as its only element.
The Shutdown value infinity must be used with care. The supervisor tries to shut down the child by calling exit(Child, shutdown) and waits for the child to terminate. If the child does not terminate, the supervisor will hang forever. infinity should be used for children which themselves are supervisors, but it is not allowed for workers. This is to make sure that the system can be shut down without hanging forever.
If the supervisor is a simple_one_for_one supervisor, this function should be called as start_child(Supervisor, ExtraStartArgs). It starts a new child of the same type and calls the child's start function as apply(M, F, A ++ ExtraStartArgs). M, F, and A are returned from the supervisor's init function. The new child does not get a unique name by which it is identified in the supervisor. Therefore, the functions terminate_child/2, delete_child/2 and restart_child/2 cannot be used for a simple_one_for_one supervisor. When a temporary child dies for any reason or a transient child dies normally, the child is removed from the supervisor. Compare this with an ordinary supervisor, where the child specification remains until delete_child/2 is called. No progress report is generated when the child is started. This is to reduce overheads.
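The simple_one_for_one case can be illustrated with a small sketch in which each dynamic child represents a call. The names call_sup, call_handler, start_call/1, start_handler/2 and handler_loop/2, as well as the default_opts argument, are hypothetical. Note how the single element of ExtraStartArgs is appended to the argument list A from the child specification.

```erlang
-module(call_sup).
-behaviour(supervisor).
-export([start_link/0, start_call/1, init/1,
         start_handler/2, handler_loop/2]).

start_link() ->
    supervisor:start_link({local, call_sup}, ?MODULE, []).

%% Start one dynamic child; [CallId] plays the role of ExtraStartArgs.
start_call(CallId) ->
    supervisor:start_child(call_sup, [CallId]).

%% Exactly one child specification; no child is started here.
init([]) ->
    Child = {call_handler,
             {?MODULE, start_handler, [default_opts]},  % A = [default_opts]
             temporary, brutal_kill, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 0, 1}, [Child]}}.

%% Called as apply(M, F, A ++ ExtraStartArgs),
%% i.e. start_handler(default_opts, CallId).
start_handler(Opts, CallId) ->
    {ok, proc_lib:spawn_link(?MODULE, handler_loop, [Opts, CallId])}.

%% Hypothetical handler body: lives until the call is hung up.
handler_loop(_Opts, _CallId) ->
    receive
        hang_up -> ok
    end.
```

Each call to call_sup:start_call/1 adds another child built from the same specification, so the supervisor can hold any number of handlers at once.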
terminate_child(Supervisor, Name) -> ok | {error, not_found}
Supervisor = pid() | SupName | {global, SupName}
SupName = atom()
Name = term()
Terminates a child. The child is not removed from the supervisor's set of children. This means that it can be restarted explicitly by calling restart_child/2, or started implicitly if the supervisor has to restart all children.
delete_child(Supervisor, Name) -> ok | {error, running | not_found}
Supervisor = pid() | SupName | {global, SupName}
SupName = atom()
Name = term()
Deletes a child from the supervisor. The child must already be terminated; if it is still running, {error, running} is returned.
restart_child(Supervisor, Name) -> {ok, Pid} | {ok, Pid, Info} | {error, Reason}
Supervisor = pid() | SupName | {global, SupName}
SupName = atom()
Name = term()
Info = term()
Starts a child which has been terminated and not restarted according to the restart specification. This can include a temporary child which terminates, or a child that was terminated explicitly by calling the function terminate_child/2.
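The interplay of terminate_child/2, restart_child/2 and delete_child/2 can be sketched as one sequence on a single child. The module lifecycle_demo, the child name w1 and the helper functions are hypothetical; the worker is a bare process that does not trap exits, so it dies as soon as the supervisor sends it the shutdown signal.

```erlang
-module(lifecycle_demo).
-behaviour(supervisor).
-export([run/0, init/1, start_worker/0, worker_loop/0]).

init([]) ->
    Child = {w1, {?MODULE, start_worker, []},
             permanent, 2000, worker, [?MODULE]},
    {ok, {{one_for_one, 1, 5}, [Child]}}.

start_worker() ->
    {ok, proc_lib:spawn_link(?MODULE, worker_loop, [])}.

worker_loop() ->
    receive
        stop -> ok
    end.

%% Walk one child through terminate, restart and delete.
run() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    %% A running child cannot be deleted.
    {error, running} = supervisor:delete_child(Sup, w1),
    %% After termination the specification is kept, with no pid.
    ok = supervisor:terminate_child(Sup, w1),
    [{w1, undefined, worker, [?MODULE]}] = supervisor:which_children(Sup),
    %% The kept specification can be started again.
    {ok, Pid} = supervisor:restart_child(Sup, w1),
    true = is_pid(Pid),
    %% Terminate and delete: the specification is now gone.
    ok = supervisor:terminate_child(Sup, w1),
    ok = supervisor:delete_child(Sup, w1),
    {error, not_found} = supervisor:restart_child(Sup, w1),
    ok.
```

Calling lifecycle_demo:run() returns ok only if every step behaves as described in the text above.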
which_children(Supervisor) -> [{Name, Pid, Type, Modules}]
Supervisor = pid() | SupName | {global, SupName}
SupName = atom()
Name = term()
Pid = pid() | undefined
Type = worker | supervisor
Modules = [atom()] | dynamic
Returns a list of the supervisor's children. Name, Type and Modules are as defined in the child specification.
check_childspecs([ChildSpec]) -> ok | {error, Reason}
ChildSpec = child_spec()
Checks if a list of child specifications is syntactically correct.
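A short sketch of check_childspecs/1 follows. The module spec_check and the modules and functions named inside the specifications (m, f) are hypothetical and need not exist, since only the syntax of the specification is checked. The second specification uses the made-up restart type sometimes, which is not one of permanent, transient or temporary.

```erlang
-module(spec_check).
-export([valid/0, invalid/0]).

%% A well-formed child specification.
valid() ->
    supervisor:check_childspecs(
      [{w, {m, f, []}, permanent, 2000, worker, [m]}]).

%% 'sometimes' is not a valid Restart value, so an error is expected.
invalid() ->
    supervisor:check_childspecs(
      [{w, {m, f, []}, sometimes, 2000, worker, [m]}]).
```

Here spec_check:valid() should return ok, while spec_check:invalid() should return an {error, Reason} tuple describing the bad field.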
The following functions should be exported from a supervisor callback module.
Module:init(StartArgs) -> {ok, {SupFlags, [ChildSpec]}} | ignore | {error, Reason}
SupFlags = {restart_strategy(), MaxR, MaxT}
restart_strategy() = one_for_all | one_for_one | rest_for_one | simple_one_for_one
MaxR = int() >= 0
MaxT = int() > 0
ChildSpec = child_spec()
This function returns a supervisor specification. ChildSpec is as previously defined in the start_child/2 function. MaxR is the maximum number of restarts which can be performed within MaxT seconds.
When the restart strategy is simple_one_for_one, the list of child specifications must be a list with one element only. This child is not started during the initialization phase, but all children are started dynamically. Each dynamically started child is of the same type, which means that all children are instances of the initial child specification. New children are created with a call to start_child(Supervisor, ExtraStartArgs).
If a child start function returns ignore, the child is kept in the supervisor's list of children. The child can be restarted explicitly by calling restart_child/2. The child is also restarted if the supervisor is one_for_all and performs a restart of all children, or if the supervisor is rest_for_one and performs a restart of this child. The supervisor start-up fails and terminates if the child start function returns {error, Reason}.
This function can return ignore in order to inform the parent, especially if it is another supervisor, that the supervisor is not started according to configuration data, for instance.
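Returning ignore from init can be sketched with a supervisor that is enabled or disabled by its start argument. The module optional_sup and the boolean Enabled argument are hypothetical; a real application would more likely read a configuration parameter inside init.

```erlang
-module(optional_sup).
-behaviour(supervisor).
-export([start_link/1, init/1]).

%% Enabled is passed through unchanged as StartArgs to init/1.
start_link(Enabled) ->
    supervisor:start_link(?MODULE, Enabled).

%% When disabled, tell the parent that no supervisor was started.
init(false) ->
    ignore;
%% When enabled, start with an empty list of children.
init(true) ->
    {ok, {{one_for_one, 1, 5}, []}}.
```

In the disabled case start_link/1 itself returns ignore, so a parent supervisor can treat this child like any other child whose start function returned ignore.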
The supervisor behaviour generates the same system events as the gen_server behaviour. System events are handled by the sys module.
gen_server(3), sys(3)