[erlang-questions] Custom supervisor

Sat Jan 17 16:45:02 CET 2009

On Sat, Jan 17, 2009 at 8:49 AM, Paul Guyot <pguyot@REDACTED> wrote:
> Hello,
>
> We have worker processes that are identified by a key and on each
> node, a manager that provides the pid of the worker process from the
> key, spawning a new worker if required. The key is independent from
> the node. The workers may crash and if they do, they should be
> restarted, being told they are restarted as a parameter of the init
> callback function, and when queried, the manager will return the new
> pid. The workers may also be stopped and started on another node (i.e.
> migrated). The workers are simple gen_servers.
>
> To perform this, we wrote the manager as a gen_server that traps exits
> and spawns processes, and that stores the key/pid mapping in a mnesia
> table. However, I am wondering if the workers are visible in the OTP
> supervision tree since they are children of a worker and not of a
> supervisor. If we declare to the manager's supervisor that the manager
> is a supervisor (and not a worker), this has consequences I do not
> fully understand. I figured out that the manager will receive a
> which_children message during a release update, but that's all I know.
>
> We did update the worker's code, but we only needed {load_module,
> worker} and not {update, worker, {advanced, extra}}, i.e. we did not
> need to call code_change yet. Still, I guess that Module:code_change
> is only called when the module is in the supervision tree.
>
> Is there a more OTP-way to perform this, i.e. to track the deaths and
> restarts of the children of a supervisor?
> What are the consequences of having worker or supervisor in the
> supervisor's child specifications?
>
> Paul

I've used the approach of a top supervisor with two children, one
manager and one worker supervisor. The manager gets requests for
creating a new worker and lets the worker supervisor start it.

My main supervisor thus starts these two children:

{{one_for_all, 1, 60},
    [
    {manager, {manager, start_link, []}, permanent, 60, worker, [manager]},
    {work_super, {work_super, start_link, []}, permanent, infinity,
        supervisor, [work_super]}
    ]
}

And the work supervisor is just {{one_for_one, 5, 60}, []}
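
For reference, a minimal sketch of what such a worker supervisor module
could look like. It assumes the supervisor is registered locally under
the name work_super, which is what lets the manager address it by name
in supervisor:start_child/2; the module layout is an assumption, not a
copy of my actual code.

```erlang
-module(work_super).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1]).

%% Register locally as work_super so that calls such as
%% supervisor:start_child(work_super, Spec) can find this process.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% No static children: workers are added at runtime by the manager.
%% one_for_one restarts only the worker that died; the supervisor
%% gives up after more than 5 restarts within 60 seconds.
init([]) ->
    {ok, {{one_for_one, 5, 60}, []}}.
```

Right after start-up, supervisor:which_children(work_super) returns an
empty list, since no worker has been added yet.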

In my manager, I then add new workers using the work_super:

%% Args must be a list: the worker is started as
%% apply(work, start_link, Args).
start_child(Id, Args) ->
	case supervisor:start_child(work_super,
			{Id, {work, start_link, Args},
			transient, 60*1000, worker, [work]}) of
		{ok, _Child} ->
			ok;
		{ok, _Child, _Info} ->
			ok;
		{error, {already_started, _Child}} ->
			ok;
		Error ->
			Error
	end.
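
Since the worker supervisor restarts crashed workers itself, the manager
does not have to store pids: it can ask the supervisor for the current
pid of a given Id. The following is only a sketch under that assumption;
the module name work_lookup and both function names are hypothetical.

```erlang
-module(work_lookup).
-export([whereis_worker/1, find_pid/2]).

%% Ask the locally registered work_super for its children and
%% look up the one with the given Id.
whereis_worker(Id) ->
    find_pid(Id, supervisor:which_children(work_super)).

%% Children is a list of {Id, Child, Type, Modules} tuples as
%% returned by supervisor:which_children/1. Child is a pid, or
%% the atom undefined or restarting when no process is running.
find_pid(Id, Children) ->
    case lists:keyfind(Id, 1, Children) of
        {Id, Pid, _Type, _Modules} when is_pid(Pid) ->
            {ok, Pid};
        {Id, _NotRunning, _Type, _Modules} ->
            {error, not_running};
        false ->
            {error, not_found}
    end.
```

After a restart the same Id maps to the new pid, so a fresh lookup
returns the restarted worker.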

And I can stop a worker and remove its child specification again:

stop_child(Id) ->
	case supervisor:terminate_child(work_super, Id) of
		ok -> supervisor:delete_child(work_super, Id);
		Error -> Error
	end.

I am not yet 100% sure if I am interpreting the various start_child
results correctly. But this works well for me at the moment.
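
As far as I can tell from the supervisor documentation, the results of
supervisor:start_child/2 can be told apart as in the sketch below. The
module name start_result and the result tags are hypothetical, chosen
only for illustration.

```erlang
-module(start_result).
-export([interpret/1]).

%% Map each documented supervisor:start_child/2 result onto a tag.
interpret({ok, undefined}) ->
    %% The start function returned ignore: the child specification
    %% was added, but no process is running.
    ignored;
interpret({ok, Pid}) when is_pid(Pid) ->
    {started, Pid};
interpret({ok, Child, _Info}) ->
    {started, Child};
interpret({error, {already_started, Pid}}) ->
    %% A child with this Id exists and is running.
    {running, Pid};
interpret({error, already_present}) ->
    %% A child specification with this Id exists, but its process
    %% is not running; supervisor:restart_child/2 can start it.
    not_running;
interpret({error, Reason}) ->
    {failed, Reason}.
```

The already_present case matters with transient workers: one that exits
normally is not restarted, but its child specification stays in the
supervisor until delete_child is called, so a later start_child with
the same Id returns {error, already_present} rather than a new pid.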

Robby