[erlang-questions] Re: Supervisors as factories *and* registries

Jay Nelson jay@REDACTED
Tue Mar 23 19:09:53 CET 2010


On Mar 23, 2010, at 10:46 AM, Garrett Smith wrote:
>
> Yes, though this machinery is my motivation for using supervisors to
> handle the process life cycle house keeping in the first place.

That may be the problem.  Supervisors are a design construct intended
to manage the restart strategy of processes automatically.  For
reasons of separation of concerns and reliability, they should serve
no other purpose.  Organizing the processes so that a tool like appmon
can navigate the system is a useful side effect, but it is only a
by-product of what supervision trees are intended to accomplish.

>>  Maybe now the experience with non-telco problem areas can
>> advise the development of a larger family of process supervision  
>> models.
>
> I'm curious why this doesn't come up more often :)

Discovery of processes should not be generally available, lest an  
interloper accidentally send the wrong message (although locking it  
down makes it hard to diagnose operational errors).  In general, an  
architecture makes choices about the availability and access to  
processes and uses other mechanisms to provide that access.  You are  
bending the supervision of processes to include a role as a factory  
and access mechanism, and I believe that bundles too many roles
into a critical element of a system.

> - Custom start and restart behavior
> - Other housekeeping associated with process life cycle management
>
> It seems that a handful of behavior callbacks would enable this, but I
> may be over simplifying things.

Now you're on the right track.  These two things belong within the
realm of a supervisor.  Keeping the number of callbacks to a bare
minimum is ideal, because it leaves less room for programmer error to
make a supervisor fail.

A supervisor should have the following goals:

1) Manage the restart strategy of processes so that the system runs  
continuously

2) Propagate failures so that restart logic can be layered  
dynamically and failure can be routed around at higher levels of the  
system

3) Minimal functionality and API so that programmer error is less likely

4) Complete set of features to allow automated reaction to process  
life cycle events

5) Ability to observe and augment supervisor functionality for  
parallel mechanisms to implement additional life cycle event responses
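As a minimal sketch of goals 1 and 2 with today's OTP supervisor behaviour (the module name minimal_sup and its placeholder worker are hypothetical, and the map-based flags require OTP 18 or later, so they postdate this post): the restart policy is declared as data rather than code, and once the restart intensity is exceeded the supervisor exits with reason shutdown, which propagates the failure to its own supervisor.

```erlang
-module(minimal_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

%% Hypothetical example: a supervisor whose only job is restart policy.
start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    %% Goal 1: restart a crashed child automatically (one_for_one).
    %% Goal 2: allow at most 3 restarts within 5 seconds; a 4th makes the
    %% supervisor itself exit with reason shutdown, so the failure is
    %% propagated to the next level of the supervision tree.
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 5},
    Worker = #{id => worker,
               start => {?MODULE, start_worker, []},
               restart => permanent},
    {ok, {SupFlags, [Worker]}}.

%% Placeholder worker (hypothetical): a linked process that just waits.
start_worker() ->
    {ok, spawn_link(fun worker_loop/0)}.

worker_loop() ->
    receive
        _Any -> worker_loop()
    end.
```

Nothing here knows about factories or lookup; the only knobs are strategy, intensity and period, which keeps the surface for programmer error small (goal 3).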


The last point suggests to me that pairing a supervisor with a gen_event
manager might be more useful than exposing the supervisor's State as an
architectural choice.  Events can be emitted without risk of faulting
the supervisor, they can be ignored or collected, and the reaction
can be managed in a separate, isolated process hierarchy.
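A minimal sketch of that pairing, under stated assumptions: OTP's supervisor does not emit lifecycle events itself, so here a hypothetical wrapper lifecycle_events:start_child/2 is used as a child's start function.  It starts the real child, then notifies a gen_event manager registered under the hypothetical name lifecycle_event_mgr.  The notification is wrapped in catch, so a missing manager cannot fault the supervisor, and the handler that records events runs in the manager's process, not the supervisor's.

```erlang
-module(lifecycle_events).
-behaviour(gen_event).

%% Start wrapper, called by the supervisor in place of the child's own MFA.
-export([start_child/2]).
%% gen_event callbacks for a handler that simply records events.
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% Hypothetical registered name of the event manager.
-define(MANAGER, lifecycle_event_mgr).

%% Child spec usage (hypothetical):
%%   start => {lifecycle_events, start_child, [Id, {M, F, A}]}
start_child(Id, {M, F, A}) ->
    Result = apply(M, F, A),
    case Result of
        {ok, Pid} -> emit({started, Id, Pid});
        {ok, Pid, _Info} -> emit({started, Id, Pid});
        Other -> emit({start_failed, Id, Other})
    end,
    %% The supervisor sees exactly what the real start function returned.
    Result.

%% Fire-and-forget: gen_event:notify/2 is asynchronous, and catch ensures
%% that even an unregistered manager name cannot crash the caller.
emit(Event) ->
    catch gen_event:notify(?MANAGER, Event),
    ok.

%% Handler state: events received so far, newest first.
init([]) -> {ok, []}.

handle_event(Event, Events) -> {ok, [Event | Events]}.

%% Return recorded events in arrival order.
handle_call(events, Events) -> {ok, lists:reverse(Events), Events}.

handle_info(_Info, Events) -> {ok, Events}.

terminate(_Reason, _Events) -> ok.

code_change(_OldVsn, Events, _Extra) -> {ok, Events}.
```

Because the supervisor reuses the same start MFA on every restart, each restart also produces a {started, Id, Pid} event.  Only starts are reported here; reporting exits would need a separate process that monitors the returned pids, which keeps that logic out of the supervisor as well.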


jay



More information about the erlang-questions mailing list