[erlang-questions] Re: Supervisors as factories *and* registries
Jay Nelson
jay@REDACTED
Tue Mar 23 19:09:53 CET 2010
On Mar 23, 2010, at 10:46 AM, Garrett Smith wrote:
>
> Yes, though this machinery is my motivation for using supervisors to
> handle the process life cycle house keeping in the first place.
That may be the problem. Supervisors are a design construct intended
to manage the restart strategy of processes automatically. For
"separation of concerns" and for reliability, they should serve no
other purpose. There is the side effect of organizing the processes
so that something like appmon can navigate the system, but that is an
artifact of what supervision trees are intended to accomplish.
>> Maybe now the experience with non-telco problem areas can
>> advise the development of a larger family of process supervision
>> models.
>
> I'm curious why this doesn't come up more often :)
Discovery of processes should not be generally available, lest an
interloper accidentally send the wrong message (although locking it
down makes it hard to diagnose operational errors). In general, an
architecture makes choices about the availability of and access to
processes, and uses other mechanisms to provide that access. You are
bending the supervision of processes to include a role as a factory
and access mechanism, and I believe that is bundling too many roles
into a critical element of a system.
> - Custom start and restart behavior
> - Other housekeeping associated with process life cycle management
>
> It seems that a handful of behavior callbacks would enable this, but I
> may be over simplifying things.
Now you're on the right track. These two things should be within the
realm of a supervisor. Keeping to the bare minimum number of
callbacks is ideal because there is less chance of a supervisor
failing because of programmer error.
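
As a rough illustration of how small that callback surface already is,
here is a minimal sketch of a stock OTP supervisor. The module name
minimal_sup, the trivial worker loop, and the restart limits (5
restarts in 10 seconds) are assumptions made for the example, not
anything from the original discussion.

```erlang
-module(minimal_sup).
-behaviour(supervisor).

-export([start_link/0, start_worker/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% init/1 is the only callback the supervisor behaviour requires.
init([]) ->
    %% Assumed limits: more than 5 restarts within 10 seconds makes the
    %% supervisor itself exit, passing the failure up to its parent.
    RestartStrategy = {one_for_one, 5, 10},
    Worker = {worker, {?MODULE, start_worker, []},
              permanent, 5000, worker, [?MODULE]},
    {ok, {RestartStrategy, [Worker]}}.

%% Hypothetical worker: a bare linked process standing in for a real
%% gen_server. It runs in the supervisor's context, so spawn_link
%% links the new process to the supervisor.
start_worker() ->
    {ok, spawn_link(fun worker_loop/0)}.

worker_loop() ->
    receive
        stop   -> exit(stopped);      % abnormal exit, triggers a restart
        _Other -> worker_loop()
    end.
```

Sending stop to the worker makes it exit abnormally, and because it is
declared permanent the supervisor starts a replacement.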
A supervisor should have the following goals:
1) Manage the restart strategy of processes so that the system runs
continuously
2) Propagate failures so that restart logic can be layered
dynamically and failure can be routed around at higher levels of the
system
3) Minimal functionality and API so that programmer error is less likely
4) Complete set of features to allow automated reaction to process
life cycle events
5) Ability to observe and augment supervisor functionality for
parallel mechanisms to implement additional life cycle event responses
The last point suggests to me that pairing a supervisor with gen_event
might be more useful than exposing the State of the supervisor as an
architectural choice. Events can be emitted without risk of faulting
the supervisor; they can be ignored or collected, and the reaction
can be managed in a separate, isolated process hierarchy.
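
To make the idea concrete, here is a minimal sketch of the collecting
side. The stock supervisor does not emit life cycle events, so the
event shapes {child_started, Id, Pid} and {child_exited, Id, Reason}
and the module name lifecycle_log are hypothetical; they stand in for
whatever an augmented supervisor might send with gen_event:notify/2.

```erlang
-module(lifecycle_log).
-behaviour(gen_event).

-export([start/0, events/1]).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% Start an event manager and attach this module as a handler.
start() ->
    {ok, Mgr} = gen_event:start_link(),
    ok = gen_event:add_handler(Mgr, ?MODULE, []),
    {ok, Mgr}.

%% Return the collected events, oldest first.
events(Mgr) ->
    gen_event:call(Mgr, ?MODULE, get_events).

init([]) ->
    {ok, []}.

%% Collect the two hypothetical life cycle events; ignore anything else.
handle_event({child_started, _Id, _Pid} = Event, Events) ->
    {ok, [Event | Events]};
handle_event({child_exited, _Id, _Reason} = Event, Events) ->
    {ok, [Event | Events]};
handle_event(_Other, Events) ->
    {ok, Events}.

handle_call(get_events, Events) ->
    {ok, lists:reverse(Events), Events}.

handle_info(_Msg, Events) ->
    {ok, Events}.

terminate(_Reason, _Events) ->
    ok.

code_change(_OldVsn, Events, _Extra) ->
    {ok, Events}.
```

gen_event:notify/2 is asynchronous, and a handler that crashes is
removed by the event manager rather than taking the sender down, which
is the isolation property I am after.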
jay