[erlang-questions] to supervise or not to supervise
steve ellis
steve.e.123@REDACTED
Sun Mar 22 17:24:44 CET 2009
I just realized that the process that spawns my standalone supervisors would
be linked by default to the supervisors through its call to start_link to
start the supervisors in the first place. So when a supervisor dies because it
has reached its max restarts, the calling gen_server process (provided it
traps exits) will get an exit message in its handle_info callback of the form
{'EXIT', DeadSupervisorPid, Reason}. As far as I can tell Reason is shutdown,
and reached_max_restart_intensity is what shows up in the SASL report. This is
basic error handling stuff and it is where I would write my code to do
something with the error.
And now as I read the docs on handle_info/2 I see that that is where all
messages other than calls and casts get delivered, which seems to answer my
other question.
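
A minimal sketch of that idea, for illustration only. The module name
sup_watcher and the StartFun argument are hypothetical and not from the
thread; StartFun is assumed to call some supervisor's start_link and return
{ok, SupPid}. Note that the gen_server has to trap exits, otherwise the exit
signal from the linked supervisor takes it down instead of arriving in
handle_info/2.

```erlang
-module(sup_watcher).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% StartFun is a hypothetical zero-arity fun that links a supervisor to
%% the calling process and returns {ok, SupPid}.
start_link(StartFun) ->
    gen_server:start_link(?MODULE, StartFun, []).

init(StartFun) ->
    %% Convert exit signals from linked processes into messages.
    process_flag(trap_exit, true),
    {ok, SupPid} = StartFun(),
    %% The state is simply the pid of the supervisor being watched.
    {ok, SupPid}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

%% Matches only when the dead process is the supervisor we started.
handle_info({'EXIT', SupPid, Reason}, SupPid) ->
    error_logger:error_msg("supervisor ~p exited: ~p~n", [SupPid, Reason]),
    %% Application-specific reaction (alerting, cleanup) would go here.
    {noreply, {down, Reason}};
handle_info(_Other, State) ->
    {noreply, State}.

terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

It could be started with something like
sup_watcher:start_link(fun() -> my_sup:start_link() end), where my_sup is a
placeholder for the real supervisor module.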
So I think I'm on the right track. Please someone let me know if I'm missing
something. Thanks!
Steve
On Sun, Mar 22, 2009 at 10:58 AM, steve ellis <steve.e.123@REDACTED> wrote:
> Thanks Lennart and Mihai! Very helpful information. Lennart, it's good to
> know about the intent behind supervisor's original design.
>
> I like Mihai's suggestion of having one supervisor supervise each process.
> This would get us most of the way there and it would be easy to implement.
>
> But is there any way in OTP to see when a supervisor reaches its max
> restarts? I know this is logged by the sasl error logger. But how would I
> trap/detect this event in my code to do something with it?
>
> It doesn't look like supervisor has a function like gen_server's handy
> terminate/2.
>
> Maybe it would make more sense in our case to have one gen_server process
> monitor a child gen_server process. The child could call a function in the
> parent when it terminates. This way we'd have access to the terminate
> function of the monitoring/supervising gen_server. The problem with this
> though is that we'd have to implement our own restart strategy behavior,
> which is what is so great about supervisor.
>
> This might be related to something more general that I've been wondering
> about (which I should post as a question in a new thread). How to tap into
> the sasl error logger so my system can do stuff with those events. For
> example I'd like to send these events to another machine via tcp.
>
> Thanks!
>
> Steve
>
>
> On Fri, Mar 20, 2009 at 5:29 PM, Mihai Balea <mihai@REDACTED> wrote:
>
>>
>> On Mar 20, 2009, at 3:42 PM, steve ellis wrote:
>>
>> New to supervision trees and trying to figure out when to use them (and
>>> when not to)...
>>>
>>> I have a bunch of spawned processes created through spawn_link. Want these
>>> processes to stay running indefinitely. If one exits in an error state, we
>>> want to restart it N times. After N, we want to error log it, and stop
>>> trying to restart it. Perfect job for a one_for_one supervisor, right?
>>>
>>> Well sort of. The problem is that when the max restarts for the error
>>> process is reached, the supervisor terminates all its children and itself.
>>> Ouch! (At least in our case). We'd rather that the supervisor just keep
>>> supervising all the children that are ok and not swallow everything up.
>>>
>>> The Design Principles appear to be saying that swallowing everything up
>>> is what supervisors are supposed to do when max restarts is reached which
>>> leaves me a little puzzled. Why would you want to kill the supervisor just
>>> because a child process is causing trouble? Seems a little harsh.
>>>
>>> Is this a case of me thinking supervisors are good for too many things?
>>> Is it that our case is better handled by simply spawning these processes and
>>> trapping exits on them, and restarting/error logging in the trap exit?
>>>
>>
>> As far as I know, the standard supervisor cannot behave the way you want
>> it to.
>>
>> So, at least until this type of behavior is added to the standard
>> supervisor, you can work around it with double layers of supervision.
>> Basically have one dedicated supervisor for each process you want to
>> supervise and, in turn, each dedicated supervisor is set up as a transient
>> child to one big supervisor.
>>
>> Mihai
>>
>>
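
The double-layer arrangement Mihai describes above might look roughly like
the following sketch. The module name two_layer_sup, the start_worker/2
helper, and the restart limits (3 restarts in 60 seconds per worker) are
assumptions chosen for illustration. It relies on the behaviour of current
OTP releases, where a supervisor that exceeds its restart intensity exits
with reason shutdown, which a transient child spec treats as a normal exit.

```erlang
-module(two_layer_sup).
-behaviour(supervisor).

-export([start_link/0, start_worker/2]).
-export([init/1]).

%% Starts the top-level ("big") supervisor, initially without children.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, top).

%% Adds one dedicated supervisor for one worker. Id is any term unique
%% per worker; MFA = {M, F, A} must start the worker, link it to the
%% caller, and return {ok, Pid}.
start_worker(Id, MFA) ->
    Spec = {Id,
            {supervisor, start_link, [?MODULE, {dedicated, MFA}]},
            transient,      % not restarted after exiting with shutdown
            infinity, supervisor, [?MODULE]},
    supervisor:start_child(?MODULE, Spec).

%% Top layer: only restarts a dedicated supervisor that crashes abnormally.
init(top) ->
    {ok, {{one_for_one, 5, 10}, []}};
%% Dedicated layer: restarts its single worker at most 3 times in 60 s,
%% then gives up and exits without affecting the other workers.
init({dedicated, {M, F, A}}) ->
    {ok, {{one_for_one, 3, 60},
          [{worker, {M, F, A}, permanent, 5000, worker, [M]}]}}.
```

With this layout a misbehaving worker only brings down its own dedicated
supervisor; the top supervisor and the remaining workers keep running.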
>
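
On the question quoted above about tapping into the SASL error logger, one
possible approach is a custom gen_event handler registered with
error_logger:add_report_handler/2. The sketch below is illustrative only:
the module name tcp_report_forwarder, the install/2 helper, and the wire
format (term_to_binary with a 4-byte length prefix) are assumptions, not
something from the thread. In OTP 21 and later the logger module is the
preferred interface, although error_logger handlers are still supported.

```erlang
-module(tcp_report_forwarder).
-behaviour(gen_event).

-export([install/2]).
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% Registers this module as an additional error_logger report handler.
%% Host and Port identify the (hypothetical) remote log collector.
install(Host, Port) ->
    error_logger:add_report_handler(?MODULE, {Host, Port}).

init({Host, Port}) ->
    Opts = [binary, {packet, 4}, {active, false}],
    case gen_tcp:connect(Host, Port, Opts, 5000) of
        {ok, Socket}    -> {ok, Socket};
        {error, Reason} -> {error, Reason}
    end.

%% Every error, warning, and info event passes through here.
handle_event(Event, Socket) ->
    case gen_tcp:send(Socket, term_to_binary(Event)) of
        ok               -> {ok, Socket};
        {error, _Reason} -> remove_handler   % drop the handler if sending fails
    end.

handle_call(_Request, Socket) -> {ok, ok, Socket}.
handle_info(_Info, Socket) -> {ok, Socket}.
terminate(_Arg, Socket) -> gen_tcp:close(Socket).
code_change(_OldVsn, Socket, _Extra) -> {ok, Socket}.
```

The receiving side would read each length-prefixed packet and decode it with
binary_to_term/1. A production version would also need reconnect logic,
which is left out here.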