[erlang-patches] Supervisor shutdown reason when reaching max restarts

Siri Hansen erlangsiri@REDACTED
Wed Apr 16 14:54:22 CEST 2014


Hello Tobias and the list!

This review has finally made its way up the backlog. Thanks for your
patience!

Is this use case still relevant?

I've started looking at the compatibility issue. As far as I can see there
shouldn't be a big problem upwards in a normal supervision tree, since this
is internal to OTP. Downwards, however, there might be problems since exit
trapping children will get {shutdown,Reason} instead of shutdown when a
supervisor terminates due to a failed restart attempt. In the case of a
gen_server or gen_fsm, this exit reason will be propagated to the callback
module via Mod:terminate/2. The value 'shutdown', used when the supervisor
orders termination, is documented for this callback function. For other
proc_lib processes the receive statement is implemented individually, so
even here the change will be exposed to the user (if matching the exit
reason).

Links and monitors across the supervision tree might also be a problem, of
course. And the case when something other than a supervisor or the
application_master is the parent of a supervisor - I don't know if this is
very common, though, and I find the incompatibility downwards in the
supervision tree most alarming.

As you know, we need to be very careful with backwards incompatible changes
in core functionality, so for the time being we would like to investigate
the general need for this functionality and if there are other possible
ways to go.

Some questions for the list:

1. Is the use case described in the Tobias' previous mail a common one? If
yes, how have you solved it?

2. Do you normally care about the exit reason in your Mod:terminate/2
function or in your proc_lib receive statement when getting an EXIT from
the parent?

3. Any other comments on this?

Thanks
/siri



2013-08-23 14:17 GMT+02:00 Siri Hansen <erlangsiri@REDACTED>:

> Tobias, thanks for a good explanation. I will post this into the ticket
> and we'll take it into account when finishing the review.
> Thanks again for your contribution!
> /siri
>
>
> 2013/8/23 Tobias Schlager <Tobias.Schlager@REDACTED>
>
>>  Hi Siri,
>>
>> glad to hear from you, I'll try to do my best to explain the use case I
>> have in mind.
>>
>> Consider you have a one_for_one (or simple_one_for_one) supervisor A with
>> a worker child B that dynamically adds children to A (using
>> supervisor:start_child/2). Now consider these children also are supervisors
>> of type C with various statically configured workers. I now would like to
>> monitor supervisors of type C from worker B to be able to take some action
>> when *something goes wrong* at one of the C supervisors (e.g. C crashed
>> because of one of its subworkers). However, I can't differentiate between
>> 'something went wrong' or a supervisor C just exited gracefully (e.g. the
>> application was stopped) because supervisors only exit with reason normal
>> or shutdown. It is arguable whether to use another restart type for C
>> supervisors in order to propagate the exit. However, I don't want to crash
>> the whole supervision just to be able to tell that something failed to
>> restart somewhere down the supervision path.
>>
>> In general, the new exit reasons are visible to all processes linked with
>> supervisors or monitoring them (so parent supervisors as well as the
>> application master will see these reasons). This is why I chose the
>> '{shutdown, Reason}' format, which must be supported (according to the
>> documentation this is considered a normal exit reason). Thus, changing the
>> exit reasons will not affect the behaviour of supervision hierarchies
>> (verified by the test suite) or the application master (as far as I can
>> tell). The backward incompatibilty is located in processes depending on the
>> undocumented behaviour of supervisors always exiting with normal or
>> shutdown and not with '{shutdown, Reason}'.
>>
>> I hope, that my explanation makes things a bit clearer (and not worse).
>>
>> Regards
>> Tobias
>>
>>   ------------------------------
>> *Von:* Siri Hansen [erlangsiri@REDACTED]
>> *Gesendet:* Freitag, 23. August 2013 09:50
>> *An:* Tobias Schlager
>> *Cc:* erlang-patches@REDACTED
>> *Betreff:* Re: [erlang-patches] Supervisor shutdown reason when reaching
>> max restarts
>>
>>   Hi Tobias!
>> Thank you for the patch. We have discussed this on OTP Technical Board,
>> and have come to the conclusion that some more investigation is needed of
>> the potential backwards incompatibility. I have written a ticket and the
>> job will be prioritized into our backlog. Unfortunately we won't make it
>> before the next release (R16B02).
>>
>>  In order to help us a bit on the way, could you please provide some
>> more information about your use case? You say that you are monitoring the
>> supervisor from another process, do you mean other process than the
>> supervisor's supervisor? If so, could you explain this architecture a bit
>> more?
>>
>>  Who else will see this exit reason? - application_master? - the parent
>> supervisors? - other?
>>
>>  Thanks again!
>> Regards
>> /siri
>>
>>
>>
>> 2013/7/4 Tobias Schlager <Tobias.Schlager@REDACTED>
>>
>>> Hi,
>>>
>>> this patch changes the behaviour of supervisors to exit with a more
>>> specific reason when exiting due to a maximum restart limit hit. This is
>>> especially useful (or even necessary) to distinguish between normal and
>>> erroneous process terminations when monitoring a supervisor from another
>>> process.
>>>
>>> In the above case a supervisor would now exit with {shutdown,
>>> {reached_max_restart_intensity, Child}} where Child is whatever is
>>> available to describe the child, either a child id or in case of a
>>> simple_one_for_one supervisor the offending child's process id. The patch
>>> should not affect the OTP restart behaviour (also for cascaded supervisors)
>>> since a subclass of 'normal' exit reasons is used.
>>>
>>> I'm aware that there is some potential backward incompatibility for
>>> people that do not expect {shutdown, Reason} when monitoring a supervisor.
>>> However, the feature of exiting normally with {shutdown, Reason} has been
>>> around for quite a while now and I think this could be a sensible place to
>>> use it. Let me know what you think.
>>>
>>> The patch does include tests and updated documentation.
>>>
>>>           git fetch https://github.com/schlagert/otp.gitsupervisor_shutdown_reason
>>>
>>>
>>> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
>>>
>>> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch
>>>
>>> Regards
>>> Tobias
>>> _______________________________________________
>>> erlang-patches mailing list
>>> erlang-patches@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-patches
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20140416/da5248ca/attachment.htm>


More information about the erlang-patches mailing list