[erlang-patches] Supervisor shutdown reason when reaching max restarts

Tobias Schlager <>
Fri Aug 23 12:26:28 CEST 2013


Hi Siri,

Glad to hear from you. I'll do my best to explain the use case I have in mind.

Consider a one_for_one (or simple_one_for_one) supervisor A with a worker child B that dynamically adds children to A using supervisor:start_child/2. These dynamically added children are themselves supervisors (call them C), each with various statically configured workers. I would like worker B to monitor the C supervisors so that it can take some action when *something goes wrong* in one of them (e.g. C crashed because of one of its workers). However, I can't distinguish between 'something went wrong' and a C supervisor simply exiting gracefully (e.g. because the application was stopped), since supervisors only ever exit with reason normal or shutdown. One could argue for giving the C supervisors a different restart type so that the exit propagates. However, I don't want to crash the whole supervision tree just to find out that something failed to restart somewhere further down.
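A minimal sketch of the situation above, assuming a reasonably recent OTP (map-based child specs need OTP 18 or later). The module name restart_demo, the child ids, the placeholder worker and the restart intensity of 0 for C are illustrative assumptions, and run/0 stands in for worker B. It starts A, adds two C supervisors with supervisor:start_child/2 and monitors both: one is stopped on purpose through A, the other gives up after its only worker is killed. With the stock supervisor implementation both 'DOWN' reasons are expected to be plain shutdown, which is exactly the ambiguity described.

```erlang
-module(restart_demo).
-behaviour(supervisor).

-export([run/0, start_c/0, start_worker/0]).
-export([init/1]).

%% run/0 plays the role of worker B: it starts a hypothetical supervisor A,
%% adds two C supervisors with supervisor:start_child/2, monitors both and
%% returns their 'DOWN' reasons as {StoppedReason, GaveUpReason}.
run() ->
    {ok, A} = supervisor:start_link(?MODULE, a),
    %% C1 is stopped on purpose through its parent A.
    {ok, C1} = supervisor:start_child(A, []),
    Ref1 = erlang:monitor(process, C1),
    ok = supervisor:terminate_child(A, C1),
    Stopped = await_down(Ref1),
    %% C2 loses its only worker; with intensity 0 it gives up at once.
    {ok, C2} = supervisor:start_child(A, []),
    Ref2 = erlang:monitor(process, C2),
    [{worker, W, worker, _}] = supervisor:which_children(C2),
    exit(W, kill),
    GaveUp = await_down(Ref2),
    %% Unlink before stopping A so its exit signal does not take us down.
    unlink(A),
    exit(A, shutdown),
    {Stopped, GaveUp}.

await_down(Ref) ->
    receive
        {'DOWN', Ref, process, _Pid, Reason} -> Reason
    after 5000 ->
        error(timeout)
    end.

%% Start functions referenced by the child specs below.
start_c() ->
    supervisor:start_link(?MODULE, c).

start_worker() ->
    %% Placeholder worker that just waits until it is killed.
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.

init(a) ->
    %% A: simple_one_for_one; the C children are temporary, so A never
    %% restarts them itself.
    {ok, {#{strategy => simple_one_for_one, intensity => 1, period => 5},
          [#{id => c, start => {?MODULE, start_c, []},
             restart => temporary, shutdown => infinity,
             type => supervisor}]}};
init(c) ->
    %% C: one_for_one with intensity 0, so the first worker crash already
    %% exceeds the restart limit.
    {ok, {#{strategy => one_for_one, intensity => 0, period => 1},
          [#{id => worker, start => {?MODULE, start_worker, []},
             restart => permanent, shutdown => brutal_kill,
             type => worker}]}}.
```

Calling restart_demo:run() from a shell should return {shutdown, shutdown}; the supervisor reports logged along the way are expected.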

In general, the new exit reasons are visible to all processes linked to or monitoring a supervisor (so parent supervisors as well as the application master will see them). This is why I chose the '{shutdown, Reason}' format: these processes must already support it, because the documentation considers it a normal exit reason. Changing the exit reasons therefore does not affect the behaviour of supervision hierarchies (verified by the test suite) or of the application master (as far as I can tell). The backward incompatibility is limited to processes that depend on the undocumented behaviour of supervisors always exiting with normal or shutdown and never with '{shutdown, Reason}'.
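The argument rests on the rule that normal, shutdown and any {shutdown, Term} count as normal exit reasons; per the supervisor documentation, a transient child that exits with one of them is not restarted. A hypothetical predicate (exit_reason:is_normal/1 is not an OTP API) makes the rule explicit and shows that the proposed reason falls on the 'normal' side:

```erlang
-module(exit_reason).
-export([is_normal/1]).

%% Hypothetical helper: true for the exit reasons OTP treats as a normal
%% termination. A transient child exiting with one of these is not
%% restarted by its supervisor. Any {shutdown, Term} qualifies, which is
%% why a reason of that form leaves restart behaviour unchanged.
is_normal(normal) -> true;
is_normal(shutdown) -> true;
is_normal({shutdown, _Term}) -> true;
is_normal(_Other) -> false.
```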

I hope my explanation makes things a bit clearer (and not worse).

Regards
Tobias

________________________________
From: Siri Hansen []
Sent: Friday, 23 August 2013 09:50
To: Tobias Schlager
Cc: 
Subject: Re: [erlang-patches] Supervisor shutdown reason when reaching max restarts

Hi Tobias!
Thank you for the patch. We have discussed this in the OTP Technical Board and concluded that the potential backwards incompatibility needs more investigation. I have written a ticket, and the job will be prioritized into our backlog. Unfortunately, we won't make it before the next release (R16B02).

To help us along, could you please provide some more information about your use case? You say that you are monitoring the supervisor from another process; do you mean a process other than the supervisor's own supervisor? If so, could you explain this architecture a bit more?

Who else will see this exit reason? The application_master? The parent supervisors? Others?

Thanks again!
Regards
/siri



2013/7/4 Tobias Schlager <<mailto:>>
Hi,

this patch makes supervisors exit with a more specific reason when they terminate because the maximum restart limit was hit. This is especially useful (or even necessary) to distinguish between normal and erroneous terminations when a supervisor is monitored from another process.

In that case a supervisor now exits with {shutdown, {reached_max_restart_intensity, Child}}, where Child is whatever best identifies the offending child: its child id or, for a simple_one_for_one supervisor, its process id. The patch should not affect the OTP restart behaviour (including cascaded supervisors), since {shutdown, Reason} is a subclass of the 'normal' exit reasons.
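Assuming the patched format, worker B could tell the cases apart with a simple pattern match on the reason carried by its 'DOWN' messages. The module c_down and its return values are hypothetical; only the {shutdown, {reached_max_restart_intensity, Child}} shape comes from the patch, and unpatched supervisors exit with plain shutdown instead.

```erlang
-module(c_down).
-export([classify/1]).

%% Hypothetical helper for worker B: maps the Reason of a
%% {'DOWN', Ref, process, Pid, Reason} message for a monitored C
%% supervisor to an action. The first clause matches the format
%% proposed by the patch; it must precede the generic {shutdown, _}
%% clause, otherwise it would never be reached.
classify({shutdown, {reached_max_restart_intensity, Child}}) ->
    {gave_up, Child};
classify(Reason) when Reason =:= normal; Reason =:= shutdown ->
    stopped;
classify({shutdown, _Other}) ->
    stopped;
classify(Other) ->
    {crashed, Other}.
```

B would typically call c_down:classify(Reason) from its handle_info({'DOWN', Ref, process, Pid, Reason}, State) clause and only take action on {gave_up, Child} or {crashed, _}.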

I'm aware that there is some potential backward incompatibility for code that does not expect {shutdown, Reason} when monitoring a supervisor. However, exiting normally with {shutdown, Reason} has been supported for quite a while now, and I think this is a sensible place to use it. Let me know what you think.

The patch does include tests and updated documentation.

          git fetch https://github.com/schlagert/otp.git supervisor_shutdown_reason

          https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
          https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch

Regards
Tobias
_______________________________________________
erlang-patches mailing list
<mailto:>
http://erlang.org/mailman/listinfo/erlang-patches
