[erlang-patches] Supervisor shutdown reason when reaching max restarts

Tobias Schlager <>
Wed Apr 16 15:49:20 CEST 2014


Hi Siri,

I already solved the problem for my use case. In my case the supervisors of type C never exit gracefully except for the case the application gets stopped, so I ended up 'disarming' the monitors of process B in an 'application:prep_stop/1' callback.

Of course, this is a debatable patch because it introduces backwards incompatible behavior and I will not argue if it gets declined. I just thought that having differentiated exit reason of supervisors would be a good thing to have in general (and that according to the documentation {shutdown, Reason} exits must be supported by callback implementations anyways).

Regards
Tobias

--

Lindenbaum GmbH
Dipl.-Inform. Tobias Schlager, Software Engineer
Erbprinzenstr. 4-12, 76133 Karlsruhe, Deutschland
Tel: +49 721 48 08 48 - 000, Fax: +49 721 48 08 48 - 801,
E-Mail: 

Firmensitz: Erbprinzenstr. 4-12, Eingang A, 76133 Karlsruhe
Registergericht: Amtsgericht Mannheim, HRB 706184
Geschäftsführer: Dr. Ralf Nikolai
Steuernummer: 35007/02060, USt. ID: DE 263797265

http://www.lindenbaum.eu


________________________________
Von: Siri Hansen []
Gesendet: Mittwoch, 16. April 2014 14:54
An: Tobias Schlager
Cc: 
Betreff: Re: [erlang-patches] Supervisor shutdown reason when reaching max restarts

Hello Tobias and the list!

This review has finally made its way up the backlog. Thanks for your patience!

Is this use case still relevant?

I've started looking at the compatibility issue. As far as I can see there shouldn't be a big problem upwards in a normal supervision tree, since this is internal to OTP. Downwards, however, there might be problems since exit trapping children will get {shutdown,Reason} instead of shutdown when a supervisor terminates due to a failed restart attempt. In the case of a gen_server or gen_fsm, this exit reason will be propagated to the callback module via Mod:terminate/2. The value 'shutdown', used when the supervisor orders termination, is documented for this callback function. For other proc_lib processes the receive statement is implemented individually, so even here the change will be exposed to the user (if matching the exit reason).

Links and monitors across the supervision tree might also be a problem, of course. And the case when something other than a supervisor or the application_master is the parent of a supervisor - I don't know if this is very common, though, and I find the incompatibility downwards in the supervision tree most alarming.

As you know, we need to be very careful with backwards incompatible changes in core functionality, so for the time being we would like to investigate the general need for this functionality and if there are other possible ways to go.

Some questions for the list:

1. Is the use case described in the Tobias' previous mail a common one? If yes, how have you solved it?

2. Do you normally care about the exit reason in your Mod:terminate/2 function or in your proc_lib receive statement when getting an EXIT from the parent?

3. Any other comments on this?

Thanks
/siri



2013-08-23 14:17 GMT+02:00 Siri Hansen <<mailto:>>:
Tobias, thanks for a good explanation. I will post this into the ticket and we'll take it into account when finishing the review.
Thanks again for your contribution!
/siri


2013/8/23 Tobias Schlager <<mailto:>>
Hi Siri,

glad to hear from you, I'll try to do my best to explain the use case I have in mind.

Consider you have a one_for_one (or simple_one_for_one) supervisor A with a worker child B that dynamically adds children to A (using supervisor:start_child/2). Now consider these children also are supervisors of type C with various statically configured workers. I now would like to monitor supervisors of type C from worker B to be able to take some action when *something goes wrong* at one of the C supervisors (e.g. C crashed because of one of its subworkers). However, I can't differentiate between 'something went wrong' or a supervisor C just exited gracefully (e.g. the application was stopped) because supervisors only exit with reason normal or shutdown. It is arguable whether to use another restart type for C supervisors in order to propagate the exit. However, I don't want to crash the whole supervision just to be able to tell that something failed to restart somewhere down the supervision path.

In general, the new exit reasons are visible to all processes linked with supervisors or monitoring them (so parent supervisors as well as the application master will see these reasons). This is why I chose the '{shutdown, Reason}' format, which must be supported (according to the documentation this is considered a normal exit reason). Thus, changing the exit reasons will not affect the behaviour of supervision hierarchies (verified by the test suite) or the application master (as far as I can tell). The backward incompatibilty is located in processes depending on the undocumented behaviour of supervisors always exiting with normal or shutdown and not with '{shutdown, Reason}'.

I hope, that my explanation makes things a bit clearer (and not worse).

Regards
Tobias

________________________________
Von: Siri Hansen [<mailto:>]
Gesendet: Freitag, 23. August 2013 09:50
An: Tobias Schlager
Cc: <mailto:>
Betreff: Re: [erlang-patches] Supervisor shutdown reason when reaching max restarts

Hi Tobias!
Thank you for the patch. We have discussed this on OTP Technical Board, and have come to the conclusion that some more investigation is needed of the potential backwards incompatibility. I have written a ticket and the job will be prioritized into our backlog. Unfortunately we won't make it before the next release (R16B02).

In order to help us a bit on the way, could you please provide some more information about your use case? You say that you are monitoring the supervisor from another process, do you mean other process than the supervisor's supervisor? If so, could you explain this architecture a bit more?

Who else will see this exit reason? - application_master? - the parent supervisors? - other?

Thanks again!
Regards
/siri



2013/7/4 Tobias Schlager <<mailto:>>
Hi,

this patch changes the behaviour of supervisors to exit with a more specific reason when exiting due to a maximum restart limit hit. This is especially useful (or even necessary) to distinguish between normal and erroneous process terminations when monitoring a supervisor from another process.

In the above case a supervisor would now exit with {shutdown, {reached_max_restart_intensity, Child}} where Child is whatever is available to describe the child, either a child id or in case of a simple_one_for_one supervisor the offending child's process id. The patch should not affect the OTP restart behaviour (also for cascaded supervisors) since a subclass of 'normal' exit reasons is used.

I'm aware that there is some potential backward incompatibility for people that do not expect {shutdown, Reason} when monitoring a supervisor. However, the feature of exiting normally with {shutdown, Reason} has been around for quite a while now and I think this could be a sensible place to use it. Let me know what you think.

The patch does include tests and updated documentation.

          git fetch https://github.com/schlagert/otp.git supervisor_shutdown_reason

          https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
          https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch

Regards
Tobias
_______________________________________________
erlang-patches mailing list
<mailto:>
http://erlang.org/mailman/listinfo/erlang-patches



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20140416/6f53e018/attachment-0001.html>


More information about the erlang-patches mailing list