[erlang-questions] Supervisor post start update of restart intensity and period

Tue Oct 20 15:01:36 CEST 2015

On Tuesday, 20 October 2015 12:33:34 Michael Wright wrote:
> Hi Torben,
> 
> I did wonder about this as a solution, but I'm not terribly keen.
> 
> Take the case of 10 sup_10 supervisors with a restart intensity of 10, each
> with 10 children. If there are 11 child deaths for children concentrated on
> one of those supervisors, it will trigger a sup_10 restart, but if the 11
> children that die are distributed across 2 or more sup_10 supervisors, it
> won't... The sup_10 restart probably isn't a problem of course, but the
> number of total deaths in a period of time that will cause a sup_sup to
> restart is now variable, depending on exactly which of the children across
> the sup_10 supervisors die.
> 
> In fact, in this situation, 11 child deaths could cause a sup_10 death, or
> 100 child deaths could just about cause no sup_10 to die.

With your initial post I thought "hrm, it is sort of odd that this isn't dynamically configurable," but the only scenarios I could think of off-hand, in actual systems where I might use this, were ones where I want precisely the sort of isolation you view as problematic.

As it stands, Torben's suggestion where a sup_sup can spawn dynamically configurable supervisors seems ideal -- especially considering that I could retire an existing sup (with the "wrong" configuration) and direct all new child creation to the new one (with the "right" configuration) -- and, hot updates aside, probably smoothly transition a running process' state to a new process under the new supervisor. There could easily be edge cases where that wouldn't work, but the general case seems straightforward.
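
A minimal sketch of how I read that suggestion, assuming OTP 18 or later (map-style supervisor flags and child specs). The module name dyn_sup and the functions start_sub/2 and start_sub_link/2 are made up for illustration; nothing here is from Torben's post or from OTP itself beyond the stock supervisor calls:

```erlang
-module(dyn_sup).
-behaviour(supervisor).

-export([start_link/0, start_sub/2, start_sub_link/2]).
-export([init/1]).

%% Start the top-level supervisor (the "sup_sup"), registered as dyn_sup.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, top).

%% Ask the sup_sup for a new sub-supervisor with its own restart limits.
%% (Hypothetical helper; Intensity and Period are chosen at runtime.)
start_sub(Intensity, Period) ->
    supervisor:start_child(?MODULE, [Intensity, Period]).

%% Called by the sup_sup to start one sub-supervisor. The two arguments
%% are the ones appended by supervisor:start_child/2 above.
start_sub_link(Intensity, Period) ->
    supervisor:start_link(?MODULE, {sub, Intensity, Period}).

%% The sup_sup: every child is a supervisor started via start_sub_link/2.
init(top) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 1, period => 5},
    SubSpec = #{id => sub,
                start => {?MODULE, start_sub_link, []},
                restart => permanent,
                shutdown => infinity,
                type => supervisor},
    {ok, {SupFlags, [SubSpec]}};
%% A sub-supervisor: starts empty, with the limits it was given.
init({sub, Intensity, Period}) ->
    SupFlags = #{strategy => one_for_one,
                 intensity => Intensity,
                 period => Period},
    {ok, {SupFlags, []}}.
```

Retiring the "wrong" supervisor would then look something like this, where ChildSpec is whatever worker spec you already use:

```erlang
{ok, _}   = dyn_sup:start_link(),
{ok, Old} = dyn_sup:start_sub(10, 60),
{ok, New} = dyn_sup:start_sub(3, 10),
%% Direct new children to New: supervisor:start_child(New, ChildSpec).
%% Once Old's children have finished or been moved over:
ok = supervisor:terminate_child(dyn_sup, Old).
```

Note that terminating Old also terminates whatever is still running under it, and that a sub-supervisor restarted by the sup_sup comes back empty in this sketch, so children added at runtime would have to be re-added by whoever tracks them.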

It would be nice to abstract this all away for the general case, of course, and that doesn't seem to require making any adjustments to OTP.

But I lack imagination. In what case would this not work?

-Craig