[erlang-questions] handling crash on db connect

Fri Jun 7 21:32:44 CEST 2013

On 6 Jun 2013, at 20:01, Paul Rubin <paul@REDACTED> wrote:
> 
> I wonder if what's really going on here is a gap in the available supervision strategies.  I was a bit surprised to learn from the supervisor docs,
> 
> "To prevent a supervisor from getting into an infinite loop of child process terminations and restarts, a maximum restart frequency is defined using two integer values MaxR and MaxT. If more than MaxR restarts occur within MaxT seconds, the supervisor terminates all child processes and then itself."
> 
> I had somehow thought that if MaxR restarts happened in MaxT seconds, the supervisor would just sleep until MaxT seconds had passed, then start retrying again (i.e. limit the frequency rather than the absolute count).  It does seem to me there should be an option for something like that. 

Whilst this is orthogonal to the points covered already, it might be worth mentioning that RabbitMQ uses its own copy of supervisor, imaginatively named supervisor2, which offers this feature under the name "delayed restarts". A child whose restart type is given as {Type, Delay} is restarted according to Type until it hits the maximum restart intensity; at that point, instead of the supervisor terminating, the child is tried again after Delay seconds. With Type set to permanent, for example, that basically means the supervisor will keep trying to restart the child forever. We use this in some plugins to provide a reconnect/retry delay option.
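
As a rough sketch (not lifted from RabbitMQ itself), a supervisor callback module using a delayed restart might look something like the following. The names db_sup and db_conn are hypothetical, the MaxR/MaxT and Delay values are arbitrary, and it assumes supervisor2 is on the code path and accepts the same tuple-style child spec as the stock supervisor, apart from the restart field.

```erlang
-module(db_sup).
-behaviour(supervisor2).

-export([start_link/0, init/1]).

%% Start the supervisor using RabbitMQ's supervisor2 rather than the
%% stock OTP supervisor.
start_link() ->
    supervisor2:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Up to 3 restarts within 10 seconds are handled as usual
    %% (arbitrary example values for MaxR and MaxT).
    RestartStrategy = {one_for_one, 3, 10},
    %% db_conn is a hypothetical worker that connects to the database
    %% in its init. The restart field {permanent, 5} means: restart as
    %% permanent, but once the restart intensity is exceeded, wait
    %% 5 seconds and try again instead of taking the supervisor down.
    Child = {db_conn,
             {db_conn, start_link, []},
             {permanent, 5},
             5000,
             worker,
             [db_conn]},
    {ok, {RestartStrategy, [Child]}}.
```

With a plain permanent restart type in place of {permanent, 5}, a fourth crash inside the 10 second window would terminate db_sup and propagate the failure upwards.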

At some point we could submit a patch to OTP, if there's any appetite for incorporating it.

Cheers,
Tim