[erlang-questions] supervisor with obstinate restart policy — are there any implementations?
Thu Jun 28 11:09:14 CEST 2012
On 27 Jun 2012, at 20:18, Richard Carlsson wrote:
> On 06/26/2012 12:43 PM, Max Lapshin wrote:
>> I think many people have run into this problem with OTP supervisors: if your
>> supervisor has to work with an external resource
>> and that resource is down, the whole system is brought down after a few restarts.
>> I think there are many implementations of trackers that restart
>> such jobs and thus reimplement OTP supervisors.
>> Has anyone implemented a supervisor that is OTP-compatible and doesn't
>> fail on frequent worker restarts,
>> but instead restarts less and less frequently?
> I did some work previously on adding incremental backoff to the OTP supervisors, but in this case you describe, I think that what you need is not a special supervisor, but a Circuit Breaker (see http://en.wikipedia.org/wiki/Circuit_breaker_design_pattern). The idea with supervision is that it is often the case that a restart will fix temporary problems and glitches, by resetting the workers to a known good state. But when it comes to depending on external resources, your supervisor cannot restart the external resource - it can only restart your connection to the resource. If that wasn't the problem, you're still screwed.
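To make the circuit breaker idea concrete, here is a minimal sketch of its state handling as a pure Erlang module. The module name, record and function names are all hypothetical and not taken from any existing library; the caller is assumed to supply timestamps in milliseconds.

```erlang
-module(breaker_sketch).
-export([new/2, allow/2, success/1, failure/2]).

%% Hypothetical names throughout; an illustration, not an existing library.
%% opened_at = undefined means the breaker is closed (calls go through).
-record(cb, {max_failures, reset_ms, failures = 0, opened_at = undefined}).

%% Open after MaxFailures consecutive failures; allow a retry after ResetMs.
new(MaxFailures, ResetMs) ->
    #cb{max_failures = MaxFailures, reset_ms = ResetMs}.

%% May a call to the external resource be attempted at time Now (ms)?
allow(#cb{opened_at = undefined}, _Now) -> true;
allow(#cb{opened_at = T, reset_ms = R}, Now) -> Now - T >= R.

%% A successful call closes the breaker and clears the failure count.
success(CB) ->
    CB#cb{failures = 0, opened_at = undefined}.

%% A failed call counts up; at the threshold the breaker (re)opens at Now.
failure(#cb{failures = F, max_failures = Max} = CB, Now) when F + 1 >= Max ->
    CB#cb{failures = F + 1, opened_at = Now};
failure(#cb{failures = F} = CB, _Now) ->
    CB#cb{failures = F + 1}.
```

A worker would keep this value in its own state, check allow/2 before touching the external resource, and report the outcome with success/1 or failure/2. The worker itself stays alive while the resource is down, so the supervisor's restart limit is never hit.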
FYI, there is also a simple back-off implementation (restart after a delay) in http://hg.rabbitmq.com/rabbitmq-server/file/default/src/supervisor2.erl.
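For the "less and less frequent" restarts asked about in the original question, a capped exponential delay is one common choice. The sketch below is only an illustration of that calculation; it is not how supervisor2 works, and the module name, function name and parameters are hypothetical.

```erlang
-module(backoff_sketch).
-export([delay/3]).

%% Hypothetical helper, not part of OTP or supervisor2.
%% Delay in ms before restart attempt N (N = 1 is the first restart):
%% starts at BaseMs, doubles on every attempt, and is capped at MaxMs.
delay(N, BaseMs, MaxMs) when is_integer(N), N >= 1 ->
    min(MaxMs, BaseMs * (1 bsl (N - 1))).
```

A process that manages its own reconnects could feed the result to erlang:send_after/3, for example erlang:send_after(backoff_sketch:delay(N, 100, 5000), self(), reconnect), and reset N to 1 once a connection succeeds.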