[erlang-questions] supervisor with obstinate restart policy — are there any implementations?

Richard Carlsson carlsson.richard@REDACTED
Wed Jun 27 21:18:00 CEST 2012


On 06/26/2012 12:43 PM, Max Lapshin wrote:
> Hi.
>
> I think that many people have run into this problem with OTP
> supervisors: if your supervisor must work with an external resource,
> and this resource is down, the whole system is brought down after a
> few restarts.
>
> I think that there are many implementations of trackers that restart
> such jobs and thus reimplement OTP supervisors.
>
> Has anyone implemented a supervisor that is OTP-compatible and doesn't
> fail on frequent worker restarts,
> but instead restarts less and less frequently?

I did some work previously on adding incremental backoff to the OTP
supervisors, but in the case you describe, I think that what you need
is not a special supervisor, but a Circuit Breaker (see
http://en.wikipedia.org/wiki/Circuit_breaker_design_pattern). The idea
with supervision is that a restart will often fix temporary problems
and glitches, by resetting the workers to a known good state. But when
it comes to depending on external resources, your supervisor cannot
restart the external resource - it can only restart your connection to
the resource. If the connection wasn't the problem, you're still
screwed.

One way of implementing a circuit breaker in Erlang is as a separate 
server that acts as a middleman. Anybody who wants to call the external 
service has to make the request via the circuit breaker. The circuit 
breaker runs the jobs, tracks status of jobs and detects timeouts, logs 
warnings, and can decide to block further requests for a while if the 
external resource seems to be misbehaving, so your logs don't get 
flooded by a million workers simultaneously discovering that your SMS 
provider (or whatever) is unavailable. You should also be able to query 
the circuit breaker about the current status of the resources it 
monitors, force block/unblock, etc.
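
A minimal sketch of such a middleman, written as a gen_server. The
module name `breaker`, its API (start_link/2, call/1, status/0,
block/0, unblock/0) and the two parameters MaxFailures and CooldownMs
are assumptions made for illustration; none of them come from OTP. To
stay short, the sketch runs each job inside the server process, so jobs
are serialized and a job that hangs is not timed out. A fuller version
would probably run jobs in separate monitored processes.

```erlang
-module(breaker).
-behaviour(gen_server).

%% Hypothetical API: these names are illustrative, not part of OTP.
-export([start_link/2, call/1, status/0, block/0, unblock/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-record(state, {max_failures, cooldown_ms,
                failures = 0, blocked_until = undefined}).

start_link(MaxFailures, CooldownMs) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE,
                          {MaxFailures, CooldownMs}, []).

%% Run Fun via the breaker. Returns {ok, Result}, {error, {Class, Reason}}
%% if Fun raised, or {error, blocked} while requests are being refused.
call(Fun) when is_function(Fun, 0) ->
    gen_server:call(?MODULE, {call, Fun}, infinity).

status()  -> gen_server:call(?MODULE, status).
block()   -> gen_server:call(?MODULE, block).
unblock() -> gen_server:call(?MODULE, unblock).

init({MaxFailures, CooldownMs}) ->
    {ok, #state{max_failures = MaxFailures, cooldown_ms = CooldownMs}}.

handle_call({call, Fun}, _From, State) ->
    case is_blocked(State) of
        true ->
            {reply, {error, blocked}, State};
        false ->
            try Fun() of
                Result ->
                    %% A success resets the failure count.
                    {reply, {ok, Result},
                     State#state{failures = 0, blocked_until = undefined}}
            catch
                Class:Reason ->
                    {reply, {error, {Class, Reason}}, record_failure(State)}
            end
    end;
handle_call(status, _From, State) ->
    Status = case is_blocked(State) of true -> blocked; false -> ok end,
    {reply, {Status, State#state.failures}, State};
handle_call(block, _From, State) ->
    {reply, ok, State#state{blocked_until = forever}};
handle_call(unblock, _From, State) ->
    {reply, ok, State#state{failures = 0, blocked_until = undefined}}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Msg, State) -> {noreply, State}.

is_blocked(#state{blocked_until = undefined}) -> false;
is_blocked(#state{blocked_until = forever})   -> true;
is_blocked(#state{blocked_until = Until}) ->
    erlang:monotonic_time(millisecond) < Until.

%% Count a failure; once MaxFailures consecutive failures are reached,
%% log one warning and refuse requests for CooldownMs milliseconds.
record_failure(#state{failures = N, max_failures = Max,
                      cooldown_ms = Ms} = S) when N + 1 >= Max ->
    error_logger:warning_msg("breaker: blocking requests for ~p ms~n", [Ms]),
    S#state{failures = N + 1,
            blocked_until = erlang:monotonic_time(millisecond) + Ms};
record_failure(#state{failures = N} = S) ->
    S#state{failures = N + 1}.
```

With this sketch, a caller would write something like
breaker:call(fun() -> send_sms(Msg) end), where send_sms/1 stands for
whatever function talks to the external service. Once the cooldown has
passed, the next request is let through as a probe: if it succeeds the
counter is reset, and if it fails the breaker blocks again.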

A generic circuit breaker would be a nice addition to the Erlang 
libraries, but it's not a supervisor - it's a service, which in itself 
needs to be supervised (because the rest of the system depends on it).
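
As a rough illustration of supervising such a service, the sketch
below puts it under an ordinary one_for_one supervisor as a permanent
worker. It assumes a hypothetical module `breaker` exporting
start_link(MaxFailures, CooldownMs); that module, its arguments, and
the restart limits chosen here are illustrative only.

```erlang
-module(breaker_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Give up if the breaker itself crashes more than 5 times in 10 s.
    SupFlags = {one_for_one, 5, 10},
    %% `breaker` is a hypothetical module; 5 and 30000 are example values
    %% for its assumed MaxFailures and CooldownMs arguments.
    Breaker = {breaker,
               {breaker, start_link, [5, 30000]},
               permanent, 5000, worker, [breaker]},
    {ok, {SupFlags, [Breaker]}}.
```

The point of the split is that failures of the external resource are
absorbed by the breaker's own state, so the supervisor's restart limit
is only ever spent on crashes of the breaker process itself.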

     /Richard


