[erlang-questions] handling crash on db connect

Garrett Smith g@REDACTED
Thu Jun 6 19:02:05 CEST 2013


On Thu, Jun 6, 2013 at 11:35 AM, Paul Rubin <paul@REDACTED> wrote:
> Yes, that seems like the right thing, retry every few seconds until the db
> is back.  I just hadn't thought of trapping exit as part of it (as opposed
> to just checking for error value). I actually do have a separate gen_server
> making a persistent connection in its init/1 and holding onto it, and then
> other parts of my program call this gen_server which in turn makes the redis
> call.  When the init crashes, my top supervisor restarts the gen_server
> immediately, this repeats until MaxR runs out, and the whole VM crashes.  It
> came as a shock that a fairly routine error case could cause this to happen.
> I find myself wishing for a general additional OTP supervision strategy
> (one_for_one_delay, say) that on crash would attempt restart no more than
> once per retry period (e.g. 1 second).  I had kind of thought that was what
> MaxT does, but I guess not.

I missed this part in my previous reply and I think it deserves some comment.

What you observed is, I think, a very healthy problem -- a surprising,
catastrophic failure! This is a thing of beauty because it calls
attention to a serious problem: you're relying on something that
suddenly isn't working. Rather than lure you into a false sense of
confidence, Erlang's default answer is to STOP. Now what? Dunno, but
it got your attention :)

If you start to look at your Erlang applications as ecosystems of
independent services, you can start to think about shoring up each
service to improve its availability, performance, etc. -- just as one
might in a service oriented architecture. In the case of your Redis
service, you want something that advocates for your Redis DB
availability. That advocate (an Erlang process) can deal with
connections, retries, error handling, etc. as an intelligent facade to
the underlying Redis client process (which can come and go at any time
depending on network, Redis server availability, etc.).
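For illustration only, here is a minimal sketch of such an advocate as a gen_server. It is not the example promised in this thread, and several things in it are assumptions: the module name `redis_advocate`, the `client/0` function, the 1-second `RETRY_MS` interval and the caller-supplied `ConnectFun` (a hypothetical fun returning `{ok, Pid}` or an error, standing in for whatever Redis client you actually use) are all made up for the sketch. The key idea is that `init/1` never tries to connect, so it cannot fail while Redis is down. The process schedules its own connection attempts with `erlang:send_after/3` and traps exits so a dying client shows up as an `'EXIT'` message rather than a crash.

```erlang
-module(redis_advocate).
-behaviour(gen_server).

-export([start_link/1, client/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Assumed retry interval; tune for your environment.
-define(RETRY_MS, 1000).

%% ConnectFun is hypothetical: fun() -> {ok, Pid} | {error, Reason}.
start_link(ConnectFun) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, ConnectFun, []).

%% Returns {ok, ClientPid} when connected, {error, not_connected} otherwise.
client() ->
    gen_server:call(?MODULE, client).

init(ConnectFun) ->
    %% Trap exits so a linked client dying becomes an 'EXIT' message.
    process_flag(trap_exit, true),
    %% Defer the first connect attempt; init/1 itself cannot fail on Redis.
    self() ! connect,
    {ok, #{connect => ConnectFun, client => undefined}}.

handle_call(client, _From, #{client := undefined} = State) ->
    {reply, {error, not_connected}, State};
handle_call(client, _From, #{client := Pid} = State) ->
    {reply, {ok, Pid}, State};
handle_call(_Other, _From, State) ->
    {reply, {error, unknown_call}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(connect, #{connect := ConnectFun} = State) ->
    case catch ConnectFun() of
        {ok, Pid} when is_pid(Pid) ->
            %% Link (idempotent if ConnectFun already linked us).
            link(Pid),
            {noreply, State#{client := Pid}};
        _Error ->
            %% Connection failed: try again later instead of crashing.
            erlang:send_after(?RETRY_MS, self(), connect),
            {noreply, State}
    end;
handle_info({'EXIT', Pid, _Reason}, #{client := Pid} = State) ->
    %% Our client went away: mark disconnected and schedule a reconnect.
    erlang:send_after(?RETRY_MS, self(), connect),
    {noreply, State#{client := undefined}};
handle_info(_Other, State) ->
    %% Ignore stray messages, e.g. 'EXIT' from a failed connect attempt.
    {noreply, State}.
```

Callers would then do something like `case redis_advocate:client() of {ok, Pid} -> ...; {error, not_connected} -> ... end` and decide for themselves how to degrade. Because a Redis outage never makes `init/1` fail or the process crash, it does not use up the supervisor's MaxR/MaxT restart intensity.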

I'll highlight this idea in the example I provide for this connection problem.

Garrett
