[erlang-questions] handling crash on db connect

Thu Jun 6 18:35:45 CEST 2013

On Thu, Jun 6, 2013 at 8:58 AM, Garrett Smith <g@REDACTED> wrote:

> Hi Paul,
> Sorry I'm not used to seeing questions about my Redis bindings
> graduate to the main list :)
>

Hey, thanks for the reply.  I asked on the list first because I didn't want
to bug you on github until I was more confident that I wasn't doing
something silly.

> redis:connect is a wrapper for the process start_link. The return
> value is standard for an OTP process: either {ok, Pid} or {error,
> Reason}.
>

Yes, this is what I expected, but in fact when I say
    A = redis:connect().
when the redis server is not running, I get a crash instead of
{error,Reason}.  Is that what's supposed to happen?  I'm still not clear on
this, as it conflicts with what you say further down.  What's the point of
returning {ok, Pid} instead of just Pid in the non-error case, if the error
case crashes the caller rather than returning {error, Reason}?

> Not a dumb error at all -- but neither a bug. This is by design and is
> pretty common for OTP processes. In particular, risky code gets
> executed in the context of the process (for the sake of proper
> isolation) and calling processes need to trap exit to deal with
> problems, or just let it all crash and get restarted by the
> supervisor.
>

OK, I guess I can trap exits, but I had thought of trap_exit itself as being
risky and generally best left to the OTP supervision libraries except for
special circumstances, and a database being down is relatively normal.  The
"let it crash" approach would be
   {ok, Pid} = redis:connect().
which would crash with a pattern match failure in the case of an error
return.
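
To make the trap_exit option concrete, here is a minimal sketch of how I
understand it. The module name connect_demo and the function try_connect/0
are made up for illustration, and init/1 is a stand-in that simply exits; it
is not the redis library's code. The assumption is that the client is started
with gen_server:start_link, so a failure in init/1 reaches the caller both as
an {error, Reason} return value and as an exit signal over the link.

```erlang
-module(connect_demo).
-behaviour(gen_server).

-export([try_connect/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical stand-in for a client process whose init/1 fails,
%% e.g. because the database server is not running.
init([]) ->
    exit(econnrefused).

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

%% Trap exits only around the start_link call, so the failure shows
%% up as a return value instead of killing the calling process.
try_connect() ->
    Old = process_flag(trap_exit, true),
    Result = gen_server:start_link(?MODULE, [], []),
    case Result of
        {error, Reason} ->
            %% The linked process also sends an exit signal. Wait for
            %% the resulting 'EXIT' message before restoring the flag,
            %% otherwise a late signal could still kill the caller.
            receive
                {'EXIT', _Pid, Reason} -> ok
            after 1000 ->
                ok
            end;
        _ ->
            ok
    end,
    process_flag(trap_exit, Old),
    Result.
```

With this, connect_demo:try_connect() returns {error, econnrefused} and the
caller survives. The receive matches on the reason only, so in a process
with other links it could swallow an unrelated 'EXIT' message carrying the
same reason; that is one of the reasons trapping exits in ordinary code
feels risky to me.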

>
> The question of how to handle connection problems can be tricky. I
> typically bake this into a "connection handler" type process that
> indeed traps exit and then figures out what to do -- otherwise simply
> lets the client process exit propagate up to the supervisor. I'll
> typically have a retry logic that waits for a period of time after
> failures, logging attempts, errors, etc.
>

Yes, that seems like the right thing, retry every few seconds until the db
is back.  I just hadn't thought of trapping exit as part of it (as opposed
to just checking for error value). I actually do have a separate gen_server
making a persistent connection in its init/1 and holding onto it, and then
other parts of my program call this gen_server which in turn makes the
redis call.  When init/1 crashes, my top supervisor restarts the
gen_server immediately; this repeats until MaxR is exceeded, and then the
whole VM crashes.  It came as a shock that a fairly routine error case could cause
this to happen.  I find myself wishing for a general additional OTP
supervision strategy (one_for_one_delay, say) that on crash would attempt
restart no more than once per retry period (e.g. 1 second).  I had kind of
thought that was what MaxT does, but I guess not.
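
For what it's worth, here is a rough sketch of the kind of connection-holding
gen_server described above, with the delay handled inside the process rather
than by the supervisor. All names here (conn_keeper, get_conn/1, the
ConnectFun and RetryMs arguments) are hypothetical. ConnectFun is assumed to
be a zero-arity fun that returns {ok, Pid} or {error, Reason} and links the
new process to the caller, which is how redis:connect/0 is described in this
thread. The main point is that init/1 never connects, so it cannot fail just
because the database is down.

```erlang
-module(conn_keeper).
-behaviour(gen_server).

-export([start_link/2, get_conn/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-record(state, {connect_fun, retry_ms, conn}).

%% ConnectFun: fun() -> {ok, Pid} | {error, Reason} (assumed contract).
%% RetryMs: delay in milliseconds between connection attempts.
start_link(ConnectFun, RetryMs) ->
    gen_server:start_link(?MODULE, {ConnectFun, RetryMs}, []).

%% Returns {ok, Conn} or {error, disconnected}.
get_conn(Server) ->
    gen_server:call(Server, get_conn).

init({ConnectFun, RetryMs}) ->
    %% Trap exits so that a crashing linked client process arrives as
    %% an 'EXIT' message instead of killing this process.
    process_flag(trap_exit, true),
    %% Schedule the first attempt instead of connecting here.
    self() ! connect,
    {ok, #state{connect_fun = ConnectFun, retry_ms = RetryMs}}.

handle_call(get_conn, _From, #state{conn = undefined} = S) ->
    {reply, {error, disconnected}, S};
handle_call(get_conn, _From, #state{conn = Conn} = S) ->
    {reply, {ok, Conn}, S}.

handle_cast(_Msg, S) ->
    {noreply, S}.

handle_info(connect, #state{conn = undefined} = S) ->
    try (S#state.connect_fun)() of
        {ok, Conn} ->
            {noreply, S#state{conn = Conn}};
        {error, _Reason} ->
            {noreply, schedule_retry(S)}
    catch
        _:_ ->
            {noreply, schedule_retry(S)}
    end;
handle_info({'EXIT', Conn, _Reason}, #state{conn = Conn} = S) ->
    %% The established connection died: forget it and retry later.
    {noreply, schedule_retry(S#state{conn = undefined})};
handle_info(_Other, S) ->
    %% Includes 'EXIT' messages left over from failed attempts.
    {noreply, S}.

%% Arrange for another connection attempt after the retry delay.
schedule_retry(#state{retry_ms = Ms} = S) ->
    erlang:send_after(Ms, self(), connect),
    S.
```

Something like conn_keeper:start_link(fun redis:connect/0, 1000) would then
start successfully under the supervisor even while the database is down, so
no restarts are counted against MaxR, and callers get {error, disconnected}
from get_conn/1 until a connection attempt succeeds.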

>
> I can provide an example of this type of process -- or maybe something
> like this would be appropriate as a utility within the library

This would be great, thanks!

Paul