[erlang-questions] handling crash on db connect
Paul Rubin
paul@REDACTED
Thu Jun 6 18:35:45 CEST 2013
On Thu, Jun 6, 2013 at 8:58 AM, Garrett Smith <g@REDACTED> wrote:
> Hi Paul,
> Sorry, I'm not used to seeing questions about my Redis bindings
> graduate to the main list :)
>
Hey, thanks for the reply. I asked on the list first because I didn't want
to bug you on GitHub until I was more confident that I wasn't doing
something silly.
> redis:connect is a wrapper for the process start_link. The return
> value is standard for an OTP process: either {ok, Pid} or {error,
> Reason}.
>
Yes, this is what I expected, but in fact if I run
A = redis:connect().
when the Redis server is not running, I get a crash instead of
{error, Reason}. Is that what's supposed to happen? I'm still not clear on
this, as it conflicts with what you say further down. What's the point of
returning {ok, Pid} instead of just Pid in the non-error case, if in the
error case the call never returns a value at all?
> Not a dumb error at all -- but neither a bug. This is by design and is
> pretty common for OTP processes. In particular, risky code gets
> executed in the context of the process (for the sake of proper
> isolation) and calling processes need to trap exit to deal with
> problems, or just let it all crash and get restarted by the
> supervisor.
>
OK, I guess I can trap exits, but I had thought of trapping exits as itself
risky and generally best left to the OTP supervision libraries except in
special circumstances, and a database being down is a relatively normal
event. The "let it crash" approach would be
{ok, Pid} = redis:connect().
which would crash with a pattern match failure in the case of an error
return.
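A minimal, self-contained sketch of the mechanism under discussion, assuming nothing about the Redis client itself: a stand-in gen_server (the module trap_start_demo is hypothetical) whose init/1 returns {stop, econnrefused}, started with gen_server:start_link/3 from a caller that temporarily traps exits. start_link reports the failure as {error, Reason}; what presumably kills an untrapped caller, at least on OTP releases of this era, is the exit signal that arrives over the link right afterwards. That is how the {ok, Pid} | {error, Reason} convention and the crash can coexist. Newer OTP releases changed some details of proc_lib start-up, so the sketch consumes the 'EXIT' message if one arrives instead of assuming it does.

```erlang
-module(trap_start_demo).
-behaviour(gen_server).

-export([try_start/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical stand-in for a connection process whose init/1 fails the
%% way a client might when its server is down. It does not touch Redis.
init([]) ->
    {stop, econnrefused}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

%% Start the failing process linked to the caller, but with exits trapped,
%% so the caller gets {error, Reason} back and survives.
try_start() ->
    Old = process_flag(trap_exit, true),
    Result = gen_server:start_link(?MODULE, [], []),
    %% If the dead process's exit signal was converted into an
    %% {'EXIT', Pid, Reason} message, consume it so it does not linger.
    receive
        {'EXIT', _Pid, econnrefused} -> ok
    after 100 ->
        ok
    end,
    %% Restore whatever trap_exit setting the caller had before.
    process_flag(trap_exit, Old),
    Result.
```

Note that trap_exit is a per-process flag, so leaving it on affects every link the caller has; restoring it afterwards keeps the "let it crash" behaviour for the rest of the caller's life.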
>
> The question of how to handle connection problems can be tricky. I
> typically bake this into a "connection handler" type process that
> indeed traps exits and then figures out what to do; other times I simply
> let the client process's exit propagate up to the supervisor. I'll
> typically have retry logic that waits for a period of time after
> failures, logging attempts, errors, etc.
>
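A "connection handler" of the kind described here might look roughly like the following sketch. It is not Garrett's code and not part of the Redis bindings; conn_keeper, get_conn/1 and the ConnectFun/RetryMs parameters are hypothetical names. The process traps exits, connects from handle_info/2 rather than init/1 (so its own start never fails just because the database is down), and schedules a retry with erlang:send_after/3 after each failure. It assumes the connect fun links the new connection process to its caller, as a start_link-based client does, so that a dropped connection shows up as an 'EXIT' message.

```erlang
-module(conn_keeper).
-behaviour(gen_server).

-export([start_link/2, get_conn/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% ConnectFun: hypothetical zero-arity fun returning {ok, Pid} | {error, R},
%% e.g. fun() -> redis:connect() end. RetryMs: delay between attempts.
start_link(ConnectFun, RetryMs) ->
    gen_server:start_link(?MODULE, {ConnectFun, RetryMs}, []).

%% Returns {ok, ConnPid} while connected, {error, not_connected} otherwise.
get_conn(Server) ->
    gen_server:call(Server, get_conn).

init({ConnectFun, RetryMs}) ->
    %% Failed or dropped connections arrive as messages instead of
    %% killing this process.
    process_flag(trap_exit, true),
    %% Connect after init/1 returns, so a down database cannot fail init/1.
    self() ! connect,
    {ok, #{connect => ConnectFun, retry_ms => RetryMs, conn => undefined}}.

handle_call(get_conn, _From, #{conn := undefined} = State) ->
    {reply, {error, not_connected}, State};
handle_call(get_conn, _From, #{conn := Conn} = State) ->
    {reply, {ok, Conn}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(connect, #{connect := ConnectFun} = State) ->
    case catch ConnectFun() of
        {ok, Conn} when is_pid(Conn) ->
            {noreply, State#{conn := Conn}};
        Other ->
            %% {error, Reason} or a caught exception: log and retry later.
            logger:warning("connect failed: ~p; retrying", [Other]),
            {noreply, schedule_retry(State)}
    end;
handle_info({'EXIT', Conn, Reason}, #{conn := Conn} = State) ->
    %% The live connection died: forget it and reconnect after a delay.
    logger:warning("connection ~p exited: ~p", [Conn, Reason]),
    {noreply, schedule_retry(State#{conn := undefined})};
handle_info({'EXIT', _OtherPid, _Reason}, State) ->
    %% Exit from an attempt that already failed (or an unrelated link).
    %% Exits from the parent supervisor are handled by gen_server itself.
    {noreply, State};
handle_info(_Msg, State) ->
    {noreply, State}.

%% Ask ourselves to try connecting again after RetryMs milliseconds.
schedule_retry(#{retry_ms := RetryMs} = State) ->
    erlang:send_after(RetryMs, self(), connect),
    State.
```

With this shape, callers treat {error, not_connected} as an ordinary return value, and the supervisor only sees a restart if the keeper itself has a bug.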
Yes, that seems like the right thing: retry every few seconds until the db
is back. I just hadn't thought of trapping exits as part of it (as opposed
to just checking for an error value). I actually do have a separate gen_server
that makes a persistent connection in its init/1 and holds onto it; other
parts of my program call this gen_server, which in turn makes the
Redis call. When init/1 crashes, my top supervisor restarts the
gen_server immediately. This repeats until MaxR runs out, and then the whole VM
crashes. It came as a shock that a fairly routine error case could cause
this. I find myself wishing for an additional general-purpose OTP
supervision strategy (one_for_one_delay, say) that, after a crash, would attempt
a restart no more than once per retry period (e.g. 1 second). I had
thought that was what MaxT does, but I guess not.
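For reference, MaxR and MaxT are the supervisor's restart intensity and period: if more than MaxR restarts occur within the last MaxT seconds, the supervisor terminates its children and then itself. They cap the rate of restarts but do not space them out, and as far as I know OTP's supervisor has no built-in restart delay, so a child whose init/1 fails instantly can use up MaxR within milliseconds. The sketch below uses the map form of the flags (available since OTP 18; the classic tuple form is {one_for_one, MaxR, MaxT}). redis_conn_sup and the child module my_redis_conn are hypothetical names.

```erlang
-module(redis_conn_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% More than 5 restarts within any 10-second window and this supervisor
    %% gives up. These numbers bound the restart rate; they add no delay.
    SupFlags = #{strategy => one_for_one,
                 intensity => 5,   % MaxR
                 period => 10},    % MaxT, in seconds
    %% Hypothetical child: a connection keeper whose init/1 neither blocks
    %% on nor fails because of the database, so a down server costs no
    %% restarts at this level.
    Child = #{id => redis_conn,
              start => {my_redis_conn, start_link, []},
              restart => permanent,
              type => worker},
    {ok, {SupFlags, [Child]}}.
```

Keeping the retry delay inside the child, rather than relying on MaxT, is what stops a routine outage from cascading up to the top supervisor.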
>
> I can provide an example of this type of process, or maybe something
> like this would be appropriate as a utility within the library.
This would be great, thanks!
Paul