[erlang-questions] default timeout values for a library (epgsql)

Mon Mar 10 12:29:14 CET 2014

There are two things to consider here:

1. A client is using the database library. Surely, you want some kind of
progress in the client. Progress here means that if you time out, you want
to do something with the client in question. Maybe it should crash. Maybe
it should do something else. There is a *protocol* between the client and
the database driver. Clearly, it should be possible for the client to set a
timeout on the connection. Also, if the client dies and goes away, the
database driver must have a way to clean up after this incident. It cannot
be the responsibility of the client to do so.

2. The database driver has a TCP/SSL socket to the database server. This
transport has some kind of progress as well. It might be that you want to
keep the connection alive, or that you know the DB server will send an "I am
still processing, goddammit!" back to the client once in a while.

These two cases are orthogonal. You can't just set a 10-second timeout,
because there will be queries which take longer than 10 seconds to even
begin returning result sets (typical if there is a large sort job in the
query planner pipeline). On the other hand, you want clients to be able to
cancel a currently running query so you can break long-running jobs on the
server side.
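
A rough sketch of that separation, using Python's asyncio as a stand-in
for Erlang processes. Everything here (`FakeDriver`, `query`,
`client_timeout`, the `cancelled` list) is a hypothetical name made up for
illustration and is not the epgsql or Emysql API. The point is only that
the deadline lives in the client -> driver call, and that hitting it makes
the driver cancel the running query instead of leaving it to run on the
server:

```python
import asyncio


class FakeDriver:
    """Hypothetical stand-in for a database driver (not a real API)."""

    def __init__(self):
        # Queries for which a real driver would send a cancel request.
        self.cancelled = []

    async def _run_on_server(self, sql, server_time):
        # Pretend the server needs `server_time` seconds before it can
        # start returning rows (for example, a large sort).
        await asyncio.sleep(server_time)
        return f"rows for {sql}"

    async def query(self, sql, server_time, client_timeout=None):
        # `client_timeout` belongs to the client -> driver protocol only.
        # None means "wait as long as the query takes".
        task = asyncio.create_task(self._run_on_server(sql, server_time))
        try:
            return await asyncio.wait_for(task, client_timeout)
        except asyncio.TimeoutError:
            # wait_for has cancelled the in-flight task. A real driver
            # would ask the server to cancel the query at this point.
            self.cancelled.append(sql)
            raise
```

With no `client_timeout`, a slow query simply completes. With a short
one, the caller gets a timeout error and the query shows up in
`cancelled`, so the client decides what to do next and the server-side
job does not linger.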

My hunch is that a good solution is found by:

* Thinking about crash behaviour. What should happen if a given process
crashes?
* Thinking in terms of protocols. What is the client -> driver protocol?
What is the driver -> server protocol?
* Constraining the set of guarantees you give the client. You may have to
trade off certain aspects in order to achieve better performance or
reliability.
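
To make the crash-behaviour point concrete, here is a minimal sketch,
again in Python's asyncio rather than Erlang. The `Driver` class, its
`checkout` method, and the connection name are all invented for the
example. The done-callback plays the role that a process monitor would
play in Erlang: the driver learns that the client is gone and releases
the connection itself, without the client having to do anything:

```python
import asyncio


class Driver:
    """Hypothetical driver that tracks which client holds a connection."""

    def __init__(self):
        self.held = {}       # client task -> connection name
        self.released = []   # connections reclaimed by the driver

    def checkout(self, conn):
        client = asyncio.current_task()
        self.held[client] = conn
        # Rough analogue of monitoring the client: the callback runs
        # however the client ends (normal return, exception, cancel).
        client.add_done_callback(self._client_done)
        return conn

    def _client_done(self, client):
        conn = self.held.pop(client, None)
        if conn is not None:
            self.released.append(conn)


async def crashing_client(driver):
    driver.checkout("conn-1")
    raise RuntimeError("client crashed mid-query")


async def main():
    driver = Driver()
    client = asyncio.create_task(crashing_client(driver))
    await asyncio.wait([client])
    await asyncio.sleep(0)   # give the done callback a chance to run
    client.exception()       # retrieve the crash so it is not logged
    return driver
```

Running `asyncio.run(main())` returns a driver whose `held` table is
empty and whose `released` list contains the crashed client's
connection.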

The Emysql driver mashed these things together, and a lot of problems arise
from the fact that there is no separation of protocols in it. Suddenly, TCP
transport timeouts linger in the client API. Why are they there? And if I
set `infinity` as a timeout, a client can be blocked ad infinitum. So the
timeout has been pushed back and forth between infinity and some arbitrary
default value between 3000 and 60000 ms. Also, most Erlang code I have seen
utterly fails to handle this through proper error handling and instead uses
timeout values as the primary mechanism for handling errors. The timeout
should be a safety measure to ensure a subsystem makes progress. On the
other hand, a client which "hangs" on a TCP session until some other event
kills it is not a problem, as long as there is *some* event which can
ensure progress in the system.

and that was my 2 cents.

On Mon, Mar 10, 2014 at 11:30 AM, David Welton <davidnwelton@REDACTED> wrote:

> Hi,
>
> We've been having something of a debate on the epgsql mailing list
> about timeouts for database queries:
>
> https://groups.google.com/forum/#!topic/epgsql/jmwSCybEN3E
>
> I thought that it'd be interesting to hear how other people handle
> things like this.
>
> One thing that looks like we need is the ability to specify a timeout,
> but, beyond that, I just thought I'd ask a wider group of people what
> their thoughts are on timeouts in library code like this.
>
> Thank you,
> --
> David N. Welton
>
> http://www.welton.it/davidw/
>
> http://www.dedasys.com/

-- 
J.