ssl_esock loop bug

Fri Nov 25 11:18:00 CET 2005


We've found another bug in the esock port program.  The attached
program can be used to reproduce the bug.

The problem is this:  if an erlang application writes to the ssl socket
faster than the (remote) client reads, esock will start to buffer in
cp->wq.  Then erlang is done, and closes the socket.  Next thing that
happens is that the remote client also closes, before reading all
data.  This means that the cp->fd gets a POLLHUP, which esock
interprets (here's the bug!) as input to read.  read will return 0,
but since there's still stuff in the write queue, the connection is
left in the JOINED state, and it will poll() again and get POLLHUP
again, so esock loops forever.
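
To illustrate the mechanism outside esock, here is a minimal sketch. It
uses a plain pipe as a stand-in for the socket (an assumption made to
keep it self-contained; esock itself polls an SSL socket), and hup_demo
is a hypothetical name, not something in esock.c. Once the peer end is
closed, poll() reports POLLHUP and read() returns 0; a loop that treats
this as ordinary input and polls again will never block.

```c
#include <poll.h>
#include <unistd.h>

/* Returns the revents that poll() reports for the read end of a pipe
   whose write end has been closed, or -1 on failure.  *nread receives
   the result of the read() that follows. */
static short hup_demo(long *nread)
{
    int fds[2];
    char buf[16];
    struct pollfd pfd;

    if (pipe(fds) != 0)
        return -1;
    close(fds[1]);              /* the peer goes away, nothing was sent */

    pfd.fd = fds[0];
    pfd.events = POLLIN;        /* we only ask for input ... */
    pfd.revents = 0;
    if (poll(&pfd, 1, 1000) < 0) {
        close(fds[0]);
        return -1;
    }

    /* ... but POLLHUP is reported anyway; read() then sees end of file */
    *nread = (long)read(fds[0], buf, sizeof buf);
    close(fds[0]);
    return pfd.revents;
}
```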

The proper fix is to handle POLLHUP (and POLLERR and POLLNVAL which
are not even handled today) separately from read/write.  First thing
to check is whether an error occurred; only otherwise check read/write.
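
A rough sketch of that ordering, not the actual esock code: the enum
and the function name classify_revents are made up for illustration,
and the real fix would have to fit into esock's existing poll/select
wrappers.

```c
#include <poll.h>

/* Hypothetical result codes, not taken from esock.c. */
enum fd_action { FD_IDLE, FD_ERROR, FD_READ, FD_WRITE, FD_READ_WRITE };

/* Decide what to do with one descriptor after poll() has returned.
   Error conditions are tested first, so a hung-up descriptor is never
   mistaken for one with input to read. */
static enum fd_action classify_revents(short revents)
{
    int rd, wr;

    if (revents & (POLLHUP | POLLERR | POLLNVAL))
        return FD_ERROR;

    rd = (revents & POLLIN) != 0;
    wr = (revents & POLLOUT) != 0;
    if (rd && wr)
        return FD_READ_WRITE;
    if (rd)
        return FD_READ;
    if (wr)
        return FD_WRITE;
    return FD_IDLE;
}
```

On FD_ERROR the caller would drop the write queue and tear the
connection down instead of polling again.  A fuller version might
first drain pending input when POLLIN is set together with POLLHUP;
that is left out here.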

We almost added this, but there is some strange handling of select()
on w32 which we were unsure about.  We'd be glad to help if needed.

Anyway, here's a small patch which will fix this particular problem,
but there might be more cases lurking around (which proper handling of
POLLHUP etc would fix).

/martin & johan

--- esock.c     (revision 377)
+++ esock.c     (working copy)
@@ -1086,6 +1086,13 @@
                        /* SSL eof */
                        DEBUGF(("SSL eof\n"));
                        cp->eof = 1;
+                       /* kind of temporary hack - proper handling of POLLHUP|
+                          POLLERR|POLLNVAL is needed!
+                          drop the write queue - since SSL is closed we can't
+                          write anyway. this forces JOINED_STATE_INVALID to be
+                          true.
+                       */
+                       cp->wq.len = 0;
                        if (cp->proxy->wq.len == 0) {
                            shutdown(cp->proxy->fd, SHUTDOWN_WRITE);
                            cp->proxy->bp = 1;
-------------- next part --------------

%% illustrates ssl loop bug

%% do  S = s2:a() in an erlang shell
%% in a terminal shell, do 
%%     openssl s_client -connect localhost:5432
%%     Ctrl-Z
%% do s2:b(S)
%% then kill %1 in the shell, i.e. kill openssl s_client w/o reading.
%% at this point, esock loops wildly.

%% You may want to run this with 'export ERL_SSL_DEBUG=true' and then
%% you should see something like:
%%    adding all to write queue 32768 bytes
%% after doing b(S).  If not, increase the number of bytes written.

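-module(s2).
-export([a/0, b/1]).

%% listen on the port used in the s_client command above and accept
%% one connection; returns the connected ssl socket
a() ->
    {ok, L} = ssl:listen(5432,
                   [{active, false},
                    {certfile, "cert.example"},
                    {keyfile, "key.example"}]),
    {ok, S} = ssl:accept(L),
    S.

%% write more than the stopped client will read, then close
b(S) ->
    ssl:send(S, lists:duplicate(809600, $a)),
    ssl:close(S).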
