[erlang-questions] [BUG] hidden, totally undocumented bug :) in ssl module -- ssl:accept/1

Gaspar Chilingarov nm@REDACTED
Wed Jan 30 02:43:29 CET 2008


Hi there!

Some time ago I reported some problems with the ssl module on this
list. They were related to accepting and handling incoming SSL
connections.

We have software that runs on FreeBSD 6.x and deals with SSL
connections: it accepts connections from clients, checks client
certificates, and allows or denies access depending on the result of
the certificate check. We saw a large number of connections stuck in
CLOSE_WAIT state. After some investigation it turned out that the
client initiates a TCP connection, then sends part of its certificate
to the server (70 bytes of payload), and we see the connection in
ESTABLISHED state:


Proto Recv-Q Send-Q  Local Address     Foreign Address        (state)
tcp4   70      0     1.1.1.1.443       2.2.2.1.12345      ESTABLISHED

So there is data waiting to be received on the socket. At that moment
the application is blocked in an ssl:accept(Socket) call. Then more
data arrives from the client on the socket, say:

Proto Recv-Q Send-Q  Local Address     Foreign Address        (state)
tcp4     102    0     1.1.1.1.443       2.2.2.1.12345      ESTABLISHED

But without any luck -- the connection is not accepted and no client
socket is created for it.

After some timeout the client disconnects and the socket goes into CLOSE_WAIT state.

Proto Recv-Q Send-Q  Local Address     Foreign Address        (state)
tcp4     102    0     1.1.1.1.443       2.2.2.1.12345      CLOSE_WAIT


In this state it can hang around forever, until the Erlang emulator is
restarted. One of the causes is that connecting clients sometimes have
a huge RTT and heavy packet loss, which makes transferring data over
TCP nearly impossible.

At the same time I observed an interesting fact: the yaws server
embedded in the same application and running with exactly the same
config (certificates, validation level, timeouts) worked without any
problem, and we never saw any stuck connections on its SSL port.

After tracing ALL calls to the ssl module I found that the only
difference between our code and yaws was the connection-accepting logic.


In our part of the application we called ssl:accept(Socket) and waited
forever for an incoming SSL connection, whereas yaws called
ssl:accept(Socket, 10000). If no connection is accepted within 10
seconds, ssl:accept/2 times out and yaws simply calls ssl:accept/2
again.
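A minimal sketch of the yaws-style accept loop described above. It is not the author's actual fix (the post only calls it "some dirty hack") and not yaws' source. The module name accept_loop, the function loop/3 and its AcceptFun/Handler parameters are hypothetical; AcceptFun is assumed to wrap the R12-era call discussed here, e.g. fun(T) -> ssl:accept(ListenSocket, T) end, which also lets the loop be exercised without real certificates.

```erlang
-module(accept_loop).
-export([loop/3]).

%% Hypothetical helper, not part of the ssl module.
%%   AcceptFun: fun(Timeout) -> {ok, Socket} | {error, timeout} | {error, Reason}
%%              (assumed to wrap ssl:accept(ListenSocket, Timeout))
%%   Handler:   fun(Socket) -> any()
%% Every accept call uses a finite Timeout instead of infinity.
loop(AcceptFun, Handler, Timeout) ->
    case AcceptFun(Timeout) of
        {ok, Socket} ->
            %% A client completed the handshake; serve it, then accept again.
            Handler(Socket),
            loop(AcceptFun, Handler, Timeout);
        {error, timeout} ->
            %% No connection accepted within Timeout ms; just call accept
            %% again, as yaws does.
            loop(AcceptFun, Handler, Timeout);
        {error, Reason} ->
            %% Any other error ends the loop and is returned to the caller.
            {error, Reason}
    end.
```

The only point of this design is that no accept call ever blocks with an infinity timeout; per the analysis in this post, that is what gives the ssl module a chance to clean up dropped connections. In a real server, Handler would normally hand the socket to a worker process instead of serving it inline.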

After adding a dirty hack to our listen/accept scheme, the issue was
solved completely. Now it accepts or drops connections from remote
clients nicely. All stuck connections go from ESTABLISHED to FIN_WAIT_1
state as expected and are closed by the OS after some timeout.


Conclusion:
The ssl module does some housekeeping work inside the ssl:accept call
before actually accepting connections (in particular, cleaning up old
TCP connections that have been dropped from the client side). If you
call ssl:accept with an infinity timeout, this cleanup never occurs,
and the connection table fills up with CLOSE_WAIT sockets.

This bug has existed at least since the R10B-9 release, and in all that
time it has never been documented or fixed. (It still exists in the
R12B-0 release, which I tested a few days ago.)


We spent a long time fighting this issue and finally found the source of
the bug :)

So I kindly ask the Erlang team to add a few words to the
documentation or to fix it -- whichever is easier :)


With best regards,
Gaspar



-- 
Gaspar Chilingarov

System Administrator,
Network security consulting

t +37493 419763 (mob)
i 63174784
e nm@REDACTED
w http://gasparchilingarov.com/




