[erlang-questions] [BUG] hidden, totally undocumented bug :) in ssl module -- ssl:accept/1
Gaspar Chilingarov
nm@REDACTED
Wed Jan 30 02:43:29 CET 2008
Hi there!
Some time ago I reported some problems with the ssl module on this
list. They were related to accepting and handling incoming ssl
connections.
We have software that runs on FreeBSD 6.x and deals with ssl
connections -- accepting connections from clients, checking client
certificates and allowing or denying access depending on the
certificate check. We saw a large number of connections stuck in the
CLOSE_WAIT state. After some investigation it turned out that the
client initiates a tcp connection, then sends part of its certificate
to the server (70 bytes of payload), and we see the connection in the
ESTABLISHED state:
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 70 0 1.1.1.1.443 2.2.2.1.12345 ESTABLISHED
We have some data waiting to be received on the socket. At that moment
the application is blocked in the ssl:accept(Socket) call. Then we see
more data coming from the client to the socket, say:
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 102 0 1.1.1.1.443 2.2.2.1.12345 ESTABLISHED
But without any luck -- the connection is not accepted and no client
socket is created for it.
After some timeout the client disconnects and the socket goes into the
CLOSE_WAIT state:
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 102 0 1.1.1.1.443 2.2.2.1.12345 CLOSE_WAIT
In this state it can hang around forever, until the Erlang emulator is
restarted. One of the causes is that connecting clients sometimes
have a huge rtt and heavy packet loss, which makes it nearly impossible
to pass data over tcp.
At the same time I observed an interesting fact: the yaws server
embedded in the same application and running with exactly the same
config (certificates, validation level, timeouts) worked without any
problem, and we never saw any stuck connections on its SSL port.
After tracing ALL calls to the ssl module I found that the only
difference between us and yaws was the connection accepting logic.
In our part of the application we called ssl:accept(Socket) and waited
forever for an incoming ssl connection, whereas yaws called
ssl:accept(Socket, 10000). If there is no incoming connection within 10
seconds, ssl:accept times out and yaws simply calls ssl:accept/2
again.
After adding a dirty hack to our listen/accept scheme the issue was
solved completely. Now it accepts or drops connections from remote
clients nicely. All stuck connections go from ESTABLISHED to the
FIN_WAIT_1 state as expected and are closed by the OS after some timeout.
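For illustration, a timed accept loop of the kind yaws uses might look
roughly like the sketch below. It is not the actual code from yaws or
from our application. The module name accept_retry, the function loop/3,
and the AcceptFun and Handler arguments are all hypothetical, and the
sketch assumes that a timed-out accept returns {error, timeout}.

```erlang
-module(accept_retry).
-export([loop/3]).

%% Timed accept loop (illustrative sketch, not code from yaws or OTP).
%%
%% AcceptFun : fun((Timeout) -> {ok, Socket} | {error, Reason});
%%             hypothetical wrapper around the real accept call.
%% Timeout   : milliseconds to wait in each accept attempt.
%% Handler   : fun((Socket) -> any()); hypothetical per-connection callback.
loop(AcceptFun, Timeout, Handler) ->
    case AcceptFun(Timeout) of
        {ok, Socket} ->
            %% A real server would normally hand the socket over to a
            %% worker process here instead of handling it inline.
            Handler(Socket),
            loop(AcceptFun, Timeout, Handler);
        {error, timeout} ->
            %% Assumption: a timed-out accept returns {error, timeout}.
            %% Re-enter accept so the call never blocks indefinitely.
            loop(AcceptFun, Timeout, Handler);
        {error, Reason} ->
            %% Any other error ends the loop and is returned to the caller.
            {error, Reason}
    end.
```

With the API discussed in this message, AcceptFun could be something
like fun(T) -> ssl:accept(Socket, T) end, called with a Timeout of
10000 as yaws does.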
Conclusion:
The ssl module does some housekeeping work inside the ssl:accept call
before actually accepting connections (in particular, cleaning up old
tcp connections that were dropped from the client side).
If you just use ssl:accept with an infinite timeout, this cleanup never
occurs, and the connection table fills up with CLOSE_WAIT sockets.
This bug has existed at least since the 10b9 release and has been
neither documented nor fixed in all that time. (It still exists in the
12r0 release, which I tested a few days ago.)
We spent a long time fighting this issue and finally found the source
of the bug :)
So I'm kindly begging the Erlang team to either put a few words into
the documentation or fix it -- whichever is easier to do :)
With best regards,
Gaspar
--
Gaspar Chilingarov
System Administrator,
Network security consulting
t +37493 419763 (mob)
i 63174784
e nm@REDACTED
w http://gasparchilingarov.com/