[erlang-bugs] R18 Unbounded SSL Session ETS Table Growth

Ben Murphy benmmurphy@REDACTED
Wed Sep 2 16:15:33 CEST 2015

I've seen in production that the ssl_session_cache ETS table can
become very large, which causes new SSL connections to take more than
5 seconds to establish. The root cause is that multiple SSL sessions
are stored for a particular SSL connection configuration even though
only one (the most recent) is needed.

The ETS table is keyed by {Host, Port, SessionID}, but there are a
bunch of other parameters that need to match for a session to be
resumed; for example, the client certificate and compression algorithm
also need to match. So the current code creates a new entry in the
table for each connection (even if session reuse is not enabled!!).
Then, when you create a new connection, it iterates through all the
stored sessions for that {Host, Port} and checks that the other
parameters match.
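
The lookup pattern described above can be sketched roughly as follows.
This is an illustration only, not the ssl application's code: the
module name, the function names and the {Cert, Compression} parameter
tuple are all made up for the example.

```erlang
-module(session_table_sketch).
-export([new/0, store/4, candidates/2, find_resumable/3]).

%% Hypothetical stand-in for the session cache table.
new() ->
    ets:new(session_table_sketch, [ordered_set, public]).

%% One row per connection, keyed by {{Host, Port}, SessionId}.
%% Params stands in for the extra data that must match on resume
%% (assumed here to be a {Cert, Compression} tuple).
store(Tab, HostPort, SessionId, Params) ->
    true = ets:insert(Tab, {{HostPort, SessionId}, Params}),
    ok.

%% Every session stored for HostPort, whatever its other parameters.
candidates(Tab, HostPort) ->
    ets:select(Tab, [{{{HostPort, '$1'}, '$2'}, [], [{{'$1', '$2'}}]}]).

%% Linear scan over the candidates; the cost grows with the number
%% of rows stored for HostPort.
find_resumable(Tab, HostPort, Params) ->
    case [Id || {Id, P} <- candidates(Tab, HostPort), P =:= Params] of
        [First | _] -> {ok, First};
        [] -> none
    end.
```

If nothing is ever deleted, every call to store/4 adds a row that
every later find_resumable/3 call for the same {Host, Port} has to
walk past.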

Looking at the code, sessions are only removed from this table when
their lifetime is reached (configurable, but 24 hours by default) or
if a FATAL error happens on a connection with that ID.

In the pathological case where a server supplies a session ID but
never supports resuming it, the session table grows at the same rate
as new connections are established. This makes establishing N
connections take O(N^2) work. Also, when {reuse_sessions, false} has
been supplied, the session should not be added to the table at all;
currently a new entry is added on every connection and is only removed
after 24 hours.
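
As a rough worked estimate (assuming, for illustration, that
connection k is compared against all k-1 sessions already stored for
the same {Host, Port}, and that none are removed in between), the
total lookup work for N connections is:

```latex
\sum_{k=1}^{N} (k - 1) = \frac{N(N-1)}{2} = O(N^2)
```

For N = 10,000 connections that is roughly 5 x 10^7 comparisons in
total, with the last connection alone scanning about 10,000 entries.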

We've witnessed the catastrophic slowdown occur when making requests
against a server that normally resumes sessions properly. I suspect
this is because either a) it started failing to resume sessions
because of some failure on their side, or b) its session lifetime was
considerably less than 24 hours, so Erlang kept trying to resume
sessions that could no longer be resumed and continuously created new
sessions because of this.

I think it is also important to note that while the
register_unique_session fix would fix the memory leak if it worked, in
this situation it would cause a new session to be created each time
and make SSL session caching pointless until the Erlang session
expired. I think it would be preferable to create a new session and
delete the old one to preserve the uniqueness, but I'm not sure how
this could be done in ETS without creating a race that would generate
multiple sessions. The other alternative would be to delete sessions
that are known not to resume. For example, if you try to resume a
session and the server no longer knows about it, the client can tell
because it has to go through the whole handshake.
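
A minimal sketch of the two alternatives above, under the assumption
that every write to the table goes through a single process (for
example a gen_server that owns it). That assumption sidesteps the race
rather than solving it in ETS itself. The module and function names
are hypothetical and not part of the ssl application.

```erlang
-module(session_replace_sketch).
-export([new/0, replace/4, forget/3, ids/2]).

new() ->
    ets:new(session_replace_sketch, [ordered_set, public]).

%% Delete every stored session for HostPort with the same Params,
%% then store the new one. The two steps are NOT atomic, so this is
%% only safe when a single process performs all writes.
replace(Tab, HostPort, NewId, Params) ->
    true = ets:match_delete(Tab, {{HostPort, '_'}, Params}),
    true = ets:insert(Tab, {{HostPort, NewId}, Params}),
    ok.

%% Drop a session that is known not to resume, e.g. after the server
%% forced a full handshake instead of resuming it.
forget(Tab, HostPort, SessionId) ->
    true = ets:delete(Tab, {HostPort, SessionId}),
    ok.

%% Session IDs currently stored for HostPort.
ids(Tab, HostPort) ->
    ets:select(Tab, [{{{HostPort, '$1'}, '_'}, [], ['$1']}]).
```

With replace/4 the table holds at most one session per {Host, Port}
and parameter set, so the number of rows no longer tracks the number
of connections.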

I think this was meant to be fixed by the register_unique_session
function, but the fix does not work because it assumes the return
value of select_session is [#session{}] when it is really
[ [binary(), #session{}] ].
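
To illustrate the shape mismatch: the sketch below is not the actual
register_unique_session code. It assumes the rows come from an
ets:select/2 call whose match spec body is '$$', which returns one
list of bound variables per row. The table layout, the stand-in
session tuple and the function names are made up for the example.

```erlang
-module(select_shape_sketch).
-export([rows/0, known_flat/2, known_nested/2]).

%% Build a one-row table and select it with '$$', which yields
%% [[SessionId, Session]] rather than [Session].
rows() ->
    Tab = ets:new(select_shape_sketch, [ordered_set]),
    HostPort = {"example.com", 443},
    Session = {session, <<"id-1">>},   % stand-in for a #session{} record
    true = ets:insert(Tab, {{HostPort, <<"id-1">>}, Session}),
    Rows = ets:select(Tab, [{{{HostPort, '$1'}, '$2'}, [], ['$$']}]),
    true = ets:delete(Tab),
    Rows.

%% Treats each row as a bare session, so it never finds a match.
known_flat(Session, Rows) ->
    lists:member(Session, Rows).

%% Destructures each [SessionId, Session] row before comparing.
known_nested(Session, Rows) ->
    lists:any(fun([_Id, S]) -> S =:= Session end, Rows).
```

A uniqueness check written like known_flat/2 always reports the
session as unknown, so a new entry gets registered every time.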


This is an example of the broken behaviour with
{reuse_sessions, false} (the reproduction should work on both R16B02
and R18).

1> application:ensure_all_started(ssl).
2> ets:info(element(2, sys:get_state(whereis(ssl_manager)))).
3> ssl:close(element(2, ssl:connect("google.com", 443, [{reuse_sessions, false}]))).
4> ssl:close(element(2, ssl:connect("google.com", 443, [{reuse_sessions, false}]))).
5> ssl:close(element(2, ssl:connect("google.com", 443, [{reuse_sessions, false}]))).
6> ssl:close(element(2, ssl:connect("google.com", 443, [{reuse_sessions, false}]))).
7> ssl:close(element(2, ssl:connect("google.com", 443, [{reuse_sessions, false}]))).
8> ssl:close(element(2, ssl:connect("google.com", 443, [{reuse_sessions, false}]))).
9> ssl:close(element(2, ssl:connect("google.com", 443, [{reuse_sessions, false}]))).
10> ssl:close(element(2, ssl:connect("google.com", 443, [{reuse_sessions, false}]))).
11> ssl:close(element(2, ssl:connect("google.com", 443, [{reuse_sessions, false}]))).
12> ets:info(element(2, sys:get_state(whereis(ssl_manager)))).
