[erlang-bugs] R18 Unbounded SSL Session ETS Table Growth

Ingela Anderton Andin Ingela.Anderton.Andin@REDACTED
Thu Sep 3 16:00:49 CEST 2015


Hi!

See inline comments below:

On 09/02/2015 04:15 PM, Ben Murphy wrote:
> I've seen in production that the ssl_session_cache ETS table can
> become very large which will start to cause new SSL connections to
> take > 5 seconds to establish. The root cause of this is that multiple
> SSL sessions are stored for a particular SSL connection configuration
> even though only 1 (the most recent) is needed.
>
> So the ETS table is keyed by {Host, Port, SessionID}, but there are a
> bunch of other parameters that need to match for a session to be
> resumed; for example, the client certificate and compression algorithm
> also need to match. So what the current code does is create a new entry
> in the table for each connection (even if session reuse is not
> enabled!!), and then when you create a new connection it will iterate
> through all the matching sessions for that {Host, Port} and check that
> the other parameters match.
>
> Looking at the code, sessions are only removed from this table when a
> lifetime is reached (configurable, but defaulting to 24 hours) or
> if a FATAL error happens on a connection with that ID.
>
> In the pathological case where a server supplies a session ID but
> never supports resuming it this causes the session table to grow at
> the same rate as new connections are established. This makes
> establishing N connections take O(N^2) work. Also, when
> {reuse_sessions, false} has been supplied, the session should not be
> added to the table at all, because otherwise a new entry is added
> every time and is only removed after 24 hours.
>
> We've witnessed the catastrophic slowdown occur when making
> requests against a server that normally resumes sessions properly. I
> suspect this is because a) it started failing to resume sessions
> because of some failure on their side, or b) its session lifetime was
> considerably less than 24 hours, so Erlang started to try to resume
> failed sessions and continuously created new sessions because of this.
> I think it is also important to note that while the
> register_unique_session fix would fix the memory leak if it worked, in
> this situation it would cause a new session to be created each time
> and make SSL session caching pointless until the Erlang session
> expired. I think it would be preferable to create a new session and
> delete the old one to preserve the uniqueness, but I'm not sure how
> this could be done with ETS without creating a race that would generate
> multiple sessions. The other alternative would be to delete sessions
> that are known not to resume. For example, if you try to resume a
> session and the server no longer knows about it, this is known by the
> client because the client has to go through the whole handshake.
>
> I think this was meant to be fixed by the register_unique_session
> function, but the fix does not work because it assumes the return value
> of select_session is [#session{}] when it is really [ [binary(),
> #session{}] ].

Thank you for the detailed explanations and suggestions.
Well yes, there is definitely a bug here. I am analyzing the best way to
fix it now.
I think in the short run we will fix register_unique_session so that it
works as intended. We will also analyze whether there are further
improvements that can be done to keep the session table "fresh" and small.
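
For readers following along, the shape mismatch described in the report can
be sketched roughly as below. This is only an illustration, not the code in
ssl_manager.erl: the module name, the cut-down #session{} record, the table
options and the two exists_* helpers are all hypothetical. It assumes the
select uses the '$$' result form, which yields one [SessionId, Session] list
per match, as the report describes.

```erlang
-module(session_shape_demo).
-export([run/0]).

%% Hypothetical, cut-down stand-in for the real #session{} record.
-record(session, {session_id, peer_certificate, compression_method}).

%% Select every cached entry for a {Host, Port} partial key.
%% With '$$' as the result, each match comes back as [SessionId, Session].
select_session(Cache, PartialKey) ->
    ets:select(Cache, [{{{PartialKey, '$1'}, '$2'}, [], ['$$']}]).

%% Mistaken check: treats every element as a bare #session{} record,
%% so it never finds an equivalent session.
exists_assuming_records(Session, Candidates) ->
    lists:any(fun(S) -> same_params(Session, S) end, Candidates).

%% Check that destructures the [SessionId, Session] pairs first.
exists_assuming_pairs(Session, Candidates) ->
    lists:any(fun([_Id, S]) -> same_params(Session, S) end, Candidates).

%% Two sessions are "the same" here if everything but the ID matches.
same_params(#session{peer_certificate = C, compression_method = M},
            #session{peer_certificate = C, compression_method = M}) ->
    true;
same_params(_, _) ->
    false.

run() ->
    Cache = ets:new(demo_session_cache, [set, private]),
    Key = {"example.com", 443},
    Old = #session{session_id = <<1>>,
                   peer_certificate = undefined,
                   compression_method = 0},
    true = ets:insert(Cache, {{Key, <<1>>}, Old}),
    %% A new connection with identical parameters but a fresh session ID.
    New = Old#session{session_id = <<2>>},
    Candidates = select_session(Cache, Key),
    Result = {exists_assuming_records(New, Candidates),
              exists_assuming_pairs(New, Candidates)},
    true = ets:delete(Cache),
    Result.
```

In this sketch, session_shape_demo:run() returns {false, true}: the check
that expects bare records never sees the existing session, so a uniqueness
test built on it would let a duplicate entry be inserted on every connection.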

> https://github.com/erlang/otp/blob/maint/lib/ssl/src/ssl_manager.erl#L564


Regards Ingela Erlang/OTP team - Ericsson AB