[erlang-questions] Errors in SSL handshake (weird client)
Ingela Andin
ingela.andin@REDACTED
Thu Apr 17 11:19:13 CEST 2014
Hi Danil!
2014-04-17 8:52 GMT+02:00 Danil Zagoskin <z@REDACTED>:
> Hi, Ingela!
>
> The patch works like a charm, thank you! It completely fixes the initial
> problem with buggy client.
>
>
One bug down!
>
> But there are some sad news, too:
>
>
Well reproducible is always good, means we can haunt it down and fix it :)
> While playing with my dummy server (
> https://github.com/stolen/ssldump/blob/master/src/ssldump.erl#L70) I
> accidentally reproduced badarg in erlang:size once more:
>
> =ERROR REPORT==== 17-Apr-2014::07:34:58 ===
> ** State machine <0.58.0> terminating
> ** Last message in was {tcp,#Port<0.1498>,
>
> [22,3,1,0,173,1,0,0,169,3,3,83,79,75,226,78,81,8,
> 77,79,97,154,127,165,92,183,20,239,70,183,158,
>
> 193,222,139,77,116,13,128,12,66,120,63,110,0,0,
>
> 94,0,255,192,36,192,35,192,10,192,9,192,7,192,8,
>
> 192,40,192,39,192,20,192,19,192,17,192,18,192,38,
>
> 192,37,192,42,192,41,192,5,192,4,192,2,192,3,192,
>
> 15,192,14,192,12,192,13,0,61,0,60,0,47,0,5,0,4,0,
>
> 53,0,10,0,103,0,107,0,51,0,57,0,22,0,175,0,174,0,
>
> 141,0,140,0,138,0,139,0,177,0,176,0,44,0,59,1,0,
>
> 0,34,0,10,0,8,0,6,0,23,0,24,0,25,0,11,0,2,1,0,0,
> 13,0,12,0,10,5,1,4,1,2,1,4,3,2,3]}
> (meaningless state omitted)
> ** Reason for termination =
> ** {badarg,[{erlang,size,
>
> [[22,3,1,0,173,1,0,0,169,3,3,83,79,75,226,78,81,8,77,79,
>
> 97,154,127,165,92,183,20,239,70,183,158,193,222,139,77,
>
> 116,13,128,12,66,120,63,110,0,0,94,0,255,192,36,192,35,
> 192,10,192,9,192,7,192,8,192,40,192,39,192,20,192,19,
>
> 192,17,192,18,192,38,192,37,192,42,192,41,192,5,192,4,
>
> 192,2,192,3,192,15,192,14,192,12,192,13,0,61,0,60,0,47,
> 0,5,0,4,0,53,0,10,0,103,0,107,0,51,0,57,0,22,0,175,0,
>
> 174,0,141,0,140,0,138,0,139,0,177,0,176,0,44,0,59,1,0,0,
>
> 34,0,10,0,8,0,6,0,23,0,24,0,25,0,11,0,2,1,0,0,13,0,12,0,
> 10,5,1,4,1,2,1,4,3,2,3]],
> []},
> {tls_record,get_tls_records_aux,2,
> [{file,"tls_record.erl"},{line,122}]},
> {tls_connection,next_tls_record,2,
> [{file,"tls_connection.erl"},{line,484}]},
> {tls_connection,handle_info,3,
> [{file,"tls_connection.erl"},{line,307}]},
> {gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,503}]},
>
> {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
>
> The client used was simple curl under mac os, and this crash did not
> reproduce.
> As you see, the packet received as list instead of binary is client_hello
> which is the first message ssl process gets.
> Maybe there is some race condition when changing mode on just
> transport_accepted socket.
>
>
>
> Also, the same server when working with the same curl may stall at this
> line:
> {ok, _} = ssl:recv(Socket, 0, 10000),
> process_info showed that the worker has {ssl,#sslsocket{}, Data} in its
> mailbox.
>
> I've tuned the server to reproduce this stall easier:
> https://github.com/stolen/ssldump/blob/b3eebc2fa85ca0bd8ed8470c400b972aa737bffc/src/ssldump.erl(master may change)
>
> The client is manually invoked curl which is manually interrupted after
> sufficiently large time of inactivity. The console log looks like this:
>
> stolen@REDACTED:~/ssldump$ ./dummyserver.sh 9999 ../cert/pipe.trololo
> Erlang/OTP 17 [erts-6.0] [source-07b8f44] [smp:2:2] [async-threads:10]
> [hipe] [kernel-poll:false]
>
> Eshell V6.0 (abort with ^G)
> 1> Accepting SSL in <0.46.0>
> Socket state: connection, options: {socket_options,list,0,0,0,true}
>
> =ERROR REPORT==== 17-Apr-2014::09:08:01 ===
> Error in process <0.46.0> with exit value:
> {{badmatch,{error,timeout}},[{ssldump,ssl_acceptor,1,[{file,"src/ssldump.erl"},{line,98}]}]}
>
> Accepting SSL in <0.47.0>
> Socket state: connection, options: {socket_options,binary,0,0,0,false}
> Accepting SSL in <0.48.0>
> Socket state: connection, options: {socket_options,binary,0,0,0,false}
> Accepting SSL in <0.51.0>
> Socket state: connection, options: {socket_options,list,0,0,0,true}
>
> =ERROR REPORT==== 17-Apr-2014::09:09:17 ===
> Error in process <0.51.0> with exit value:
> {{badmatch,{error,closed}},[{ssldump,ssl_acceptor,1,[{file,"src/ssldump.erl"},{line,98}]}]}
>
> Accepting SSL in <0.53.0>
> Socket state: connection, options: {socket_options,binary,0,0,0,false}
> Accepting SSL in <0.55.0>
> Socket state: connection, options: {socket_options,binary,0,0,0,false}
> Accepting SSL in <0.57.0>
> Socket state: connection, options: {socket_options,list,0,0,0,true}
>
> =ERROR REPORT==== 17-Apr-2014::09:11:26 ===
> Error in process <0.57.0> with exit value:
> {{badmatch,{error,closed}},[{ssldump,ssl_acceptor,1,[{file,"src/ssldump.erl"},{line,98}]}]}
>
>
> My interpretation:
> - Given pool size X, the first of each X requests fails.
> - First of failing requests catches timeout as expected, other ones just
> stall.
> - Failing requests have emulated options [{mode, list}, {active, true}]
> while working ones have the opposite [{mode, binary}, {active, false}].
>
> After couple of further experiments I discovered that only every (X*N +
> 1)th acceptor inherits listening socket options, other ones always have
> [{mode, binary}, {active, false}].
>
>
>
Humm... I am not sure if this is what happens in your case but the
following should fix a potential inheritance problem
diff --git a/lib/ssl/src/ssl.erl b/lib/ssl/src/ssl.erl
index 743753b..7768bbb 100644
--- a/lib/ssl/src/ssl.erl
+++ b/lib/ssl/src/ssl.erl
@@ -187,6 +187,7 @@ transport_accept(#sslsocket{pid = {ListenSocket,
{error, Reason}
end;
{error, Reason} ->
+ ok = ssl_socket:setopts(Transport, ListenSocket, SocketValues),
{error, Reason}
end.
> And, apart from eventual option inheritance, shouldn't ssl:recv on active
> socket return {error, einval} like gen_tcp:recv does?
>
>
>
I think it should.
Regards Ingela Erlang/OTP Team Ericssson AB
> 2014-04-16 19:36 GMT+04:00 Ingela Andin <ingela.andin@REDACTED>:
>
> Hi!
>>
>> 2014-04-16 15:56 GMT+02:00 Danil Zagoskin <z@REDACTED>:
>>
>> Thank you for the patch. I'll check it.
>>>
>>
>>
>> New patch, better tested ;), on branch
>> ia/ssl/user-suites-match-negotiated-version at
>> https://github.com/IngelaAndin/otp
>>
>>
>>>
>>> (Style question: is lists:filtermap better than list comprehension?)
>>>
>>
>>
>> I think just a matter of taste!
>>
>>
>>> > #ssl_options{] is an internal representation of the proplists input
>>> list and should not be a visible in the API, records are generally not a
>>> good API choice as they enforce compile time dependencies.
>>>
>>> Yes, I understand that. I wanted to start broken client to see how
>>> server would work with it.
>>> The option to do that could be using sys:replace_state, but as far as I
>>> understand the code, client_hello is sent immediately after starting a
>>> process, so it seems there is no way to send crafted hello and continue
>>> handshake with couple of lines instead of ssl:connect.
>>>
>>> Given that possibility it could be much easier to reproduce server
>>> crashes when working with buggy clients.
>>> For example, How do I write a test for badmatch in tls_record:decode_cipher_text?
>>> The function takes #connection_states{}, and creating that record
>>> relies on #ssl_options{} which cannot be created without opening a
>>> socket and calling gen_fsm:enter_loop.
>>>
>>>
>> Yes you have a good point. I do not have a good short answer. Testing
>> error cases especially when the protocol sometimes
>> tries to hide the real error reason due to security issues and
>> shortcutting the API and creating raw protocol command is not trivial.
>>
>> Regards Ingela Erlang/OTP team - Ericsson AB
>>
>>
>>
>>
>>
>>>
>>> 2014-04-16 17:07 GMT+04:00 Ingela Andin <ingela.andin@REDACTED>:
>>>
>>> Hi!
>>>>
>>>> 2014-04-16 0:07 GMT+02:00 Danil Zagoskin <z@REDACTED>:
>>>>
>>>> Hi, Ingela.
>>>>>
>>>>> I finally had some time for investigating this problem.
>>>>>
>>>>> I didn't see badarg in erlang:size again, but it definitely appeared
>>>>> in 17.0 with simple accept method described as minimal example at
>>>>> http://www.erlang.org/doc/apps/ssl/using_ssl.html (
>>>>>
>>>>
>>>>
>>>>
>>>>> the difference was transport_accept and ssl_accept were run in
>>>>> separate process)
>>>>>
>>>>
>>>> That should be ok, you are suppose to be able to do that.
>>>>
>>>>
>>>>
>>>>> Today I met some new crash:
>>>>> ** {{badmatch,{alert,2,20,{"ssl_cipher.erl",215}}},
>>>>>
>>>>> [{tls_record,decode_cipher_text,2,[{file,"tls_record.erl"},{line,157}]},
>>>>>
>>>>> {tls_connection,next_record,1,[{file,"tls_connection.erl"},{line,503}]},
>>>>>
>>>>> {tls_connection,next_state,4,[{file,"tls_connection.erl"},{line,475}]},
>>>>> {gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,503}]},
>>>>> {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
>>>>>
>>>>> That's because ssl_record:decipher/3 may return #alert{}, but
>>>>> tls_record:decode_cipher_text/2 isn't expecting that.
>>>>>
>>>>>
>>>> Hum, looks like a bug. We will look into that.
>>>>
>>>>
>>>>>
>>>>> As for {case_clause,{4}} in ssl_v3:mac_hash/3 I found that
>>>>> precondition for crash is easily reproducible:
>>>>> https://gist.github.com/stolen/10780653
>>>>> Ordinary servers negotiate with sha hash or close socket while ssl in
>>>>> 17.0 negotiates with sha256 hash.
>>>>> I don't know the protocol good enough to perform further states, but I
>>>>> hope this helps.
>>>>>
>>>>>
>>>> Okej, the following patch should take care of that problem.
>>>>
>>>> diff --git a/lib/ssl/src/ssl.erl b/lib/ssl/src/ssl.erl
>>>> index 743753b..093d4af 100644
>>>> --- a/lib/ssl/src/ssl.erl
>>>> +++ b/lib/ssl/src/ssl.erl
>>>> @@ -953,8 +953,8 @@ handle_cipher_option(Value, Version) when
>>>> is_list(Value) ->
>>>> error:_->
>>>> throw({error, {options, {ciphers, Value}}})
>>>> end.
>>>> -binary_cipher_suites(Version, []) -> %% Defaults to all supported suits
>>>> - ssl_cipher:suites(Version);
>>>> +binary_cipher_suites(Version, []) ->
>>>> + [];
>>>> binary_cipher_suites(Version, [{_,_,_,_}| _] = Ciphers0) -> %%
>>>> Backwards compatibility
>>>> Ciphers = [{KeyExchange, Cipher, Hash} || {KeyExchange, Cipher,
>>>> Hash, _} <- Ciphers0],
>>>> binary_cipher_suites(Version, Ciphers);
>>>> diff --git a/lib/ssl/src/ssl_handshake.erl
>>>> b/lib/ssl/src/ssl_handshake.erl
>>>> index 1108edc..93d4efd 100644
>>>> --- a/lib/ssl/src/ssl_handshake.erl
>>>> +++ b/lib/ssl/src/ssl_handshake.erl
>>>> @@ -1017,11 +1017,14 @@ decode_suites('3_bytes', Dec) ->
>>>> %%-------------Cipeher suite handling --------------------------------
>>>>
>>>> available_suites(UserSuites, Version) ->
>>>> + VersionSupportedSuites = ssl_cipher:suites(Version),
>>>> case UserSuites of
>>>> [] ->
>>>> - ssl_cipher:suites(Version);
>>>> + VersionSupportedSuites;
>>>> _ ->
>>>> - UserSuites
>>>> + lists:filtermap(fun(Suite) ->
>>>> + lists:member(Suite, VersionSupportedSuites)
>>>> + end, UserSuites)
>>>> end.
>>>>
>>>> available_suites(ServerCert, UserSuites, Version, Curve) ->
>>>>
>>>>
>>>>
>>>>
>>>>> Unfortunately, I don't understand how this code should be tested —
>>>>> most of functions require #state{} or #connection_states{} which require
>>>>> #ssl_options{} and other stuff, and at I have found no exported function
>>>>> creating #ssl_options{} without side-effects.
>>>>>
>>>>
>>>> #ssl_options{] is an internal representation of the proplists input
>>>> list and should not be a visible in the API, records are generally not a
>>>> good API choice as they enforce compile time dependencies.
>>>>
>>>>
>>>> Regards Ingela - Erlang/OTP team - Ericsson AB
>>>>
>>>>
>>>>
>>>>>
>>>>> 2014-04-15 16:52 GMT+04:00 Ingela Andin <ingela.andin@REDACTED>:
>>>>>
>>>>> Hello again!
>>>>>>
>>>>>>
>>>>>>> sometimes badarg in
>>>>>>> erlang:size([22,3,1,0,158,1,0,0,154,3,1,83,74|...]) at tls_record.erl:122.
>>>>>>>
>>>>>>
>>>>>> Same symptom was recently reported on erlang-bugs and it turned out
>>>>>> to be due to upgrading a gen_tcp socket in active mode. When upgrading a
>>>>>> gen_tcp socket to an ssl socket it must be put in passive mode ({active,
>>>>>> false}) before the client
>>>>>> is allowed to start the handshake. Normaly if the upgrade is
>>>>>> negotiated this is not a problem for the server to set the option
>>>>>> before signaling to the client to go ahead with the handshake. If
>>>>>> the upgrade is only performed on the server side instead of calling the ssl
>>>>>> API ( some people may do that due to the previously lack of possibility to
>>>>>> specify ssl options when calling ssl:ssl_accept with an "sslsocket", this
>>>>>> is no longer the case in 17.0) the listen socket needs to be put in passive
>>>>>> mode (listen options are inherited by the acceptsocket) to make sure that
>>>>>> the it will work, otherwhise it will work sometimes and sometime result in
>>>>>> the error above.
>>>>>>
>>>>>> Regards Ingela Erlang/OTP team - Ericsson AB
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Danil Zagoskin | z@REDACTED
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Danil Zagoskin | z@REDACTED
>>>
>>
>>
>
>
> --
> Danil Zagoskin | z@REDACTED
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20140417/9ff9b04c/attachment.htm>
More information about the erlang-questions
mailing list