[erlang-questions] Errors in SSL handshake (weird client)

Danil Zagoskin z@REDACTED
Thu Apr 17 08:52:29 CEST 2014


Hi, Ingela!

The patch works like a charm, thank you! It completely fixes the initial
problem with buggy client.


But there are some sad news, too:

While playing with my dummy server (
https://github.com/stolen/ssldump/blob/master/src/ssldump.erl#L70) I
accidentally reproduced badarg in erlang:size once more:

=ERROR REPORT==== 17-Apr-2014::07:34:58 ===
** State machine <0.58.0> terminating
** Last message in was {tcp,#Port<0.1498>,

[22,3,1,0,173,1,0,0,169,3,3,83,79,75,226,78,81,8,
                             77,79,97,154,127,165,92,183,20,239,70,183,158,
                             193,222,139,77,116,13,128,12,66,120,63,110,0,0,

 94,0,255,192,36,192,35,192,10,192,9,192,7,192,8,

 192,40,192,39,192,20,192,19,192,17,192,18,192,38,

 192,37,192,42,192,41,192,5,192,4,192,2,192,3,192,

 15,192,14,192,12,192,13,0,61,0,60,0,47,0,5,0,4,0,

 53,0,10,0,103,0,107,0,51,0,57,0,22,0,175,0,174,0,

 141,0,140,0,138,0,139,0,177,0,176,0,44,0,59,1,0,

 0,34,0,10,0,8,0,6,0,23,0,24,0,25,0,11,0,2,1,0,0,
                             13,0,12,0,10,5,1,4,1,2,1,4,3,2,3]}
(meaningless state omitted)
** Reason for termination =
** {badarg,[{erlang,size,
                    [[22,3,1,0,173,1,0,0,169,3,3,83,79,75,226,78,81,8,77,79,

97,154,127,165,92,183,20,239,70,183,158,193,222,139,77,

116,13,128,12,66,120,63,110,0,0,94,0,255,192,36,192,35,
                      192,10,192,9,192,7,192,8,192,40,192,39,192,20,192,19,
                      192,17,192,18,192,38,192,37,192,42,192,41,192,5,192,4,

192,2,192,3,192,15,192,14,192,12,192,13,0,61,0,60,0,47,
                      0,5,0,4,0,53,0,10,0,103,0,107,0,51,0,57,0,22,0,175,0,

174,0,141,0,140,0,138,0,139,0,177,0,176,0,44,0,59,1,0,0,

34,0,10,0,8,0,6,0,23,0,24,0,25,0,11,0,2,1,0,0,13,0,12,0,
                      10,5,1,4,1,2,1,4,3,2,3]],
                    []},
            {tls_record,get_tls_records_aux,2,
                        [{file,"tls_record.erl"},{line,122}]},
            {tls_connection,next_tls_record,2,
                            [{file,"tls_connection.erl"},{line,484}]},
            {tls_connection,handle_info,3,
                            [{file,"tls_connection.erl"},{line,307}]},
            {gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,503}]},

{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}

The client used was simple curl under mac os, and this crash did not
reproduce.
As you see, the packet received as list instead of binary is client_hello
which is the first message ssl process gets.
Maybe there is some race condition when changing mode on just
transport_accepted socket.



Also, the same server when working with the same curl may stall at this
line:
  {ok, _} = ssl:recv(Socket, 0, 10000),
process_info showed that the worker has {ssl,#sslsocket{}, Data} in its
mailbox.

I've tuned the server to reproduce this stall easier:
https://github.com/stolen/ssldump/blob/b3eebc2fa85ca0bd8ed8470c400b972aa737bffc/src/ssldump.erl(master
may change)

The client is manually invoked curl which is manually interrupted after
sufficiently large time of inactivity. The console log looks like this:

stolen@REDACTED:~/ssldump$ ./dummyserver.sh 9999 ../cert/pipe.trololo
Erlang/OTP 17 [erts-6.0] [source-07b8f44] [smp:2:2] [async-threads:10]
[hipe] [kernel-poll:false]

Eshell V6.0  (abort with ^G)
1> Accepting SSL in <0.46.0>
Socket state: connection, options: {socket_options,list,0,0,0,true}

=ERROR REPORT==== 17-Apr-2014::09:08:01 ===
Error in process <0.46.0> with exit value:
{{badmatch,{error,timeout}},[{ssldump,ssl_acceptor,1,[{file,"src/ssldump.erl"},{line,98}]}]}

Accepting SSL in <0.47.0>
Socket state: connection, options: {socket_options,binary,0,0,0,false}
Accepting SSL in <0.48.0>
Socket state: connection, options: {socket_options,binary,0,0,0,false}
Accepting SSL in <0.51.0>
Socket state: connection, options: {socket_options,list,0,0,0,true}

=ERROR REPORT==== 17-Apr-2014::09:09:17 ===
Error in process <0.51.0> with exit value:
{{badmatch,{error,closed}},[{ssldump,ssl_acceptor,1,[{file,"src/ssldump.erl"},{line,98}]}]}

Accepting SSL in <0.53.0>
Socket state: connection, options: {socket_options,binary,0,0,0,false}
Accepting SSL in <0.55.0>
Socket state: connection, options: {socket_options,binary,0,0,0,false}
Accepting SSL in <0.57.0>
Socket state: connection, options: {socket_options,list,0,0,0,true}

=ERROR REPORT==== 17-Apr-2014::09:11:26 ===
Error in process <0.57.0> with exit value:
{{badmatch,{error,closed}},[{ssldump,ssl_acceptor,1,[{file,"src/ssldump.erl"},{line,98}]}]}


My interpretation:
  - Given pool size X, the first of each X requests fails.
  - First of failing requests catches timeout as expected, other ones just
stall.
  - Failing requests have emulated options [{mode, list}, {active, true}]
while working ones have the opposite [{mode, binary}, {active, false}].

After couple of further experiments I discovered that only every (X*N +
1)th acceptor inherits listening socket options, other ones always have
[{mode, binary}, {active, false}].


And, apart from eventual option inheritance, shouldn't ssl:recv on active
socket return {error, einval} like gen_tcp:recv does?


2014-04-16 19:36 GMT+04:00 Ingela Andin <ingela.andin@REDACTED>:

> Hi!
>
> 2014-04-16 15:56 GMT+02:00 Danil Zagoskin <z@REDACTED>:
>
> Thank you for the patch. I'll check it.
>>
>
>
> New patch, better tested ;),  on branch
> ia/ssl/user-suites-match-negotiated-version at
> https://github.com/IngelaAndin/otp
>
>
>>
>> (Style question: is lists:filtermap better than list comprehension?)
>>
>
>
> I think just a matter of taste!
>
>
>> > #ssl_options{] is an internal representation of the proplists input
>> list and should not be a visible in the API, records are generally not a
>> good API choice as they enforce compile time dependencies.
>>
>> Yes, I understand that. I wanted to start broken client to see how server
>> would work with it.
>> The option to do that could be using sys:replace_state, but as far as I
>> understand the code, client_hello is sent immediately after starting a
>> process, so it seems there is no way to send crafted hello and continue
>> handshake with couple of lines instead of ssl:connect.
>>
>> Given that possibility it could be much easier to reproduce server
>> crashes when working with buggy clients.
>> For example, How do I write a test for badmatch in tls_record:decode_cipher_text?
>> The function takes #connection_states{}, and creating that record relies
>> on #ssl_options{} which cannot be created without opening a socket and
>> calling gen_fsm:enter_loop.
>>
>>
> Yes  you have a good point.  I do not have a good short answer. Testing
> error cases especially when the protocol sometimes
> tries to hide the real error reason due to security issues and
> shortcutting the API and creating raw protocol command is not trivial.
>
> Regards Ingela Erlang/OTP team - Ericsson AB
>
>
>
>
>
>>
>> 2014-04-16 17:07 GMT+04:00 Ingela Andin <ingela.andin@REDACTED>:
>>
>> Hi!
>>>
>>> 2014-04-16 0:07 GMT+02:00 Danil Zagoskin <z@REDACTED>:
>>>
>>> Hi, Ingela.
>>>>
>>>> I finally had some time for investigating this problem.
>>>>
>>>> I didn't see badarg in erlang:size again, but it definitely appeared in
>>>> 17.0 with simple accept method described as minimal example at
>>>> http://www.erlang.org/doc/apps/ssl/using_ssl.html (
>>>>
>>>
>>>
>>>
>>>> the difference was transport_accept and ssl_accept were run in separate
>>>> process)
>>>>
>>>
>>> That  should be ok, you are suppose to be able to do that.
>>>
>>>
>>>
>>>> Today I met some new crash:
>>>> ** {{badmatch,{alert,2,20,{"ssl_cipher.erl",215}}},
>>>>
>>>> [{tls_record,decode_cipher_text,2,[{file,"tls_record.erl"},{line,157}]},
>>>>
>>>>  {tls_connection,next_record,1,[{file,"tls_connection.erl"},{line,503}]},
>>>>
>>>>  {tls_connection,next_state,4,[{file,"tls_connection.erl"},{line,475}]},
>>>>      {gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,503}]},
>>>>      {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
>>>>
>>>> That's because ssl_record:decipher/3 may return #alert{}, but
>>>> tls_record:decode_cipher_text/2 isn't expecting that.
>>>>
>>>>
>>> Hum, looks like a bug. We will look into that.
>>>
>>>
>>>>
>>>> As for {case_clause,{4}} in ssl_v3:mac_hash/3 I found that precondition
>>>> for crash is easily reproducible:
>>>> https://gist.github.com/stolen/10780653
>>>> Ordinary servers negotiate with sha hash or close socket while ssl in
>>>> 17.0 negotiates with sha256 hash.
>>>> I don't know the protocol good enough to perform further states, but I
>>>> hope this helps.
>>>>
>>>>
>>> Okej, the following patch should take care of that problem.
>>>
>>> diff --git a/lib/ssl/src/ssl.erl b/lib/ssl/src/ssl.erl
>>> index 743753b..093d4af 100644
>>> --- a/lib/ssl/src/ssl.erl
>>> +++ b/lib/ssl/src/ssl.erl
>>> @@ -953,8 +953,8 @@ handle_cipher_option(Value, Version)  when
>>> is_list(Value) ->
>>>      error:_->
>>>          throw({error, {options, {ciphers, Value}}})
>>>      end.
>>> -binary_cipher_suites(Version, []) -> %% Defaults to all supported suits
>>> -    ssl_cipher:suites(Version);
>>> +binary_cipher_suites(Version, []) ->
>>> +    [];
>>>  binary_cipher_suites(Version, [{_,_,_,_}| _] = Ciphers0) -> %%
>>> Backwards compatibility
>>>      Ciphers = [{KeyExchange, Cipher, Hash} || {KeyExchange, Cipher,
>>> Hash, _} <- Ciphers0],
>>>      binary_cipher_suites(Version, Ciphers);
>>> diff --git a/lib/ssl/src/ssl_handshake.erl
>>> b/lib/ssl/src/ssl_handshake.erl
>>> index 1108edc..93d4efd 100644
>>> --- a/lib/ssl/src/ssl_handshake.erl
>>> +++ b/lib/ssl/src/ssl_handshake.erl
>>> @@ -1017,11 +1017,14 @@ decode_suites('3_bytes', Dec) ->
>>>  %%-------------Cipeher suite handling --------------------------------
>>>
>>>  available_suites(UserSuites, Version) ->
>>> +    VersionSupportedSuites = ssl_cipher:suites(Version),
>>>      case UserSuites of
>>>      [] ->
>>> -        ssl_cipher:suites(Version);
>>> +        VersionSupportedSuites;
>>>      _ ->
>>> -        UserSuites
>>> +        lists:filtermap(fun(Suite) ->
>>> +                    lists:member(Suite, VersionSupportedSuites)
>>> +                end, UserSuites)
>>>      end.
>>>
>>>  available_suites(ServerCert, UserSuites, Version, Curve) ->
>>>
>>>
>>>
>>>
>>>> Unfortunately, I don't understand how this code should be tested — most
>>>> of functions require #state{} or #connection_states{} which require
>>>> #ssl_options{} and other stuff, and at I have found no exported function
>>>> creating #ssl_options{} without side-effects.
>>>>
>>>
>>> #ssl_options{] is an internal representation of the proplists input list
>>> and should not be a visible in the API, records are generally not a good
>>> API choice as they enforce compile time dependencies.
>>>
>>>
>>> Regards Ingela - Erlang/OTP team - Ericsson AB
>>>
>>>
>>>
>>>>
>>>> 2014-04-15 16:52 GMT+04:00 Ingela Andin <ingela.andin@REDACTED>:
>>>>
>>>> Hello again!
>>>>>
>>>>>
>>>>>>  sometimes badarg in
>>>>>> erlang:size([22,3,1,0,158,1,0,0,154,3,1,83,74|...]) at tls_record.erl:122.
>>>>>>
>>>>>
>>>>> Same symptom was recently reported on erlang-bugs and it turned out to
>>>>> be due to upgrading a gen_tcp socket in active mode. When upgrading a
>>>>> gen_tcp socket to an ssl socket it must be put in passive mode ({active,
>>>>> false}) before the client
>>>>> is allowed to start the handshake.  Normaly if the upgrade is
>>>>> negotiated this is not a problem for the server to set the option
>>>>> before signaling to the client to go ahead with  the handshake. If the
>>>>> upgrade is only performed on the server side instead of calling the ssl API
>>>>> ( some people may do that due to the previously lack of possibility to
>>>>> specify ssl options when calling ssl:ssl_accept with an "sslsocket", this
>>>>> is no longer the case in 17.0) the listen socket needs to be put in passive
>>>>> mode (listen options are inherited by the acceptsocket) to make sure that
>>>>> the it will work, otherwhise it will work sometimes and sometime result in
>>>>> the error above.
>>>>>
>>>>> Regards Ingela Erlang/OTP team - Ericsson AB
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Danil Zagoskin | z@REDACTED
>>>>
>>>
>>>
>>
>>
>> --
>> Danil Zagoskin | z@REDACTED
>>
>
>


-- 
Danil Zagoskin | z@REDACTED
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20140417/40d38b42/attachment.htm>


More information about the erlang-questions mailing list