[erlang-questions] Errors in SSL handshake (weird client)

Danil Zagoskin z@REDACTED
Thu Apr 17 20:04:18 CEST 2014


Hi!

Thank you for the patches!

Meanwhile we have deployed suite selection patch and now there is something
new in crash logs:
=CRASH REPORT==== 17-Apr-2014::20:32:24 ===
  crasher:
    initial call: tls_connection:init/1
    pid: <0.30376.19>
    registered_name: []
    exception exit: {function_clause,
                     [{tls_connection,handle_tls_handshake,
                       [#Fun<tls_connection.0.100316242>,abbreviated,
                        #state{role = server,
                             %% LOTS OF STUFF SKIPPED %%
                             connection_states = #connection_states{
                                      current_read = %% cipher_suite =
<<0,0>> %%
                                      pending_read = %% cipher_suite
= <<0,51>> %%
                             }
                             protocol_buffers = #protocol_buffers{
                                     tls_packets = [],
                                     tls_handshake_buffer =
<<92,64,37,228,209,208,235,232,47,69,57,232,90,106, ..... >>
                                     %% SKIPPED %%},
                             tls_handshake_history = {
                                     [

 [20,<<0,0,12>>,<<111,40,244,7,137,224,16,109,197,110,249,152>>],

 [2,<<0,0,70>>,<<3,2,83,80,2,23,175,124,25,138,178,243,166,22,

 150,239,220,170,215,108,54,169,67,79,124,183,

 189,89,35,239,244,87,102,19,32,11,15,20,173,67,

 209,70,180,252,109,228,108,60,203,91,126,77,12,

 231,203,43,221,11,234,147,28,231,91,18,49,228,
                                                         70,0,51,0>>],

 <<1,0,0,141,3,2,83,80,4,31,206,23,107,223,241,244,

 164,4,198,176,44,155,63,58,157,103,95,44,198,83,

 237,110,165,99,242,190,122,83,32,11,15,20,173,

 67,209,70,180,252,109,228,108,60,203,91,126,77,

 12,231,203,43,221,11,234,147,28,231,91,18,49,

 228,70,0,36,0,51,0,69,0,57,0,136,0,22,0,50,0,68,

 0,56,0,135,0,19,0,102,0,47,0,65,0,53,0,132,0,10,

 0,5,0,4,1,0,0,32,0,9,0,3,2,0,1,0,0,0,21,0,19,0,

 0,16,119,101,98,100,97,118,46,121,97,110,100,

 101,120,46,114,117>>
                                    ],
                                    [..........]
                             },
                             %% LOTS OF STUFF SKIPPED %%
                         }],
                       [{file,"tls_connection.erl"},{line,735}]},
                      {tls_connection,next_state,4,
                       [{file,"tls_connection.erl"},{line,458}]},

{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},
                      {proc_lib,init_p_do_apply,3,
                       [{file,"proc_lib.erl"},{line,239}]}]}
      in function  gen_fsm:terminate/7 (gen_fsm.erl, line 622)

Could that client work with other server implementation?
tls_handshake_buffer seems like a garbage (but I'm not sure), so maybe
here's no valid client lost.

Should server tolerate such conditions? Should an alert be thrown
when tls_handshake:get_tls_handshake returns {[], _}?


Another question about crashes:
Maybe tls_connection is a good place for format_status/2? It may be a good
idea to hide secrets when printing state (otherwise a person which has
access to error logs seems to be able to restore the private key or
decipher dumped traffic).



2014-04-17 19:04 GMT+04:00 Ingela Andin <ingela.andin@REDACTED>:

> Hi!
>
> I pushed some more patches to my gitrep on the branches:
>
> ia/ssl/recv/OTP-11878  (error handling of  recv on an active socket)
>
> ia/ssl/decrypt-alert   (correct decryption alert handling)
>
> Regards Ingela Erlang/OTP team - Ericsson AB
>
>
>
>
> 2014-04-17 11:19 GMT+02:00 Ingela Andin <ingela.andin@REDACTED>:
>
> Hi Danil!
>>
>>
>> 2014-04-17 8:52 GMT+02:00 Danil Zagoskin <z@REDACTED>:
>>
>> Hi, Ingela!
>>>
>>> The patch works like a charm, thank you! It completely fixes the initial
>>> problem with buggy client.
>>>
>>>
>> One bug down!
>>
>>
>>
>>>
>>> But there are some sad news, too:
>>>
>>>
>> Well reproducible is always good, means we can haunt it down and fix it
>> :)
>>
>>
>>
>>> While playing with my dummy server (
>>> https://github.com/stolen/ssldump/blob/master/src/ssldump.erl#L70) I
>>> accidentally reproduced badarg in erlang:size once more:
>>>
>>> =ERROR REPORT==== 17-Apr-2014::07:34:58 ===
>>> ** State machine <0.58.0> terminating
>>> ** Last message in was {tcp,#Port<0.1498>,
>>>
>>> [22,3,1,0,173,1,0,0,169,3,3,83,79,75,226,78,81,8,
>>>
>>>  77,79,97,154,127,165,92,183,20,239,70,183,158,
>>>
>>>  193,222,139,77,116,13,128,12,66,120,63,110,0,0,
>>>
>>>  94,0,255,192,36,192,35,192,10,192,9,192,7,192,8,
>>>
>>>  192,40,192,39,192,20,192,19,192,17,192,18,192,38,
>>>
>>>  192,37,192,42,192,41,192,5,192,4,192,2,192,3,192,
>>>
>>>  15,192,14,192,12,192,13,0,61,0,60,0,47,0,5,0,4,0,
>>>
>>>  53,0,10,0,103,0,107,0,51,0,57,0,22,0,175,0,174,0,
>>>
>>>  141,0,140,0,138,0,139,0,177,0,176,0,44,0,59,1,0,
>>>
>>>  0,34,0,10,0,8,0,6,0,23,0,24,0,25,0,11,0,2,1,0,0,
>>>                              13,0,12,0,10,5,1,4,1,2,1,4,3,2,3]}
>>> (meaningless state omitted)
>>> ** Reason for termination =
>>> ** {badarg,[{erlang,size,
>>>
>>> [[22,3,1,0,173,1,0,0,169,3,3,83,79,75,226,78,81,8,77,79,
>>>
>>> 97,154,127,165,92,183,20,239,70,183,158,193,222,139,77,
>>>
>>> 116,13,128,12,66,120,63,110,0,0,94,0,255,192,36,192,35,
>>>
>>> 192,10,192,9,192,7,192,8,192,40,192,39,192,20,192,19,
>>>
>>> 192,17,192,18,192,38,192,37,192,42,192,41,192,5,192,4,
>>>
>>> 192,2,192,3,192,15,192,14,192,12,192,13,0,61,0,60,0,47,
>>>
>>> 0,5,0,4,0,53,0,10,0,103,0,107,0,51,0,57,0,22,0,175,0,
>>>
>>> 174,0,141,0,140,0,138,0,139,0,177,0,176,0,44,0,59,1,0,0,
>>>
>>> 34,0,10,0,8,0,6,0,23,0,24,0,25,0,11,0,2,1,0,0,13,0,12,0,
>>>                       10,5,1,4,1,2,1,4,3,2,3]],
>>>                     []},
>>>             {tls_record,get_tls_records_aux,2,
>>>                         [{file,"tls_record.erl"},{line,122}]},
>>>             {tls_connection,next_tls_record,2,
>>>                             [{file,"tls_connection.erl"},{line,484}]},
>>>             {tls_connection,handle_info,3,
>>>                             [{file,"tls_connection.erl"},{line,307}]},
>>>              {gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,503}]},
>>>
>>> {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
>>>
>>> The client used was simple curl under mac os, and this crash did not
>>> reproduce.
>>> As you see, the packet received as list instead of binary is
>>> client_hello which is the first message ssl process gets.
>>> Maybe there is some race condition when changing mode on just
>>> transport_accepted socket.
>>>
>>>
>>>
>>> Also, the same server when working with the same curl may stall at this
>>> line:
>>>   {ok, _} = ssl:recv(Socket, 0, 10000),
>>> process_info showed that the worker has {ssl,#sslsocket{}, Data} in its
>>> mailbox.
>>>
>>> I've tuned the server to reproduce this stall easier:
>>> https://github.com/stolen/ssldump/blob/b3eebc2fa85ca0bd8ed8470c400b972aa737bffc/src/ssldump.erl(master may change)
>>>
>>> The client is manually invoked curl which is manually interrupted after
>>> sufficiently large time of inactivity. The console log looks like this:
>>>
>>> stolen@REDACTED:~/ssldump$ ./dummyserver.sh 9999 ../cert/pipe.trololo
>>> Erlang/OTP 17 [erts-6.0] [source-07b8f44] [smp:2:2] [async-threads:10]
>>> [hipe] [kernel-poll:false]
>>>
>>> Eshell V6.0  (abort with ^G)
>>> 1> Accepting SSL in <0.46.0>
>>> Socket state: connection, options: {socket_options,list,0,0,0,true}
>>>
>>> =ERROR REPORT==== 17-Apr-2014::09:08:01 ===
>>> Error in process <0.46.0> with exit value:
>>> {{badmatch,{error,timeout}},[{ssldump,ssl_acceptor,1,[{file,"src/ssldump.erl"},{line,98}]}]}
>>>
>>> Accepting SSL in <0.47.0>
>>> Socket state: connection, options: {socket_options,binary,0,0,0,false}
>>> Accepting SSL in <0.48.0>
>>> Socket state: connection, options: {socket_options,binary,0,0,0,false}
>>> Accepting SSL in <0.51.0>
>>> Socket state: connection, options: {socket_options,list,0,0,0,true}
>>>
>>> =ERROR REPORT==== 17-Apr-2014::09:09:17 ===
>>> Error in process <0.51.0> with exit value:
>>> {{badmatch,{error,closed}},[{ssldump,ssl_acceptor,1,[{file,"src/ssldump.erl"},{line,98}]}]}
>>>
>>> Accepting SSL in <0.53.0>
>>> Socket state: connection, options: {socket_options,binary,0,0,0,false}
>>> Accepting SSL in <0.55.0>
>>> Socket state: connection, options: {socket_options,binary,0,0,0,false}
>>> Accepting SSL in <0.57.0>
>>> Socket state: connection, options: {socket_options,list,0,0,0,true}
>>>
>>> =ERROR REPORT==== 17-Apr-2014::09:11:26 ===
>>> Error in process <0.57.0> with exit value:
>>> {{badmatch,{error,closed}},[{ssldump,ssl_acceptor,1,[{file,"src/ssldump.erl"},{line,98}]}]}
>>>
>>>
>>> My interpretation:
>>>   - Given pool size X, the first of each X requests fails.
>>>   - First of failing requests catches timeout as expected, other ones
>>> just stall.
>>>   - Failing requests have emulated options [{mode, list}, {active,
>>> true}] while working ones have the opposite [{mode, binary}, {active,
>>> false}].
>>>
>>> After couple of further experiments I discovered that only every (X*N +
>>> 1)th acceptor inherits listening socket options, other ones always have
>>> [{mode, binary}, {active, false}].
>>>
>>>
>>>
>> Humm...   I am not sure if this is what happens in your case but the
>> following  should fix a potential inheritance problem
>>
>> diff --git a/lib/ssl/src/ssl.erl b/lib/ssl/src/ssl.erl
>> index 743753b..7768bbb 100644
>> --- a/lib/ssl/src/ssl.erl
>> +++ b/lib/ssl/src/ssl.erl
>> @@ -187,6 +187,7 @@ transport_accept(#sslsocket{pid = {ListenSocket,
>>              {error, Reason}
>>          end;
>>      {error, Reason} ->
>> +        ok = ssl_socket:setopts(Transport, ListenSocket, SocketValues),
>>          {error, Reason}
>>      end.
>>
>>
>>
>>
>>> And, apart from eventual option inheritance, shouldn't ssl:recv on
>>> active socket return {error, einval} like gen_tcp:recv does?
>>>
>>>
>>>
>> I think it should.
>>
>>
>> Regards Ingela Erlang/OTP Team Ericssson AB
>>
>>
>>
>>
>>
>>>  2014-04-16 19:36 GMT+04:00 Ingela Andin <ingela.andin@REDACTED>:
>>>
>>> Hi!
>>>>
>>>> 2014-04-16 15:56 GMT+02:00 Danil Zagoskin <z@REDACTED>:
>>>>
>>>> Thank you for the patch. I'll check it.
>>>>>
>>>>
>>>>
>>>> New patch, better tested ;),  on branch
>>>> ia/ssl/user-suites-match-negotiated-version at
>>>> https://github.com/IngelaAndin/otp
>>>>
>>>>
>>>>>
>>>>> (Style question: is lists:filtermap better than list comprehension?)
>>>>>
>>>>
>>>>
>>>> I think just a matter of taste!
>>>>
>>>>
>>>>> > #ssl_options{] is an internal representation of the proplists input
>>>>> list and should not be a visible in the API, records are generally not a
>>>>> good API choice as they enforce compile time dependencies.
>>>>>
>>>>> Yes, I understand that. I wanted to start broken client to see how
>>>>> server would work with it.
>>>>> The option to do that could be using sys:replace_state, but as far as
>>>>> I understand the code, client_hello is sent immediately after starting a
>>>>> process, so it seems there is no way to send crafted hello and continue
>>>>> handshake with couple of lines instead of ssl:connect.
>>>>>
>>>>> Given that possibility it could be much easier to reproduce server
>>>>> crashes when working with buggy clients.
>>>>> For example, How do I write a test for badmatch in tls_record:decode_cipher_text?
>>>>> The function takes #connection_states{}, and creating that record
>>>>> relies on #ssl_options{} which cannot be created without opening a
>>>>> socket and calling gen_fsm:enter_loop.
>>>>>
>>>>>
>>>> Yes  you have a good point.  I do not have a good short answer. Testing
>>>> error cases especially when the protocol sometimes
>>>> tries to hide the real error reason due to security issues and
>>>> shortcutting the API and creating raw protocol command is not trivial.
>>>>
>>>> Regards Ingela Erlang/OTP team - Ericsson AB
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> 2014-04-16 17:07 GMT+04:00 Ingela Andin <ingela.andin@REDACTED>:
>>>>>
>>>>> Hi!
>>>>>>
>>>>>> 2014-04-16 0:07 GMT+02:00 Danil Zagoskin <z@REDACTED>:
>>>>>>
>>>>>> Hi, Ingela.
>>>>>>>
>>>>>>> I finally had some time for investigating this problem.
>>>>>>>
>>>>>>> I didn't see badarg in erlang:size again, but it definitely appeared
>>>>>>> in 17.0 with simple accept method described as minimal example at
>>>>>>> http://www.erlang.org/doc/apps/ssl/using_ssl.html (
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> the difference was transport_accept and ssl_accept were run in
>>>>>>> separate process)
>>>>>>>
>>>>>>
>>>>>> That  should be ok, you are suppose to be able to do that.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Today I met some new crash:
>>>>>>> ** {{badmatch,{alert,2,20,{"ssl_cipher.erl",215}}},
>>>>>>>
>>>>>>> [{tls_record,decode_cipher_text,2,[{file,"tls_record.erl"},{line,157}]},
>>>>>>>
>>>>>>>  {tls_connection,next_record,1,[{file,"tls_connection.erl"},{line,503}]},
>>>>>>>
>>>>>>>  {tls_connection,next_state,4,[{file,"tls_connection.erl"},{line,475}]},
>>>>>>>      {gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,503}]},
>>>>>>>
>>>>>>>  {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
>>>>>>>
>>>>>>> That's because ssl_record:decipher/3 may return #alert{}, but
>>>>>>> tls_record:decode_cipher_text/2 isn't expecting that.
>>>>>>>
>>>>>>>
>>>>>> Hum, looks like a bug. We will look into that.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> As for {case_clause,{4}} in ssl_v3:mac_hash/3 I found that
>>>>>>> precondition for crash is easily reproducible:
>>>>>>> https://gist.github.com/stolen/10780653
>>>>>>> Ordinary servers negotiate with sha hash or close socket while ssl
>>>>>>> in 17.0 negotiates with sha256 hash.
>>>>>>> I don't know the protocol good enough to perform further states, but
>>>>>>> I hope this helps.
>>>>>>>
>>>>>>>
>>>>>> Okej, the following patch should take care of that problem.
>>>>>>
>>>>>> diff --git a/lib/ssl/src/ssl.erl b/lib/ssl/src/ssl.erl
>>>>>> index 743753b..093d4af 100644
>>>>>> --- a/lib/ssl/src/ssl.erl
>>>>>> +++ b/lib/ssl/src/ssl.erl
>>>>>> @@ -953,8 +953,8 @@ handle_cipher_option(Value, Version)  when
>>>>>> is_list(Value) ->
>>>>>>      error:_->
>>>>>>          throw({error, {options, {ciphers, Value}}})
>>>>>>      end.
>>>>>> -binary_cipher_suites(Version, []) -> %% Defaults to all supported
>>>>>> suits
>>>>>> -    ssl_cipher:suites(Version);
>>>>>> +binary_cipher_suites(Version, []) ->
>>>>>> +    [];
>>>>>>  binary_cipher_suites(Version, [{_,_,_,_}| _] = Ciphers0) -> %%
>>>>>> Backwards compatibility
>>>>>>      Ciphers = [{KeyExchange, Cipher, Hash} || {KeyExchange, Cipher,
>>>>>> Hash, _} <- Ciphers0],
>>>>>>      binary_cipher_suites(Version, Ciphers);
>>>>>> diff --git a/lib/ssl/src/ssl_handshake.erl
>>>>>> b/lib/ssl/src/ssl_handshake.erl
>>>>>> index 1108edc..93d4efd 100644
>>>>>> --- a/lib/ssl/src/ssl_handshake.erl
>>>>>> +++ b/lib/ssl/src/ssl_handshake.erl
>>>>>> @@ -1017,11 +1017,14 @@ decode_suites('3_bytes', Dec) ->
>>>>>>  %%-------------Cipeher suite handling
>>>>>> --------------------------------
>>>>>>
>>>>>>  available_suites(UserSuites, Version) ->
>>>>>> +    VersionSupportedSuites = ssl_cipher:suites(Version),
>>>>>>      case UserSuites of
>>>>>>      [] ->
>>>>>> -        ssl_cipher:suites(Version);
>>>>>> +        VersionSupportedSuites;
>>>>>>      _ ->
>>>>>> -        UserSuites
>>>>>> +        lists:filtermap(fun(Suite) ->
>>>>>> +                    lists:member(Suite, VersionSupportedSuites)
>>>>>> +                end, UserSuites)
>>>>>>      end.
>>>>>>
>>>>>>  available_suites(ServerCert, UserSuites, Version, Curve) ->
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Unfortunately, I don't understand how this code should be tested —
>>>>>>> most of functions require #state{} or #connection_states{} which require
>>>>>>> #ssl_options{} and other stuff, and at I have found no exported function
>>>>>>> creating #ssl_options{} without side-effects.
>>>>>>>
>>>>>>
>>>>>> #ssl_options{] is an internal representation of the proplists input
>>>>>> list and should not be a visible in the API, records are generally not a
>>>>>> good API choice as they enforce compile time dependencies.
>>>>>>
>>>>>>
>>>>>> Regards Ingela - Erlang/OTP team - Ericsson AB
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> 2014-04-15 16:52 GMT+04:00 Ingela Andin <ingela.andin@REDACTED>:
>>>>>>>
>>>>>>> Hello again!
>>>>>>>>
>>>>>>>>
>>>>>>>>>  sometimes badarg in
>>>>>>>>> erlang:size([22,3,1,0,158,1,0,0,154,3,1,83,74|...]) at tls_record.erl:122.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Same symptom was recently reported on erlang-bugs and it turned out
>>>>>>>> to be due to upgrading a gen_tcp socket in active mode. When upgrading a
>>>>>>>> gen_tcp socket to an ssl socket it must be put in passive mode ({active,
>>>>>>>> false}) before the client
>>>>>>>> is allowed to start the handshake.  Normaly if the upgrade is
>>>>>>>> negotiated this is not a problem for the server to set the option
>>>>>>>> before signaling to the client to go ahead with  the handshake. If
>>>>>>>> the upgrade is only performed on the server side instead of calling the ssl
>>>>>>>> API ( some people may do that due to the previously lack of possibility to
>>>>>>>> specify ssl options when calling ssl:ssl_accept with an "sslsocket", this
>>>>>>>> is no longer the case in 17.0) the listen socket needs to be put in passive
>>>>>>>> mode (listen options are inherited by the acceptsocket) to make sure that
>>>>>>>> the it will work, otherwhise it will work sometimes and sometime result in
>>>>>>>> the error above.
>>>>>>>>
>>>>>>>> Regards Ingela Erlang/OTP team - Ericsson AB
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Danil Zagoskin | z@REDACTED
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Danil Zagoskin | z@REDACTED
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Danil Zagoskin | z@REDACTED
>>>
>>
>>
>


-- 
Danil Zagoskin | z@REDACTED
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20140417/c97e72c2/attachment.htm>


More information about the erlang-questions mailing list