[erlang-bugs] Segfault in asn1rt_nif:decode_ber_tlv

Simon Cornish zl9d97p02@REDACTED
Fri Sep 26 23:43:53 CEST 2014


Hi Björn,
Do you intend to add that test to a later release? Whilst I could not
crash the VM without it, it's not possible to unambiguously decode
some primitives with indefinite length (eg. OCTET STRING, INTEGER,
etc).
I'm all for strict encoding and more permissive decoders but in this
case I see more benefit in raising an error.
Regards,
 Simon

On 24 September 2014 07:11, Björn Gustavsson bjorn-at-erlang.org
|erlang-mail/gmail| <394upzd31t@REDACTED> wrote:
> Hi Simon,
>
> As you can see in my late answer to Alex,
> we have fixed the bug. But I did not add
> an test for a primitive type encoded with
> an indefinite length.
>
> https://github.com/erlang/otp/commit/7f385ebd984ed2931daa761819816b3e9da7d63c
>
> /Bjorn
>
> On Mon, Sep 22, 2014 at 3:50 PM, Simon Cornish <zl9d97p02@REDACTED> wrote:
>> Hi Alex,
>> Your analysis is correct regarding that the underlying fault is
>> reading beyond the end of the buffer
>> The attached patch solves your problem in the same (although
>> stylistically different) way.
>> It also corrects the fault that a primitive cannot be encoded using an
>> indefinite length. This is both according to X.690 and common sense!
>> If I get some breathing room later this week I'll put together a pull
>> request for this with some test.
>> Regards,
>>  Simon
>>
>>
>> On 12 September 2014 14:23, Alex Wilson alex-at-cooperi.net
>> |erlang-mail/gmail| <um3p2mgtft@REDACTED> wrote:
>>> Hi erlang-bugs,
>>>
>>> I've found a very interesting segmentation fault bug in
>>> asn1rt_nif:decode_ber_tlv.
>>>
>>> A simple case to reproduce it: just type this at the shell:
>>>
>>>   asn1rt_nif:decode_ber_tlv(<<"!",16#80>>).
>>>
>>> It seems like whether or not the segmentation fault manifests varies a lot
>>> between different OS, compiler and OTP versions. So far I've reproduced it
>>> on:
>>>
>>>  * R16B03-1 on Mac OSX Mavericks (clang)
>>>  * R16B01 on OpenBSD 5.3-stable with gcc 4.2.1
>>>  * R16B03-1 on Linux (Fedora) with gcc 4.8.3
>>>  * R17B01 on OpenBSD 5.6-current with gcc 4.2.1
>>>  * R17 developer build from "maint" branch as of last week, Mac OSX
>>> Mavericks (clang)
>>>
>>> I also asked some random people in #erlang on freenode to try it out and
>>> they also reproduced the segfault using my test case.
>>>
>>> Sometimes it doesn't segfault the first time around, but if you run it a few
>>> times at the shell it will do it eventually.
>>>
>>>
>>> Backtrace from R17 maint on Mac OSX:
>>>
>>> * thread #1: tid = 0x2410f5, 0x00007fff958db1aa
>>> libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 458, queue =
>>> 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1,
>>> address=0x107ffff0)
>>>   * frame #0: 0x00007fff958db1aa
>>> libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 458
>>>     frame #1: 0x000000001037aa49 asn1rt_nif.so`decode_ber_tlv_raw [inlined]
>>> ber_decode_begin(env=0x00007fff5fbff548, in_buf=<unavailable>,
>>> in_buf_len=<unavailable>, err_pos=<unavailable>) + 80 at asn1_erl_nif.c:854
>>>     frame #2: 0x000000001037a9f9
>>> asn1rt_nif.so`decode_ber_tlv_raw(env=0x00007fff5fbff548, argc=<unavailable>,
>>> argv=<unavailable>) + 41 at asn1_erl_nif.c:1256
>>>     frame #3: 0x00000000100e4cbd beam`process_main + 58077 at
>>> beam_emu.c:3524
>>>     frame #4: 0x00000000100196dd beam`erl_start(argc=21, argv=<unavailable>)
>>> + 5997 at erl_init.c:1990
>>>     frame #5: 0x0000000010000df9 beam`main(argc=<unavailable>,
>>> argv=<unavailable>) + 9 at erl_main.c:29
>>>
>>> Backtrace from R16B03-1 on Linux (gdb): (no symbols on that machine, sorry,
>>> but can clearly see it's the same trace)
>>>
>>> #0  __memcpy_sse2_unaligned ()
>>>     at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:157
>>> #1  0x00007f0c48cbdace in ber_decode_begin ()
>>>    from /usr/lib64/erlang/lib/asn1-2.0.4/priv/lib/asn1_erl_nif.so
>>> #2  0x00007f0c48cbdb6f in decode_ber_tlv_raw ()
>>>    from /usr/lib64/erlang/lib/asn1-2.0.4/priv/lib/asn1_erl_nif.so
>>> #3  0x0000000000533254 in process_main ()
>>> #4  0x00000000004508a0 in erl_start ()
>>> #5  0x0000000000434049 in main ()
>>>
>>>
>>> There's a lot of inlining and optimisation going on in this part of the
>>> code, which makes it hard to look back and forth between the assembly and C.
>>>
>>> Anyway, the segfault is caused by running off the end of memory after doing
>>> a memcpy with -2 as the length (0xfffffffe). This is because ib_index has
>>> gotten to a value of 4 when the in_buf_len is only 2.
>>>
>>> Using some watchpoints I figured out that ber_decode_value's if (indef == 1)
>>> block is responsible for the incrementing of ib_index beyond the end of the
>>> binary. I hacked up the following patch:
>>>
>>>
>>> --- a/lib/asn1/c_src/asn1_erl_nif.c
>>> +++ b/lib/asn1/c_src/asn1_erl_nif.c
>>> @@ -968,16 +968,16 @@ static int ber_decode_value(ErlNifEnv* env,
>>> ERL_NIF_TERM *value, unsigned cha
>>>      if (indef == 1) { /* in this case it is desireably to check that
>>> indefinite length
>>>       end bytes exist in inbuffer */
>>>         curr_head = enif_make_list(env, 0);
>>> -       while (!(in_buf[*ib_index] == 0 && in_buf[*ib_index + 1] == 0)) {
>>> -           if (*ib_index >= in_buf_len)
>>> -               return ASN1_INDEF_LEN_ERROR;
>>> -
>>> +       while ((*ib_index + 1 < in_buf_len) &&
>>> +               !(in_buf[*ib_index] == 0 && in_buf[*ib_index + 1] == 0)) {
>>>             if ((maybe_ret = ber_decode(env, &term, in_buf, ib_index,
>>> in_buf_len))
>>>                     <= ASN1_ERROR
>>>                     )
>>>                 return maybe_ret;
>>>             curr_head = enif_make_list_cell(env, term, curr_head);
>>>         }
>>> +       if (*ib_index + 1 >= in_buf_len)
>>> +               return ASN1_INDEF_LEN_ERROR;
>>>         enif_make_reverse_list(env, curr_head, value);
>>>         (*ib_index) += 2; /* skip the indefinite length end bytes */
>>>      } else if (form == ASN1_CONSTRUCTED)
>>>
>>>
>>> And it seems to stop it happening with my test case on OSX, at least. It
>>> makes two changes -- checking for *ib_index + 1 >= in_buf_len (because it's
>>> using in_buf[*ib_index + 1], it should check that _that_ index is valid, not
>>> just *ib_index). The second change is to move the check from an "if" inside
>>> the loop to a loop condition.
>>>
>>> The movement to the loop condition seems to be the main difference. Looking
>>> at the assembly, it seems like the compiler is reasoning (after inlining the
>>> entire ber_decode function into this loop) that it can hoist that if out
>>> somehow?... maybe because the ber_decode code will already check it (???)
>>>
>>> Putting it into the loop condition seems to stop this behaviour. But if I've
>>> misdiagnosed the problem (quite possible), then perhaps all I've done is
>>> permute the code just enough to stop the optimiser doing this one thing for
>>> now, and there are other problems just waiting in the wings...
>>>
>>> Hoping somebody who knows the asn1 code better than I do (or C in general)
>>> might be able to help!
>>> _______________________________________________
>>> erlang-bugs mailing list
>>> erlang-bugs@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-bugs
>> _______________________________________________
>> erlang-bugs mailing list
>> erlang-bugs@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-bugs
>
>
>
> --
> Björn Gustavsson, Erlang/OTP, Ericsson AB



More information about the erlang-bugs mailing list