[erlang-bugs] Segfault in asn1rt_nif:decode_ber_tlv

Wed Sep 24 16:06:20 CEST 2014

Hi Alex,

Thanks for the bug report and sorry for the late answer.

We have included a fix for the problem in 17.3
which was released recently:

   OTP-12145  Robustness when decoding incorrect BER messages has been
     improved.

Here is the commit that fixes the bug:

https://github.com/erlang/otp/commit/7f385ebd984ed2931daa761819816b3e9da7d63c

/Bjorn

On Fri, Sep 12, 2014 at 2:23 PM, Alex Wilson <alex@REDACTED> wrote:
> Hi erlang-bugs,
>
> I've found a very interesting segmentation fault bug in
> asn1rt_nif:decode_ber_tlv.
>
> A simple case to reproduce it: just type this at the shell:
>
>   asn1rt_nif:decode_ber_tlv(<<"!",16#80>>).
>
> It seems like whether or not the segmentation fault manifests varies a lot
> between different OS, compiler and OTP versions. So far I've reproduced it
> on:
>
>  * R16B03-1 on Mac OSX Mavericks (clang)
>  * R16B01 on OpenBSD 5.3-stable with gcc 4.2.1
>  * R16B03-1 on Linux (Fedora) with gcc 4.8.3
>  * R17B01 on OpenBSD 5.6-current with gcc 4.2.1
>  * R17 developer build from "maint" branch as of last week, Mac OSX
> Mavericks (clang)
>
> I also asked some random people in #erlang on freenode to try it out and
> they also reproduced the segfault using my test case.
>
> Sometimes it doesn't segfault the first time around, but if you run it a few
> times at the shell it will do it eventually.
>
>
> Backtrace from R17 maint on Mac OSX:
>
> * thread #1: tid = 0x2410f5, 0x00007fff958db1aa
> libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 458, queue =
> 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1,
> address=0x107ffff0)
>   * frame #0: 0x00007fff958db1aa
> libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 458
>     frame #1: 0x000000001037aa49 asn1rt_nif.so`decode_ber_tlv_raw [inlined]
> ber_decode_begin(env=0x00007fff5fbff548, in_buf=<unavailable>,
> in_buf_len=<unavailable>, err_pos=<unavailable>) + 80 at asn1_erl_nif.c:854
>     frame #2: 0x000000001037a9f9
> asn1rt_nif.so`decode_ber_tlv_raw(env=0x00007fff5fbff548, argc=<unavailable>,
> argv=<unavailable>) + 41 at asn1_erl_nif.c:1256
>     frame #3: 0x00000000100e4cbd beam`process_main + 58077 at
> beam_emu.c:3524
>     frame #4: 0x00000000100196dd beam`erl_start(argc=21, argv=<unavailable>)
> + 5997 at erl_init.c:1990
>     frame #5: 0x0000000010000df9 beam`main(argc=<unavailable>,
> argv=<unavailable>) + 9 at erl_main.c:29
>
> Backtrace from R16B03-1 on Linux (gdb): (no symbols on that machine, sorry,
> but can clearly see it's the same trace)
>
> #0  __memcpy_sse2_unaligned ()
>     at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:157
> #1  0x00007f0c48cbdace in ber_decode_begin ()
>    from /usr/lib64/erlang/lib/asn1-2.0.4/priv/lib/asn1_erl_nif.so
> #2  0x00007f0c48cbdb6f in decode_ber_tlv_raw ()
>    from /usr/lib64/erlang/lib/asn1-2.0.4/priv/lib/asn1_erl_nif.so
> #3  0x0000000000533254 in process_main ()
> #4  0x00000000004508a0 in erl_start ()
> #5  0x0000000000434049 in main ()
>
>
> There's a lot of inlining and optimisation going on in this part of the
> code, which makes it hard to look back and forth between the assembly and C.
>
> Anyway, the segfault is caused by running off the end of memory after doing
> a memcpy with -2 as the length (0xfffffffe). This is because ib_index has
> gotten to a value of 4 when the in_buf_len is only 2.
>
> Using some watchpoints I figured out that ber_decode_value's if (indef == 1)
> block is responsible for the incrementing of ib_index beyond the end of the
> binary. I hacked up the following patch:
>
>
> --- a/lib/asn1/c_src/asn1_erl_nif.c
> +++ b/lib/asn1/c_src/asn1_erl_nif.c
> @@ -968,16 +968,16 @@ static int ber_decode_value(ErlNifEnv* env,
> ERL_NIF_TERM *value, unsigned cha
>      if (indef == 1) { /* in this case it is desireably to check that
> indefinite length
>       end bytes exist in inbuffer */
>         curr_head = enif_make_list(env, 0);
> -       while (!(in_buf[*ib_index] == 0 && in_buf[*ib_index + 1] == 0)) {
> -           if (*ib_index >= in_buf_len)
> -               return ASN1_INDEF_LEN_ERROR;
> -
> +       while ((*ib_index + 1 < in_buf_len) &&
> +               !(in_buf[*ib_index] == 0 && in_buf[*ib_index + 1] == 0)) {
>             if ((maybe_ret = ber_decode(env, &term, in_buf, ib_index,
> in_buf_len))
>                     <= ASN1_ERROR
>                     )
>                 return maybe_ret;
>             curr_head = enif_make_list_cell(env, term, curr_head);
>         }
> +       if (*ib_index + 1 >= in_buf_len)
> +               return ASN1_INDEF_LEN_ERROR;
>         enif_make_reverse_list(env, curr_head, value);
>         (*ib_index) += 2; /* skip the indefinite length end bytes */
>      } else if (form == ASN1_CONSTRUCTED)
>
>
> And it seems to stop it happening with my test case on OSX, at least. It
> makes two changes -- checking for *ib_index + 1 >= in_buf_len (because it's
> using in_buf[*ib_index + 1], it should check that _that_ index is valid, not
> just *ib_index). The second change is to move the check from an "if" inside
> the loop to a loop condition.
>
> The movement to the loop condition seems to be the main difference. Looking
> at the assembly, it seems like the compiler is reasoning (after inlining the
> entire ber_decode function into this loop) that it can hoist that if out
> somehow?... maybe because the ber_decode code will already check it (???)
>
> Putting it into the loop condition seems to stop this behaviour. But if I've
> misdiagnosed the problem (quite possible), then perhaps all I've done is
> permute the code just enough to stop the optimiser doing this one thing for
> now, and there are other problems just waiting in the wings...
>
> Hoping somebody who knows the asn1 code better than I do (or C in general)
> might be able to help!
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs

-- 
Björn Gustavsson, Erlang/OTP, Ericsson AB