[erlang-bugs] Segfault in asn1rt_nif:decode_ber_tlv
Alex Wilson
alex@REDACTED
Fri Sep 12 14:23:02 CEST 2014
Hi erlang-bugs,
I've found a very interesting segmentation fault bug in
asn1rt_nif:decode_ber_tlv.
A simple case to reproduce it: just type this at the shell:
asn1rt_nif:decode_ber_tlv(<<"!",16#80>>).
Whether or not the segmentation fault manifests seems to vary a lot
between OS, compiler and OTP versions. So far I've reproduced it on:
* R16B03-1 on Mac OSX Mavericks (clang)
* R16B01 on OpenBSD 5.3-stable with gcc 4.2.1
* R16B03-1 on Linux (Fedora) with gcc 4.8.3
* R17B01 on OpenBSD 5.6-current with gcc 4.2.1
* R17 developer build from "maint" branch as of last week, Mac OSX Mavericks (clang)
I also asked some random people in #erlang on freenode to try it out and
they also reproduced the segfault using my test case.
Sometimes it doesn't segfault the first time around, but if you run it a
few times at the shell it will do it eventually.
Backtrace from R17 maint on Mac OSX:
* thread #1: tid = 0x2410f5, 0x00007fff958db1aa libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 458, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x107ffff0)
* frame #0: 0x00007fff958db1aa libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 458
frame #1: 0x000000001037aa49 asn1rt_nif.so`decode_ber_tlv_raw [inlined] ber_decode_begin(env=0x00007fff5fbff548, in_buf=<unavailable>, in_buf_len=<unavailable>, err_pos=<unavailable>) + 80 at asn1_erl_nif.c:854
frame #2: 0x000000001037a9f9 asn1rt_nif.so`decode_ber_tlv_raw(env=0x00007fff5fbff548, argc=<unavailable>, argv=<unavailable>) + 41 at asn1_erl_nif.c:1256
frame #3: 0x00000000100e4cbd beam`process_main + 58077 at beam_emu.c:3524
frame #4: 0x00000000100196dd beam`erl_start(argc=21, argv=<unavailable>) + 5997 at erl_init.c:1990
frame #5: 0x0000000010000df9 beam`main(argc=<unavailable>, argv=<unavailable>) + 9 at erl_main.c:29
Backtrace from R16B03-1 on Linux (gdb). There are no debug symbols on
that machine, sorry, but it's clearly the same trace:
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:157
#1 0x00007f0c48cbdace in ber_decode_begin () from /usr/lib64/erlang/lib/asn1-2.0.4/priv/lib/asn1_erl_nif.so
#2 0x00007f0c48cbdb6f in decode_ber_tlv_raw () from /usr/lib64/erlang/lib/asn1-2.0.4/priv/lib/asn1_erl_nif.so
#3 0x0000000000533254 in process_main ()
#4 0x00000000004508a0 in erl_start ()
#5 0x0000000000434049 in main ()
There's a lot of inlining and optimisation going on in this part of the
code, which makes it hard to look back and forth between the assembly and C.
Anyway, the segfault is caused by a memcpy with a length of -2
(0xfffffffe as an unsigned 32-bit value), which runs off the end of
mapped memory. This is because ib_index has reached 4 when in_buf_len
is only 2.
Using some watchpoints, I found that the "if (indef == 1)" block in
ber_decode_value is what increments ib_index past the end of the
binary. I hacked up the following patch:
--- a/lib/asn1/c_src/asn1_erl_nif.c
+++ b/lib/asn1/c_src/asn1_erl_nif.c
@@ -968,16 +968,16 @@ static int ber_decode_value(ErlNifEnv* env, ERL_NIF_TERM *value, unsigned cha
if (indef == 1) { /* in this case it is desireably to check that
indefinite length
end bytes exist in inbuffer */
curr_head = enif_make_list(env, 0);
- while (!(in_buf[*ib_index] == 0 && in_buf[*ib_index + 1] == 0)) {
- if (*ib_index >= in_buf_len)
- return ASN1_INDEF_LEN_ERROR;
-
+ while ((*ib_index + 1 < in_buf_len) &&
+ !(in_buf[*ib_index] == 0 && in_buf[*ib_index + 1] == 0)) {
if ((maybe_ret = ber_decode(env, &term, in_buf, ib_index,
in_buf_len))
<= ASN1_ERROR
)
return maybe_ret;
curr_head = enif_make_list_cell(env, term, curr_head);
}
+ if (*ib_index + 1 >= in_buf_len)
+ return ASN1_INDEF_LEN_ERROR;
enif_make_reverse_list(env, curr_head, value);
(*ib_index) += 2; /* skip the indefinite length end bytes */
} else if (form == ASN1_CONSTRUCTED)
With this patch the crash no longer happens with my test case, on OSX
at least. The patch makes two changes. First, it checks
*ib_index + 1 >= in_buf_len instead of *ib_index >= in_buf_len: the
loop reads in_buf[*ib_index + 1], so that is the index that must be
validated, not just *ib_index. Second, it moves the check from an "if"
inside the loop body into the loop condition.
Moving the check into the loop condition seems to be the main
difference. Looking at the assembly, the compiler seems to reason (after
inlining the entire ber_decode function into this loop) that it can
somehow hoist that "if" out of the loop, maybe because ber_decode
already performs the same check (???)
Putting it into the loop condition seems to stop this behaviour. But if
I've misdiagnosed the problem (quite possible), then perhaps all I've
done is permute the code just enough to stop the optimiser doing this
one thing for now, and there are other problems just waiting in the wings...
Hoping somebody who knows the asn1 code (or C in general) better than I
do might be able to help!