[erlang-bugs] Segfault in erl_interface attempting to decode certain large binaries

jlist <>
Wed Jul 23 01:02:13 CEST 2008


I have a TCP interface between an Erlang system and a C system.  Both
send and receive marshaled binary Erlang terms, and I have not had any
problems to date.

Today I began doing some more serious testing with larger chunks of binary
to be decoded in C.

We ran into what appears to be a bug in erl_interface that causes it
to segfault during decoding.  The backtrace looks like this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 46912496233216 (LWP 4091)]
0x00000000004032c4 in _erl_free_term ()
(gdb) bt
#0  0x00000000004032c4 in _erl_free_term ()
#1  0x000000000040496b in erl_decode_it ()
#2  0x0000000000404937 in erl_decode_it ()
#3  0x0000000000404c93 in erl_decode_it ()
#4  0x0000000000404c93 in erl_decode_it ()
#5  0x0000000000405311 in erl_decode ()
#6  0x0000000000401b38 in main (argc=1, argv=0x7fff51ec7938) at badtest.c:28

The unfortunate part is that the way this large binary term is generated
cannot be done in any kind of sample code (it’s being pulled off an external

Testing code:  http://jgray.la/erlang/erl_decode_segfault_test.tar.gz

However, I have created a set of test files in C which recreate the
segfault.  I stored the binary in a flat file (as ‘badbinary’) and have a
testing program which reads it off disk and attempts to decode it.  To show
that the approach is sane (and that this segfault is related to something
strange about the decoding of this particular binary, not the size or general
format of the binary), there is also a ‘goodbinary’ file with its own testing
program.
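
The read-from-disk step described above can be sanity-checked without
erl_interface at all.  The following is only a minimal sketch, not the code in
the tarball: the helper names (read_whole_file, read_be32, etf_top_level_tag)
are hypothetical, and the only facts assumed are from the Erlang external term
format, where an encoded term starts with the version byte 131 and BINARY_EXT
(tag 109) is followed by a 4-byte big-endian length.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Tag values from the Erlang external term format. */
#define ETF_VERSION_MAGIC 131 /* first byte of every encoded term */
#define ETF_LIST_EXT      108
#define ETF_BINARY_EXT    109

/* Hypothetical helper: read a whole file into a malloc'd buffer.
 * Returns NULL on any I/O or allocation failure; the caller frees. */
unsigned char *read_whole_file(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    unsigned char *buf = NULL;
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0 && (size = ftell(fp)) >= 0 &&
        fseek(fp, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1); /* +1 so an empty file still allocates */
        if (buf != NULL && fread(buf, 1, (size_t)size, fp) != (size_t)size) {
            free(buf); /* short read: treat as failure */
            buf = NULL;
        }
    }
    fclose(fp);
    if (buf != NULL)
        *len_out = (size_t)size;
    return buf;
}

/* Decode a 32-bit big-endian integer, as used for BINARY_EXT lengths. */
uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return the tag byte of the top-level term, or -1 if the buffer is too
 * short or does not begin with the version byte. */
int etf_top_level_tag(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 2 || buf[0] != ETF_VERSION_MAGIC)
        return -1;
    return buf[1];
}
```

A caller would load ‘badbinary’ with read_whole_file and confirm that
etf_top_level_tag does not return -1 before handing the buffer to the decoder.
This only rules out a truncated or mis-framed file; it says nothing about
where inside the term the decoder goes wrong.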

To use the test code:

Untar/Ungzip the file.  You may need to edit the Makefile to fix the paths
to your erl_interface library.

Run ‘make’, and then you can:

./badtest  (this reads ‘badbinary’ and attempts to decode, causes segfault)
./goodtest (this reads ‘goodbinary’ and successfully decodes it)  [nearly
identical code to badtest.c but reads different file w/ different size]

Also included is

./makegoodbin (a simple program that generates a large ETERM in the same
format as the badbinary, but containing duplicated binary data throughout)


* The marshaled binary Erlang term being sent to C can be successfully
decoded/unmarshaled from within Erlang without a problem.
* This is reproducible with many different large Erlang terms generated from
our database queries.  ‘makegoodbin.c’ creates a term identical in format to
those causing problems; however, it does not have the random distribution of
binary sizes and content, so I’m not able to reproduce the problem that way.
* The entire system, end-to-end including this decoding step, works
perfectly in most cases.  However, when the data goes into the 100k+ range,
the segfaults start to happen.  That’s why I created ‘makegoodbin’, which
follows the same format.  Unfortunately, that works even at sizes of >1MB,
which adds to the confusion.

Any help is appreciated.  Thanks.

I apologize if this is a repost.  I never saw my original post hit the list
and did not receive any responses.

Jonathan Gray
Streamy Inc.

