[erlang-bugs] Segfault in erl_interface attempting to decode certain large binaries
Wed Jul 23 12:51:40 CEST 2008
It's not clear from your description whether you're using erl_interface to
build a driver or a port program. While I don't have an answer to your
direct question, perhaps if you are writing a C port program you can use
ei instead of erl_interface, and if you are writing a driver, you can
use driver_output_term() / driver_send_term() and corresponding
ErlDrvTermData* structures to pass data to/from the emulator (*). This
is the fastest way to communicate with the emulator's port owner process.
(*) See: http://www.erlang.org/doc/man/erl_driver.html
and also the source code of inet_drv.c in the distribution for various
LOAD_*() macros that simplify working with ErlDrvTermData structures.
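The main difference with ei is its decoding style. Functions such as ei_decode_version() and ei_decode_binary() walk a caller-owned buffer with an index and do not build an ETERM tree that later has to be freed with erl_free_term(). The sketch below is a hand-rolled illustration of that style for a single BINARY_EXT term. It is not the ei API. The names etf_decode_version and etf_decode_binary are hypothetical. Only the version byte (131) and the BINARY_EXT layout (tag 109, a 4-byte big-endian length, then the bytes) come from the external term format. Every length field is checked against the buffer size, so malformed input returns an error instead of reading past the end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from the Erlang external term format. */
#define ETF_VERSION_MAGIC 131u /* first byte of every encoded term */
#define ETF_BINARY_EXT    109u /* tag, 4-byte big-endian length, then data */

/* Read a 4-byte big-endian unsigned integer. */
static uint32_t get_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: check the version byte at *index and advance past it.
   Returns 0 on success, -1 on error (index left unchanged). */
static int etf_decode_version(const unsigned char *buf, size_t buflen,
                              size_t *index)
{
    if (*index >= buflen || buf[*index] != ETF_VERSION_MAGIC)
        return -1;
    *index += 1;
    return 0;
}

/* Hypothetical helper: decode a BINARY_EXT at *index.
   On success, points *data into buf (no copy, no allocation), stores the
   length in *len, advances *index past the term and returns 0.
   Returns -1 (index left unchanged) if the tag is wrong or the header or the
   declared length would run past the end of buf. */
static int etf_decode_binary(const unsigned char *buf, size_t buflen,
                             size_t *index, const unsigned char **data,
                             uint32_t *len)
{
    size_t i = *index;
    uint32_t n;

    /* Need 1 tag byte + 4 length bytes. */
    if (i > buflen || buflen - i < 5 || buf[i] != ETF_BINARY_EXT)
        return -1;
    n = get_u32_be(buf + i + 1);
    i += 5;

    /* Reject a length that claims more bytes than the buffer holds. */
    if (n > buflen - i)
        return -1;

    *data = buf + i;
    *len = n;
    *index = i + n;
    return 0;
}
```

For example, the bytes {131, 109, 0, 0, 0, 3, 'a', 'b', 'c'} decode to a 3-byte binary "abc". A buffer whose length field claims 256 bytes when only one follows is rejected with -1.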
> I have a TCP interface between an Erlang system and a C system. Both
> send/receive marshaled binary Erlang terms and I have not had any problems
> to date.
> Today I began doing some more serious testing with larger chunks of binary
> to be decoded in C.
> We ran into a bug (it seems) with erl_interface 18.104.22.168 that is causing it
> to segfault during decoding. The backtrace looks like this:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 46912496233216 (LWP 4091)]
> 0x00000000004032c4 in _erl_free_term ()
> (gdb) bt
> #0 0x00000000004032c4 in _erl_free_term ()
> #1 0x000000000040496b in erl_decode_it ()
> #2 0x0000000000404937 in erl_decode_it ()
> #3 0x0000000000404c93 in erl_decode_it ()
> #4 0x0000000000404c93 in erl_decode_it ()
> #5 0x0000000000405311 in erl_decode ()
> #6 0x0000000000401b38 in main (argc=1, argv=0x7fff51ec7938) at badtest.c:28
> The unfortunate part is that the way this large binary term is generated
> cannot be done in any kind of sample code (it’s being pulled off an external
> Testing code: http://jgray.la/erlang/erl_decode_segfault_test.tar.gz
> However, I have created a set of test files in C which recreate the
> segfault. I stored the binary in a flat file (as ‘badbinary’) and have a
> testing program which reads it off disk and attempts to decode it. To prove
> the approach is sane (and that this segfault is related to something strange
> about the decoding of this particular binary, not the size or general format
> of the binary) there is a ‘goodbinary’ file and testing program for that.
> To use the test code:
> Untar/Ungzip the file. You may need to edit the Makefile to fix the paths
> to your erl_interface library.
> ‘make’ and then you can:
> ./badtest (this reads ‘badbinary’ and attempts to decode, causes segfault)
> ./goodtest (this reads ‘goodbinary’ and successfully decodes it) [nearly
> identical code to badtest.c but reads different file w/ different size]
> Also included is
> ./makegoodbin (a simple program that generates a large ETERM in an identical
> format to the badbinary but contains duplicated binary data everywhere)
> * The marshaled binary erlang term being sent to C can be successfully
> decoded/unmarshaled from within Erlang without a problem
> * This is reproducible with many different large erlang terms generated from
> our database queries. ‘makegoodbin.c’ creates a term identical in format to
> those causing problems, however it does not have the random distribution of
> binary sizes and content, and so I’m not able to reproduce the problem in
> this way.
> * The entire system, end-to-end including this decoding step, works
> perfectly in most cases. However when the data goes into the 100k+ range,
> the segfaults start to happen. That’s why I created the ‘makegoodbin’ which
> follows the same format. Unfortunately that works even at sizes of >1MB
> adding to the confusion of the problem.
> Any help is appreciated. Thanks.
> I apologize if this is a repost. I never saw my original post hit the list
> and did not receive any responses.
> Jonathan Gray
> Streamy Inc.