[erlang-bugs] Segfault in erl_interface attempting to decode certain large binaries

Thu Jul 24 03:33:40 CEST 2008

The tutorial was kept simplistic on purpose.

Actually, using ei for free-form terms is quite simple; it takes less 
than a day or so to get used to all the flavors of the ei_* functions.
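
As a rough illustration of the type-driven dispatch a generic decoder 
needs, here is a minimal sketch in C.  It is not ei code: to stay 
self-contained it reads the tag bytes of the external term format by 
hand and handles only a few tags.  The names walker, emit, read_be, 
copy_text, walk and term_to_text are hypothetical, as are the text 
rendering and the depth limit of 64.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper types and names; none of this is part of ei. */
typedef struct {
    const unsigned char *buf; size_t len, pos;   /* encoded input */
    char *out; size_t outsz, outlen;             /* rendered text */
} walker;

/* Append n bytes to the output, always leaving room for the NUL. */
static int emit(walker *w, const char *s, size_t n) {
    if (n >= w->outsz - w->outlen) return -1;
    memcpy(w->out + w->outlen, s, n);
    w->outlen += n;
    w->out[w->outlen] = '\0';
    return 0;
}

/* Read an nbytes-wide big-endian unsigned value from the input. */
static int read_be(walker *w, int nbytes, uint32_t *v) {
    if ((size_t)nbytes > w->len - w->pos) return -1;
    for (*v = 0; nbytes-- > 0; ) *v = (*v << 8) | w->buf[w->pos++];
    return 0;
}

/* Copy n raw input bytes (atom or string text) to the output. */
static int copy_text(walker *w, uint32_t n) {
    if (n > w->len - w->pos || emit(w, (const char *)w->buf + w->pos, n)) return -1;
    w->pos += n;
    return 0;
}

/* Decode one term, dispatching on its tag byte; recurse into containers. */
static int walk(walker *w, int depth) {
    uint32_t tag, n, i;
    char num[32];
    if (depth > 64 || read_be(w, 1, &tag)) return -1;   /* bound recursion */
    switch (tag) {
    case 97:                                   /* SMALL_INTEGER_EXT */
        if (read_be(w, 1, &n)) return -1;
        return emit(w, num, (size_t)snprintf(num, sizeof num, "%u", (unsigned)n));
    case 98:                                   /* INTEGER_EXT, signed 32-bit */
        if (read_be(w, 4, &n)) return -1;
        return emit(w, num, (size_t)snprintf(num, sizeof num, "%d", (int)(int32_t)n));
    case 100: case 118:                        /* ATOM_EXT, ATOM_UTF8_EXT */
        return read_be(w, 2, &n) ? -1 : copy_text(w, n);
    case 115: case 119:                        /* SMALL_ATOM_EXT, SMALL_ATOM_UTF8_EXT */
        return read_be(w, 1, &n) ? -1 : copy_text(w, n);
    case 107:                                  /* STRING_EXT */
        if (read_be(w, 2, &n) || emit(w, "\"", 1) || copy_text(w, n)) return -1;
        return emit(w, "\"", 1);
    case 109:                                  /* BINARY_EXT: summarize only */
        if (read_be(w, 4, &n) || n > w->len - w->pos) return -1;
        w->pos += n;
        return emit(w, num, (size_t)snprintf(num, sizeof num, "<<%u bytes>>", (unsigned)n));
    case 104: case 105:                        /* SMALL_/LARGE_TUPLE_EXT */
        if (read_be(w, tag == 104 ? 1 : 4, &n) || emit(w, "{", 1)) return -1;
        for (i = 0; i < n; i++)
            if ((i && emit(w, ",", 1)) || walk(w, depth + 1)) return -1;
        return emit(w, "}", 1);
    case 106:                                  /* NIL_EXT */
        return emit(w, "[]", 2);
    case 108:                                  /* LIST_EXT: elements, then tail */
        if (read_be(w, 4, &n) || emit(w, "[", 1)) return -1;
        for (i = 0; i < n; i++)
            if ((i && emit(w, ",", 1)) || walk(w, depth + 1)) return -1;
        if (w->pos < w->len && w->buf[w->pos] == 106) w->pos++;      /* proper list */
        else if (emit(w, "|", 1) || walk(w, depth + 1)) return -1;   /* improper tail */
        return emit(w, "]", 1);
    default:
        return -1;                             /* tag not handled in this sketch */
    }
}

/* Render an external-format term as text; returns 0 on success, -1 on error. */
int term_to_text(const unsigned char *buf, size_t len, char *out, size_t outsz) {
    assert(buf != NULL && out != NULL && outsz > 0);
    walker w = { buf, len, 0, out, outsz, 0 };
    uint32_t version;
    out[0] = '\0';
    if (read_be(&w, 1, &version) || version != 131) return -1;   /* version magic */
    if (walk(&w, 0)) return -1;
    return w.pos == len ? 0 : -1;              /* reject trailing bytes */
}
```

Given the bytes 131,104,3,100,0,2,'o','k',97,42,108,0,0,0,1,109,0,0,0,2,
'a','b',106, which encode {ok,42,[<<"ab">>]}, term_to_text() fills the 
output buffer with {ok,42,[<<2 bytes>>]}.  With ei the same switch would 
presumably be driven by the type returned from ei_get_type(), and the 
payloads read with calls such as ei_decode_long(), ei_decode_atom(), 
ei_decode_tuple_header(), ei_decode_list_header() and ei_decode_binary().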

Good luck!

Serge

Jonathan Gray wrote:
> Yeah I saw that tutorial.  Unfortunately it's based around a fixed format of
> terms, thus doesn't make use of ei_get_type(), which is central to any kind
> of generic decoder.
> 
> Regardless I did more extensive testing and it looks like I will be
> implementing my own generic Erlang term decoder using ei.  Will post results
> when I'm done.
> 
> Jonathan
> 
> -----Original Message-----
> From: erlang-bugs-bounces@REDACTED [mailto:erlang-bugs-bounces@REDACTED]
> On Behalf Of Serge Aleynikov
> Sent: Wednesday, July 23, 2008 5:51 PM
> To: Jonathan Gray
> Cc: erlang-bugs@REDACTED
> Subject: Re: [erlang-bugs] Segfault in erl_interface attempting to decode
> certain large binaries
> 
> You can use the following tutorial to help you get started with ei:
> http://www.trapexit.org/How_to_use_ei_to_marshal_binary_terms_in_port_programs
> 
> I haven't seen any newer erl_encode/erl_decode implementations.  Frankly 
> I've started doing that myself in C++ and got the recursive encoding 
> part, but haven't done the decoder and had to put this project on hold 
> due to other priorities.
> 
> Serge
> 
> Jonathan Gray wrote:
>> Serge,
>>
>> Sorry I was not more clear.  This is a port program with two remote systems
>> communicating via simple TCP packets (size header, buffer).
>>
>> Thanks for the advice.  The documentation suggests that erl_interface is
>> actually making use of the ei code.  I'm not sure if that's correct because
>> I see other resources on the web that say they are different (ei is new,
>> erl_interface is old).  However the error does exist in a malloc() call so
>> perhaps that can be avoided by doing the term decodes myself?
>>
>> Unfortunately what I need (already have working but segfaulting on a large
>> decode) is a generic parser that converts ErlBinary into a number of
>> different formats (JSON, Python, C struct tree) and vice versa.  The ETERM
>> representation gave me an easy way to make recursive converters.  Recreating
>> this generic behavior using ei seems like I'll be recreating functions of
>> erl_interface and losing the ETERM representation so I'll need to rewrite my
>> converters in a completely different way.
>>
>> Rewriting it to go directly from binary would be a good thing in the long
>> run, just not something I was planning on doing at this stage.  Has anyone
>> ever written a newer generic erl_encode/erl_decode using the latest ei?
>>
>> I will take a closer look at ei and report back.
>>
>> Thanks for your help.
>>
>> Jonathan
>>
>>
>> -----Original Message-----
>> From: erlang-bugs-bounces@REDACTED [mailto:erlang-bugs-bounces@REDACTED]
>> On Behalf Of Serge Aleynikov
>> Sent: Wednesday, July 23, 2008 3:52 AM
>> To: jlist@REDACTED
>> Cc: erlang-bugs@REDACTED
>> Subject: Re: [erlang-bugs] Segfault in erl_interface attempting to decode
>> certain large binaries
>>
>> Jonathan,
>>
>> It's not clear from your description if you're using erl_interface to 
>> build a driver or a port program.  While I don't have an answer to your 
>> direct question, perhaps if you are writing a C port program you can use 
>> ei instead of erl_interface, and if you are writing a driver, you can 
>> use driver_output_term() / driver_send_term() and corresponding 
>> ErlDrvTermData* structures to pass data to/from the emulator (*). This 
>> is the fastest way to communicate with the emulator's port owner process.
>>
>> (*) See: http://www.erlang.org/doc/man/erl_driver.html
>> and also the source code of inet_drv.c in the distribution for various 
>> LOAD_*() macros that simplify working with ErlDrvTermData structures.
>>
>> Serge
>>
>>
>>
>>
>> jlist@REDACTED wrote:
>>> All,
>>>
>>> I have a TCP interface between an Erlang system and a C system.  Both
>>> send/receive marshaled binary Erlang terms and I have not had any problems
>>> to date.
>>>
>>> Today I began doing some more serious testing with larger chunks of binary
>>> to be decoded in C.
>>>
>>> We ran into a bug (it seems) with erl_interface 3.5.5.4 that is causing it
>>> to segfault during decoding.  The backtrace looks like this:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 46912496233216 (LWP 4091)]
>>> 0x00000000004032c4 in _erl_free_term ()
>>> (gdb) bt
>>> #0  0x00000000004032c4 in _erl_free_term ()
>>> #1  0x000000000040496b in erl_decode_it ()
>>> #2  0x0000000000404937 in erl_decode_it ()
>>> #3  0x0000000000404c93 in erl_decode_it ()
>>> #4  0x0000000000404c93 in erl_decode_it ()
>>> #5  0x0000000000405311 in erl_decode ()
>>> #6  0x0000000000401b38 in main (argc=1, argv=0x7fff51ec7938) at badtest.c:28
>>>
>>> The unfortunate part is that the way this large binary term is generated
>>> cannot be done in any kind of sample code (it's being pulled off an external
>>> database).
>>>
>>> Testing code:  http://jgray.la/erlang/erl_decode_segfault_test.tar.gz
>>>
>>> However, I have created a set of test files in C which recreate the
>>> segfault.  I stored the binary in a flat file (as 'badbinary') and have a
>>> testing program which reads it off disk and attempts to decode it.  To prove
>>> the approach is sane (and that this segfault is related to something strange
>>> about the decoding of this particular binary, not the size or general format
>>> of the binary) there is a 'goodbinary' file and testing program for that.
>>>
>>> To use the test code:
>>>
>>> Untar/Ungzip the file.  You may need to edit the Makefile to fix the paths
>>> to your erl_interface library.
>>>
>>> 'make' and then you can:
>>>
>>> ./badtest  (this reads 'badbinary' and attempts to decode, causes segfault)
>>> ./goodtest (this reads 'goodbinary' and successfully decodes it)  [nearly
>>> identical code to badtest.c but reads different file w/ different size]
>>>
>>> Also included is
>>>
>>> ./makegoodbin (a simple program that generates a large ETERM in an identical
>>> format to the badbinary but contains duplicated binary data everywhere)
>>>
>>>
>>> Notes:
>>>
>>> * The marshaled binary erlang term being sent to C can be successfully
>>> decoded/unmarshaled from within Erlang without a problem
>>> * This is reproducible with many different large erlang terms generated from
>>> our database queries.  'makegoodbin.c' creates a term identical in format to
>>> those causing problems, however it does not have the random distribution of
>>> binary sizes and content, and so I'm not able to reproduce the problem in
>>> this way.
>>> * The entire system, end-to-end including this decoding step, works
>>> perfectly in most cases.  However when the data goes into the 100k+ range,
>>> the segfaults start to happen.  That's why I created the 'makegoodbin' which
>>> follows the same format.  Unfortunately that works even at sizes of >1MB
>>> adding to the confusion of the problem.
>>>
>>>
>>> Any help is appreciated.  Thanks.
>>>
>>> I apologize if this is a repost.  I never saw my original post hit the list
>>> and did not receive any responses.
>>>
>>> Jonathan Gray
>>> Streamy Inc.
>>>
>>> _______________________________________________
>>> erlang-bugs mailing list
>>> erlang-bugs@REDACTED
>>> http://www.erlang.org/mailman/listinfo/erlang-bugs
>>>