[erlang-patches] Patch for 18 exabyte memory allocation failure

Michael Truog mjtruog@REDACTED
Wed Mar 9 18:59:54 CET 2011


On 03/09/2011 09:46 AM, Masklinn wrote:
> On 2011-03-08, at 21:13 , Jon Meredith wrote:
>> Hi list,
>>
>> Over the last few months Basho has been seeing intermittent beam crashes
>> with extremely large out of memory failures.  A customer just had a cluster
>> crash with multiple nodes exiting close to the same time with huge
>> allocation requests and we were able to get our hands on the coredumps for
>> the first time.
>>
>> eheap_alloc: Cannot allocate 18446744071662201696 bytes of memory (of type
>> "heap_frag").
>> eheap_alloc: Cannot allocate 18446744071662201696 bytes of memory (of type
>> "heap_frag").
>> eheap_alloc: Cannot allocate 18446744071662201696 bytes of memory (of type
>> "heap_frag").
>>
>> Analysis with pstack showed that they all died in the same place:
>>
>> -----------------  lwp# 71 / thread# 71  --------------------
>> fffffd7fff0e1eca _lwp_kill () + a
>> fffffd7fff086fe9 raise () + 19
>> fffffd7fff065f60 abort () + 90
>> 0000000000473575 erl_exit () + 155
>> 000000000046110e erts_alc_fatal_error () + 1de
>> 0000000000461177 ???????? ()
>> 000000000049792d ???????? ()
>> 00000000004ddc8c new_binary () + ac
>> 00000000004d77f1 erts_term_to_binary () + 1e1
>> 0000000000538c95 process_main () + 8235
>> 00000000004bae0c ???????? ()
>> 00000000005a6c43 ???????? ()
>> fffffd7fff0dc39b _thr_setup () + 5b
>> fffffd7fff0dc5c0 _lwp_start ()
>>
>> Digging through the stack: in erts_term_to_binary the size for the
>> result binary is computed as a Uint, then truncated and stored temporarily
>> as a signed int. When new_binary is called it is given a signed int, which
>> is then converted back to a Uint for the allocation.
>>
>> We've seen the crash on 64-bit Solaris/Linux systems where sizeof(int) == 4,
>> sizeof(Uint) == 8. When converting from signed int to unsigned long the
>> compiler helpfully sign extends the int to 8 bytes.  Any sizes >= 0x80000000
>> become 0xffffffff80000000 and above, triggering the allocation failure.
>>
>> Here's a small fragment to reproduce.
>>
>>  Bin = list_to_binary([X rem 256 || X <- lists:seq(1, 65536)]).
>>  Blocks = 16#80000000 div size(Bin).
>>  BigBin = lists:duplicate(Blocks, Bin).
>>  term_to_binary(BigBin).
>>
>> I've attached patches that change new_binary to take a Uint size and fix
>> the cases where callers were casting the size argument to an (int).  I've
>> also modified the temporary 'size' variable in erts_term_to_binary to be a
>> Uint.
>> The first patch applies to the r13b04 and r14b01 tarballs and the other
>> applies to the 'pu' branch on github as I pulled it today.
>>
>> There are some other places in the code that still have problems, but I
>> decided to leave them to people who know more than I do.
>>
>> The most important one is around iolists and bitstrings - io_list_len and
>> bitstr_len both return integer lengths. I also noticed something in the
>> ssl_tls_erl function - I wasn't sure if it was possible to send a very long
>> buffer and prefix length. The code loader also uses ints, but it seems less
>> likely to bite anybody.
>>
>> erts/emulator/beam/binary.
>> 670:    bin = new_binary(p, (byte *)NULL, i);      // SHOULDFIX - io_list_len returns int
>> 413:    bin = new_binary(BIF_P, (byte *)NULL, i);  // SHOULDFIX - bitstr_len returns int
>>
>> erts/emulator/beam/erl_bif_port.c
>> 1281:    Eterm bin = new_binary(pca->p, NULL, plen+len);  // SSL TLS - two ints, probably size limited
>>
>> I think there may also be problems checking the length of binaries when
>> creating external binaries - the format only allows for a 32-bit unsigned
>> length and I didn't see a check in enc_term when I glanced at it.
>>
>> Hope it helps somebody,
> This might be a stupid query, but shouldn't these kinds of sizes use something like size_t or a similar typedef instead of being passed around as if they were "normal" integers?
Most people use the standard include file "stdint.h"
http://en.wikipedia.org/wiki/Stdint.h
The behavior seems to indicate Uint is an unsigned long.  Using the stdint.h types could also help make the ei interface less ambiguous with its integer types.
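
A minimal sketch of the truncation described in the quoted report, written with stdint.h types so the widths are explicit. The function names are hypothetical and are not taken from erts; the sketch assumes a two's-complement compiler where an out-of-range conversion to int32_t wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not erts code: mimics a 64-bit size that is
 * stored in a 32-bit signed temporary and then widened back to an
 * unsigned 64-bit value for the allocator. */
uint64_t size_via_signed_int(uint64_t size)
{
    /* Out-of-range conversion to int32_t is implementation-defined;
     * on the usual two's-complement compilers it wraps. */
    int32_t tmp = (int32_t)(uint32_t)size;

    /* Widening a negative int32_t to uint64_t sign-extends it. */
    return (uint64_t)tmp;
}

/* Self-check of the boundary case discussed in the thread. */
void check_sign_extension(void)
{
    /* Below 2^31 the round trip is harmless. */
    assert(size_via_signed_int(UINT64_C(0x7fffffff)) == UINT64_C(0x7fffffff));
    /* At 2^31 the value becomes 0xffffffff80000000. */
    assert(size_via_signed_int(UINT64_C(0x80000000)) == UINT64_C(0xffffffff80000000));
}
```

Under the same assumptions, an input of 2247617376 bytes (about 2.09 GiB) comes out as 18446744071662201696, which matches the figure in the quoted eheap_alloc messages. That input size is an inference from the arithmetic, not a value stated in the report.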

