[erlang-patches] Patch for 18 exabyte memory allocation failure
Masklinn
masklinn@REDACTED
Wed Mar 9 18:46:54 CET 2011
On 2011-03-08, at 21:13 , Jon Meredith wrote:
> Hi list,
>
> Over the last few months Basho has been seeing intermittent beam crashes
> with extremely large out of memory failures. A customer just had a cluster
> crash with multiple nodes exiting close to the same time with huge
> allocation requests and we were able to get our hands on the coredumps for
> the first time.
>
> eheap_alloc: Cannot allocate 18446744071662201696 bytes of memory (of type
> "heap_frag").
> eheap_alloc: Cannot allocate 18446744071662201696 bytes of memory (of type
> "heap_frag").
> eheap_alloc: Cannot allocate 18446744071662201696 bytes of memory (of type
> "heap_frag").
>
> Analyzing the coredumps with pstack shows they all died in the same place:
>
> ----------------- lwp# 71 / thread# 71 --------------------
> fffffd7fff0e1eca _lwp_kill () + a
> fffffd7fff086fe9 raise () + 19
> fffffd7fff065f60 abort () + 90
> 0000000000473575 erl_exit () + 155
> 000000000046110e erts_alc_fatal_error () + 1de
> 0000000000461177 ???????? ()
> 000000000049792d ???????? ()
> 00000000004ddc8c new_binary () + ac
> 00000000004d77f1 erts_term_to_binary () + 1e1
> 0000000000538c95 process_main () + 8235
> 00000000004bae0c ???????? ()
> 00000000005a6c43 ???????? ()
> fffffd7fff0dc39b _thr_setup () + 5b
> fffffd7fff0dc5c0 _lwp_start ()
>
> Digging through the stack: in erts_term_to_binary the size of the result
> binary is computed as a Uint, then truncated and stored temporarily as a
> signed int. When new_binary is called it is given that signed int, which is
> then converted back to a Uint for the allocation.
>
> We've seen the crash on 64-bit Solaris/Linux systems where sizeof(int) == 4
> and sizeof(Uint) == 8. When converting from signed int to unsigned long the
> compiler helpfully sign-extends the int to 8 bytes. Any size >= 0x80000000
> becomes 0xffffffff80000000 or above, triggering the allocation failure.
>
> Here's a small fragment to reproduce.
>
> Bin = list_to_binary([X rem 256 || X <- lists:seq(1, 65536)]).
> Blocks = 16#80000000 div size(Bin).
> BigBin = lists:duplicate(Blocks, Bin).
> term_to_binary(BigBin).
>
> I've attached patches that change new_binary to take a Uint size and fix
> the cases where callers were casting the size argument to an (int). I've
> also changed the temporary 'size' variable in erts_term_to_binary to a
> Uint.
> The first patch applies to the r13b04 and r14b01 tarballs and the other
> applies to the 'pu' branch on github as I pulled it today.
>
> There are other places in the code that still have problems, but I
> decided to leave them to people who know more than I do.
>
> The most important one is around iolists and bitstrings - io_list_len and
> bitstr_len both return integer lengths. I also noticed something in the
> ssl_tls_erl function - I wasn't sure if it was possible to send a very long
> buffer and prefix length. The code loader also uses ints, but it seems less
> likely to bite anybody.
>
> erts/emulator/beam/binary.
> 670: bin = new_binary(p, (byte *)NULL, i); // SHOULDFIX - io_list_len returns int
> 413: bin = new_binary(BIF_P, (byte *)NULL, i); // SHOULDFIX - bitstr_len returns int
>
> erts/emulator/beam/erl_bif_port.c
> 1281: Eterm bin = new_binary(pca->p, NULL, plen+len); // SSL TLS - two ints, probably size limited
>
> I think there may also be problems checking the length of binaries when
> creating external binaries - the format only allows for a 32-bit unsigned
> length and I didn't see a check in enc_term when I glanced at it.
>
> Hope it helps somebody,
This might be a stupid query, but shouldn't these kinds of sizes use something like size_t or a similar typedef instead of passing them around as if they were "normal" integers?
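To make the failure mode concrete, here is a minimal sketch of the round trip described in the quoted message. It is not the emulator's code: the `Uint` typedef and both function names are stand-ins invented for illustration, and it assumes a 64-bit build where `size_t` is 8 bytes and the narrowing conversion wraps in the usual two's-complement way.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the emulator's Uint on a 64-bit build (assumption). */
typedef uint64_t Uint;

/* Hypothetical helper mimicking the buggy path: the size is narrowed to a
 * signed 32-bit temporary and later widened back to Uint for allocation.
 * The narrowing is implementation-defined in C for values above INT32_MAX;
 * on common two's-complement targets it wraps to a negative number. */
static Uint size_via_signed_int(Uint size)
{
    int32_t tmp = (int32_t)(uint32_t)size; /* keeps only the low 32 bits */
    return (Uint)tmp;                      /* sign-extends to 64 bits */
}

/* Hypothetical helper for the alternative: keep the size in an unsigned,
 * pointer-width type the whole way, so nothing is lost or sign-extended. */
static Uint size_via_size_t(Uint size)
{
    size_t tmp = (size_t)size; /* same width as Uint on a 64-bit build */
    return (Uint)tmp;
}
```

With these definitions, `size_via_signed_int(0x80000000)` yields 0xffffffff80000000, while `size_via_size_t(0x80000000)` returns the value unchanged. By the same arithmetic, a request whose low 32 bits were 2247617376 would come back as 18446744071662201696, the figure in the quoted error message.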