Patch for 18 exabyte memory allocation failure

Jon Meredith jon@REDACTED
Tue Mar 8 21:13:35 CET 2011


Hi list,

Over the last few months Basho has been seeing intermittent beam crashes
with extremely large out of memory failures.  A customer just had a cluster
crash with multiple nodes exiting close to the same time with huge
allocation requests and we were able to get our hands on the coredumps for
the first time.

eheap_alloc: Cannot allocate 18446744071662201696 bytes of memory (of type
"heap_frag").
eheap_alloc: Cannot allocate 18446744071662201696 bytes of memory (of type
"heap_frag").
eheap_alloc: Cannot allocate 18446744071662201696 bytes of memory (of type
"heap_frag").

After analyzing the coredumps with pstack, we found they all died in the same place:

-----------------  lwp# 71 / thread# 71  --------------------
 fffffd7fff0e1eca _lwp_kill () + a
 fffffd7fff086fe9 raise () + 19
 fffffd7fff065f60 abort () + 90
 0000000000473575 erl_exit () + 155
 000000000046110e erts_alc_fatal_error () + 1de
 0000000000461177 ???????? ()
 000000000049792d ???????? ()
 00000000004ddc8c new_binary () + ac
 00000000004d77f1 erts_term_to_binary () + 1e1
 0000000000538c95 process_main () + 8235
 00000000004bae0c ???????? ()
 00000000005a6c43 ???????? ()
 fffffd7fff0dc39b _thr_setup () + 5b
 fffffd7fff0dc5c0 _lwp_start ()

Digging through the stack: in erts_term_to_binary the size for the
result binary is computed as a Uint, then truncated and stored temporarily
as a signed int. When new_binary is called it is given a signed int, which is
then converted back to a Uint for the allocation.

We've seen the crash on 64-bit Solaris/Linux systems where sizeof(int) == 4 and
sizeof(Uint) == 8. When converting from signed int to unsigned long the
compiler helpfully sign-extends the int to 8 bytes.  Any size >= 0x80000000
becomes 0xffffffff80000000 or above, triggering the allocation failure.
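
To make the arithmetic concrete, here is a minimal C sketch of the round trip
described above. It is not the ERTS code: the Uint typedef is a stand-in and
roundtrip_through_int is a hypothetical helper. It assumes a platform where
int is 32 bits and out-of-range conversions to int wrap (as the common
compilers on these systems do).

```c
#include <stdint.h>

/* Stand-in for the ERTS Uint on a 64-bit build (assumption). */
typedef uint64_t Uint;

/* Hypothetical helper mimicking the buggy path: a Uint size is stored
 * in a signed int and later widened back to a Uint. */
Uint roundtrip_through_int(Uint size)
{
    /* Truncates to the low 32 bits. The result for values above INT_MAX
     * is implementation-defined; typical compilers wrap. */
    int tmp = (int)size;

    /* Widening a negative int to a 64-bit unsigned type sign-extends. */
    return (Uint)tmp;
}
```

Under those assumptions, roundtrip_through_int(0x80000000) gives
0xffffffff80000000, and a size whose low 32 bits are 2247617376 (roughly
2.09 GiB) gives 18446744071662201696, the value in the eheap_alloc messages.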

Here's a small fragment to reproduce.

  Bin = list_to_binary([X rem 256 || X <- lists:seq(1, 65536)]).
  Blocks = 16#80000000 div size(Bin).
  BigBin = lists:duplicate(Blocks, Bin).
  term_to_binary(BigBin).

I've attached patches that change new_binary to take a Uint size and fix
the cases where callers were casting the size argument to an (int).  I've
also modified the temporary 'size' variable in erts_term_to_binary to be a
Uint.
The first patch applies to the r13b04 and r14b01 tarballs and the other
applies to the 'pu' branch on github as I pulled it today.
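
The shape of the change can be sketched as follows. This is only an
illustration, not the attached patch: request_bytes_old and request_bytes_new
are hypothetical names, the Uint typedef is a stand-in, and each function just
returns the byte count that would reach the allocator.

```c
#include <stdint.h>

/* Stand-in for the ERTS Uint on a 64-bit build (assumption). */
typedef uint64_t Uint;

/* Old shape (hypothetical): the length arrives as a signed int and is
 * widened for the allocation, so a negative int becomes a huge Uint. */
Uint request_bytes_old(int len)
{
    return (Uint)len;
}

/* New shape (hypothetical): the length stays a Uint from the point it
 * is computed to the point it is allocated, so nothing is truncated. */
Uint request_bytes_new(Uint len)
{
    return len;
}
```

With a computed size of 0x80000000, the old shape called as
request_bytes_old((int)size) asks for 0xffffffff80000000 bytes on a typical
64-bit build, while request_bytes_new(size) asks for 0x80000000.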

There are other places in the code that still have problems, but I
decided to leave them to people who know more than I do.

The most important one is around iolists and bitstrings - io_list_len and
bitstr_len both return integer lengths. I also noticed something in the
ssl_tls_erl function - I wasn't sure if it was possible to send a very long
buffer and prefix length. The code loader also uses ints, but it seems less
likely to bite anybody.

erts/emulator/beam/binary.
670:    bin = new_binary(p, (byte *)NULL, i);        // SHOULDFIX - io_list_len returns int
413:    bin = new_binary(BIF_P, (byte *)NULL, i);    // SHOULDFIX - bitstr_len returns int

erts/emulator/beam/erl_bif_port.c
1281:    Eterm bin = new_binary(pca->p, NULL, plen+len);    // SSL TLS - two ints, probably size limited

I think there may also be problems checking the length of binaries when
creating external binaries - the format only allows for a 32-bit unsigned
length and I didn't see a check in enc_term when I glanced at it.
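
For illustration, the kind of check I have in mind could look like the sketch
below. It is not taken from enc_term: put_binary_len32 is a hypothetical
helper and the Uint typedef is a stand-in. It only shows refusing a length
that does not fit in the 4-byte big-endian length field of a binary in the
external format.

```c
#include <stdint.h>

/* Stand-in for the ERTS Uint on a 64-bit build (assumption). */
typedef uint64_t Uint;

/* Hypothetical helper: writes len as a 4-byte big-endian value.
 * Returns 1 on success, or 0 if len cannot be represented in 32 bits,
 * in which case out is left untouched. */
int put_binary_len32(Uint len, unsigned char out[4])
{
    if (len > UINT32_MAX)
        return 0;                       /* would silently wrap otherwise */

    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;
    return 1;
}
```

A caller would treat a 0 return as an encoding error instead of emitting a
length that has lost its upper bits.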

Hope it helps somebody,

Cheers, Jon Meredith.
Basho Technologies.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: t2b_overflow-r13b04_r14b01.patch
Type: application/octet-stream
Size: 4070 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20110308/ed7a0983/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: t2b_overflow-pu.patch
Type: application/octet-stream
Size: 3452 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20110308/ed7a0983/attachment-0001.obj>