[erlang-questions] Strange crash and hang

David Welton davidnwelton@REDACTED
Fri Sep 19 10:59:42 CEST 2014


Hi,

We're getting an odd crash that I suspect is related to one we had in
the past, where we were getting a bad term from a C node.  The term
being sent is pretty large (20+ megabytes), so when it gets caught up
in the error logger and is turned from a large binary into a linked
list holding the string representation of that binary, the VM runs out
of memory and dies.
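
A rough sketch of why the printed form is so much heavier than the
binary itself. This is illustrative only: the module and function names
are hypothetical, and it assumes the term ends up rendered with "~p".
It relies on the fact that an Erlang string is a list, which costs two
machine words per character. The second function shows depth-limited
printing with "~P" as one possible way to keep such output bounded.

```erlang
-module(big_term_demo).
-export([printed_cost/1, truncated/2]).

%% Hypothetical helper: render Term in full with "~p" and report
%% {PrintedChars, ListBytes}, where ListBytes is the memory a flat
%% character list of that length occupies (one cons cell, i.e. two
%% machine words, per character).
printed_cost(Term) ->
    Chars = lists:flatten(io_lib:format("~p", [Term])),
    WordSize = erlang:system_info(wordsize),
    {length(Chars), length(Chars) * 2 * WordSize}.

%% Hypothetical helper: render Term with a depth limit, so the output
%% stays small however large the term is. "~P" takes the term and the
%% maximum depth as two separate arguments.
truncated(Term, Depth) ->
    lists:flatten(io_lib:format("~P", [Term, Depth])).
```

For a binary of N bytes that is printed as byte values, the full
rendering has at least 2N+3 characters, so on a 64-bit VM the flat list
needs upwards of 32 bytes per byte of binary. By that estimate a 20 MB
binary needs at least roughly 640 MB once printed, whereas
big_term_demo:truncated(Bin, 10) stays a few dozen characters long.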

What's particularly odd is that, many times, it doesn't finish writing
the crash dump; it just sits there:

Gdb:
#0  0x00007f0a0c020e93 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000005495e0 in erts_sys_main_thread ()
#2  0x00000000004561d0 in erl_start ()
#3  0x00000000004367d9 in main ()

strace -f -p 9205
Process 9205 attached with 19 threads - interrupt to quit
[pid  9248] futex(0x7f0a0b0c0550, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9247] futex(0x7f0a0b0c0510, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9246] futex(0x7f0a0b0c04d0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9245] futex(0x7f0a0b0c0490, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9244] futex(0x7f0a0b0c0450, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9243] wait4(-1,  <unfinished ...>
[pid  9241] futex(0x7f0a0b0c0390, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9242] futex(0x7f0a0b0c03d0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9240] futex(0x7f0a0b0c0350, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9239] futex(0x7f0a0b0c0310, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9238] futex(0x7f0a0b0c02d0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9236] futex(0x7f0a0b0c0250, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9237] futex(0x7f0a0b0c0290, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9235] futex(0x7f0a0b0c0210, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9234] futex(0x7f0a0b0c01d0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9233] futex(0x7f0a0b0c0190, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid  9232] futex(0x888484, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
[pid  9231] read(6,  <unfinished ...>
[pid  9205] select(0, NULL, NULL, NULL, NULL^C <unfinished ...>

It's not writing to the crash dump file or doing anything else,
really; it's just stuck in that select.

Any ideas what may be happening?  Here's the beginning of the crash dump.

Slogan: eheap_alloc: Cannot allocate 3049303816 bytes of memory (of type "heap_frag").
System version: Erlang/OTP 17 [erts-6.1] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Compiled: Mon Jul  7 15:00:04 2014
Taints: crypto

Thoughts or ideas?  I'm going to try dropping in a recompiled beam.smp
to see if I can print a simple debug message when things crash on a
bad term.

Thanks
-- 
David N. Welton

http://www.welton.it/davidw/

http://www.dedasys.com/


