[erlang-questions] Segfaults in erl_alloc when compiling with gcc 6.3.1?

anthonym@REDACTED anthonym@REDACTED
Mon Feb 26 20:17:30 CET 2018


Hi,

Recently, we've noticed some odd segfaults happening on some systems.  A
couple of different things have changed, but the primary suspect is the
version of GCC we used for compiling Erlang/OTP.

The system is a CentOS 6 system.  I'm using devtoolset-6, so I compiled
Erlang 18.3 with gcc 6.3.1.

The system logs show:

2_scheduler[28351]: segfault at 38021511f0 ip 000000000046cd34 sp
00007fec906fdb30 error 6 in beam.smp[400000+2ac000]
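
For anyone reading along, the fields in that kernel line can be decoded
mechanically.  The following is only a rough sketch in Python, not part of
our tooling: the function name decode_segfault and the dictionary keys are
made up for illustration.  It assumes the usual x86 page-fault error code
layout (bit 0 = page was present, bit 1 = write access, bit 2 = user mode)
and that the kernel prints the error code in hex.

```python
import re

# The log line from above, joined back onto one line.
LOG = ("2_scheduler[28351]: segfault at 38021511f0 ip 000000000046cd34 sp "
       "00007fec906fdb30 error 6 in beam.smp[400000+2ac000]")

# Hypothetical helper: pull the numeric fields out of a "segfault at" line.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<image>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault log line")
    err = int(m.group("err"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "fault_addr": int(m.group("addr"), 16),  # address being accessed
        "image": m.group("image"),               # mapped file containing ip
        "offset": ip - base,                     # ip relative to mapping base
        "page_present": bool(err & 1),           # bit 0: protection fault
        "write": bool(err & 2),                  # bit 1: write access
        "user_mode": bool(err & 4),              # bit 2: fault in user mode
    }

info = decode_segfault(LOG)
```

Read this way, "error 6" is a user-mode write to a page that is not
mapped, which fits a store through a bad pointer such as the decrement at
erl_alloc_util.c:1188 shown in the core file.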

The core file shows the crash here:

Program terminated with signal 11, Segmentation fault.
#0  fix_cpool_free (allctr=0x2140540, ptr=0x7fec7deaf700)
    at beam/erl_alloc_util.c:1188
1188            fix->u.cpool.used--;

And the backtrace is

(gdb) bt
#0  fix_cpool_free (allctr=0x2140540, ptr=0x7fec7deaf700)
    at beam/erl_alloc_util.c:1188
#1  handle_delayed_fix_dealloc (allctr=0x2140540, ptr=0x7fec7deaf700)
    at beam/erl_alloc_util.c:1785
#2  0x000000000046e676 in handle_delayed_dealloc (allctr=0x2140540, limit=1,
    need_thr_progress=0x7fec906fdc68, thr_prgr_p=0x7fec906fdc70,
    more_work=0x7fec906fdc6c) at beam/erl_alloc_util.c:1905
#3  erts_alcu_check_delayed_dealloc (allctr=0x2140540, limit=1,
    need_thr_progress=0x7fec906fdc68, thr_prgr_p=0x7fec906fdc70,
    more_work=0x7fec906fdc6c) at beam/erl_alloc_util.c:1998
#4  0x0000000000460543 in erts_alloc_scheduler_handle_delayed_dealloc (
    vesdp=0x7fec93f95900, need_thr_progress=0x7fec906fdc68,
    thr_prgr_p=0x7fec906fdc70, more_work=0x7fec906fdc6c)
    at beam/erl_alloc.c:1822
#5  0x00000000004e37b7 in handle_delayed_dealloc (p=<value optimized out>,
    calls=2001) at beam/erl_process.c:1829
#6  handle_aux_work (p=<value optimized out>, calls=2001)
    at beam/erl_process.c:2364
#7  schedule (p=<value optimized out>, calls=2001) at beam/erl_process.c:9578
#8  0x000000000043e4ba in process_main () at beam/beam_emu.c:1254
#9  0x00000000004d3607 in sched_thread_func (vesdp=0x7fec93f95900)
    at beam/erl_process.c:8118
#10 0x00000000006303d7 in thr_wrapper (vtwd=0x7ffdf5f03380)
    at pthread/ethread.c:114
#11 0x00000038d10079d1 in start_thread () from /lib64/libpthread.so.0
#12 0x00000038d0ce88fd in clone () from /lib64/libc.so.6

We are attempting to eliminate other possible issues by building the same
software with the older system compiler and running it on half our
machines.  It's a relatively rare crash (it happens on maybe one machine
every other day or so), but with 1200 machines in this cluster we should
know soon.

However, I just wanted to put this out there and find out whether anyone
else has seen these sorts of crashes with gcc 6.x, or whether anyone has
other hints or things to try to figure out why it might be crashing in this
particular place in the VM.

Thanks,

-Anthony




