[erlang-bugs] segmentation fault in tree_delete at beam/erl_bestfit_alloc.c:431

pan <>
Fri Mar 18 10:49:16 CET 2011


Hi Igor!

Sadly enough, this is the worst kind of core you could ever have :(

The core is generated in the allocators, but that's most probably not the
allocators' fault. Something has written outside of an allocated area
earlier, and now the error shows up in some (possibly/probably) unrelated
place.
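
To illustrate why the crash site and the faulty write can be far apart, here
is a minimal sketch of a guard-byte check that could be wrapped around
allocations in suspect driver or NIF code. The names guarded_alloc,
guarded_check and guarded_free are hypothetical helpers made up for this
example; they are not part of ERTS or of any Erlang API, and the user area is
only suitably aligned for byte buffers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug helpers for illustration; not part of ERTS. */
#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/* Small header in front of the user area so the size is known later. */
typedef struct {
    size_t size;
} guard_hdr;

/* Allocate `size` user bytes followed by GUARD_LEN guard bytes. */
void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(guard_hdr) + size + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    ((guard_hdr *)raw)->size = size;
    unsigned char *user = raw + sizeof(guard_hdr);
    memset(user + size, GUARD_BYTE, GUARD_LEN);
    return user;
}

/* Return 1 if the guard bytes are intact, 0 if something wrote past the end. */
int guarded_check(const void *ptr)
{
    assert(ptr != NULL);
    const unsigned char *user = ptr;
    const guard_hdr *hdr = (const guard_hdr *)(user - sizeof(guard_hdr));
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (user[hdr->size + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Release a block obtained from guarded_alloc. */
void guarded_free(void *ptr)
{
    if (ptr != NULL)
        free((unsigned char *)ptr - sizeof(guard_hdr));
}
```

Writing 17 bytes into a 16-byte block from guarded_alloc does not fail at the
time of the write; the damage is only noticed when guarded_check is called
later, which is the same delay that makes the allocator crash look unrelated.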

First of all, I have to ask if you have some non-OTP drivers or NIFs
loaded in the VM? Have you loaded some native code not supplied in the
Erlang distribution? In that case, try to rule out errors in that code and
in libraries loaded by that code by e.g. disabling it in some way (write
slower Erlang replacements etc).

Next question is if you use some drivers or NIFs provided by us that pull
in third-party libraries, like Wx or Crypto (by using SSL etc). If we could
isolate the problem to a driver (ours or yours), the search space would be
greatly reduced.

Also, looking at the core locally would possibly help me to identify the 
type of data that has been written into the block, which possibly could 
narrow it down, so if you could tar your compiled build tree and the core 
and put it on something where I can fetch it (mail me personally with the 
details, if you can do that), that would be helpful.

If the workload is low, running the VM under Valgrind would probably be
feasible. There is a special valgrind target when doing make in the
$ERL_TOP/erts/emulator directory; you can do 'make FLAVOR=smp valgrind' if
you have valgrind 3.4 or higher installed on the system. Running
'cerl -valgrind' (from the $ERL_TOP/bin directory) would then start Erlang
in the valgrind virtual environment, which should point out any illegal
memory accesses (note that some warnings are expected, namely a lot of
PossiblyLost, which is due to us keeping pointers *into* structures
instead of to the beginning of the structures).

Another possibility is to compile all C code with -D_FORTIFY_SOURCE=2
(together with optimization, e.g. -O2, which the checks require), which
may find faulty memory accesses too.
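
As a rough picture of what such a check amounts to: when the compiler knows
the size of the destination, a fortified copy compares the requested length
against that size before copying and aborts the program on a mismatch. The
sketch below does the same comparison by hand and returns an error instead of
aborting. checked_copy is a hypothetical helper written for this example; it
is not the glibc implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only.
 * Copy n bytes from src to dst if they fit into dst_size bytes.
 * Returns 0 on success and -1 if the copy would overflow dst.
 */
int checked_copy(void *dst, size_t dst_size, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n > dst_size)
        return -1;          /* refuse instead of writing past the end */
    memcpy(dst, src, n);
    return 0;
}
```

Copying 4 bytes into a 4-byte buffer succeeds, while asking for 7 bytes into
the same buffer is rejected before any byte is written.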

You say this is frequent. Is it in any way manually reproducible? Have you
got any idea of which Erlang code is run when this happens (i.e. during
some special kind of workload)? One possibility is that this is a compiler
error (in our compiler that is), so a module triggering the problem would
also be interesting.

Please make sure to run R14B02 and recompile all Erlang code with the
latest Erlang version to rule out any bug that's already corrected :)

Sorry for the big fluffy list of options, but as I said, this is a kind of 
error that is really hard to track down...

Cheers,
/Patrik

On Mon, 14 Mar 2011, Igor Goryachev wrote:

> Hello.
>
> We are suffering from quite frequent segmentation faults on our erlangish
> environment. We run r14b01 node with a very small load on linux 2.6.32
> (Debian GNU/Linux Squeeze 6.0), which is a virtual machine hosted under
> OpenVZ hypervisor (16 cores, Xeon 2.40GHz).
>
> I've tried to rebuild erlang with and without smp and threads, but in any
> case I'm getting the same behaviour.
>
> What additional helpful information should I provide?
>
>
> Core was generated by `/usr/lib/erlang/erts-5.8.2/bin/beam -K true -- -root /usr/lib/erlang -progname'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000000437e83 in tree_delete (allctr=0x7cbf20, del=0x7f93a8267460) at beam/erl_bestfit_alloc.c:431
> 431     beam/erl_bestfit_alloc.c: No such file or directory.
>        in beam/erl_bestfit_alloc.c
> (gdb) where
> #0  0x0000000000437e83 in tree_delete (allctr=0x7cbf20, del=0x7f93a8267460) at beam/erl_bestfit_alloc.c:431
> #1  0x0000000000438bb2 in bf_unlink_free_block (allctr=0x7cbf20, size=<value optimized out>, cand_blk=<value optimized out>,
>    cand_size=0) at beam/erl_bestfit_alloc.c:791
> #2  bf_get_free_block (allctr=0x7cbf20, size=<value optimized out>, cand_blk=<value optimized out>, cand_size=0)
>    at beam/erl_bestfit_alloc.c:842
> #3  0x0000000000433506 in mbc_alloc_block (allctr=0x7cbf20, size=287) at beam/erl_alloc_util.c:631
> #4  mbc_alloc (allctr=0x7cbf20, size=287) at beam/erl_alloc_util.c:764
> #5  0x00000000004b8118 in erts_alloc (c_p=0x7f93a70a90e0, reg=<value optimized out>, live=<value optimized out>,
>    build_size_term=<value optimized out>, extra_words=140272158101112, unit=8) at beam/erl_alloc.h:184
> #6  erts_bin_nrml_alloc (c_p=0x7f93a70a90e0, reg=<value optimized out>, live=<value optimized out>,
>    build_size_term=<value optimized out>, extra_words=140272158101112, unit=8) at beam/erl_binary.h:253
> #7  erts_bs_append (c_p=0x7f93a70a90e0, reg=<value optimized out>, live=<value optimized out>, build_size_term=<value optimized out>,
>    extra_words=140272158101112, unit=8) at beam/erl_bits.c:1325
> #8  0x00000000004e0a02 in process_main () at beam/beam_emu.c:3624
> #9  0x000000000043c5eb in erl_start (argc=33, argv=<value optimized out>) at beam/erl_init.c:1443
> #10 0x0000000000427ac9 in main (argc=8175392, argv=0x7f93a8267460) at sys/unix/erl_main.c:29
>
>
> -- 
> Igor Goryachev

