[erlang-bugs] segmentation fault in tree_delete at beam/erl_bestfit_alloc.c:431
pan@REDACTED
Fri Mar 18 10:49:16 CET 2011
Hi Igor!
Sadly enough, this is the worst kind of core you could ever have :(
The core is generated in the allocators, but that's most probably not the
allocator's fault. Something has written outside of an allocated area
earlier, and now the error shows up in some (possibly/probably) unrelated
place.
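To illustrate why such a crash surfaces far from the faulty code, here is a
minimal sketch in C. It is a toy bump allocator, not the ERTS allocator; all
names (toy_alloc, toy_check, buggy_fill, MAGIC) are hypothetical. The stray
write goes unnoticed when it happens and is only discovered later, when the
allocator walks its own block headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy arena (an assumption for illustration, NOT the ERTS allocator):
 * every block is a small header followed directly by its payload. */
#define MAGIC 0xB10CB10Cu
#define ARENA_SIZE 256

typedef struct {
    uint32_t magic; /* integrity marker written by the allocator */
    uint32_t size;  /* payload size in bytes */
} toy_hdr;

static unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Headers are copied with memcpy to avoid alignment and aliasing issues. */
static void put_hdr(size_t off, toy_hdr h) {
    assert(off + sizeof h <= ARENA_SIZE);
    memcpy(arena + off, &h, sizeof h);
}

static toy_hdr get_hdr(size_t off) {
    toy_hdr h;
    memcpy(&h, arena + off, sizeof h);
    return h;
}

void toy_reset(void) {
    memset(arena, 0, sizeof arena);
    arena_used = 0;
}

/* Bump-allocate a block; returns its payload, or NULL if the arena is full. */
unsigned char *toy_alloc(uint32_t size) {
    if (arena_used + sizeof(toy_hdr) + size > ARENA_SIZE)
        return NULL;
    toy_hdr h = { MAGIC, size };
    put_hdr(arena_used, h);
    unsigned char *payload = arena + arena_used + sizeof(toy_hdr);
    arena_used += sizeof(toy_hdr) + size;
    return payload;
}

/* Walk all blocks; return the index of the first block whose header is
 * damaged, or -1 if every header is intact. */
int toy_check(void) {
    size_t off = 0;
    int idx = 0;
    while (off + sizeof(toy_hdr) <= arena_used) {
        toy_hdr h = get_hdr(off);
        if (h.magic != MAGIC)
            return idx;
        off += sizeof(toy_hdr) + h.size;
        idx++;
    }
    return -1;
}

/* Buggy client code: fills n bytes without checking the block's capacity. */
void buggy_fill(unsigned char *payload, size_t n) {
    memset(payload, 0xAA, n);
}
```

Allocating two 16-byte blocks and calling buggy_fill on the first with 20
bytes returns normally; the damage is only reported by a later toy_check(),
and it is attributed to the second block, which did nothing wrong.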
First of all, I have to ask if you have some non-OTP drivers or NIFs
loaded in the VM? Have you loaded some native code not supplied in the
Erlang distribution? In that case, try to rule out errors in that code and
in libraries loaded by that code by e.g. disabling it in some way (write
slower Erlang replacements etc).
Next question is if you use some drivers or NIFs provided by us that pull
in third-party libraries, like Wx or Crypto (by using SSL etc). If we could
isolate the problem to a driver (ours or yours), the search space would be
greatly reduced.
Also, looking at the core locally would possibly help me to identify the
type of data that has been written into the block, which possibly could
narrow it down, so if you could tar your compiled build tree and the core
and put it on something where I can fetch it (mail me personally with the
details, if you can do that), that would be helpful.
If the workload is low, running the VM under Valgrind would probably be
feasible. There is a special valgrind target when doing make in the
$ERL_TOP/erts/emulator directory; you can do 'make FLAVOR=smp valgrind' if
you have valgrind 3.4 or higher installed on the system. Running cerl
-valgrind (from the $ERL_TOP/bin directory) would then start Erlang in the
valgrind virtual environment, which should point out any illegal memory
accesses (note that some warnings are expected, namely a lot of
PossiblyLost, which is due to us keeping pointers *into* structures
instead of to the beginning of the structures).
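As a rough illustration of why interior pointers produce those PossiblyLost
reports, here is a minimal C sketch. The toy_bin structure and its functions
are hypothetical and not taken from the emulator source. The only pointer the
program keeps points into the middle of the heap block, so a leak checker
cannot be sure the block is still reachable, even though the code can always
recover the start of the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure: bookkeeping fields followed by the payload. */
typedef struct {
    size_t refc;
    size_t len;
    char data[16];
} toy_bin;

/* Allocate a toy_bin but hand out only a pointer to its payload,
 * i.e. a pointer into the structure rather than to its beginning. */
char *toy_bin_new(size_t len) {
    toy_bin *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->refc = 1;
    b->len = len;
    return b->data;
}

/* Recover the start of the structure from the interior pointer. */
toy_bin *toy_bin_from_data(char *data) {
    assert(data != NULL);
    return (toy_bin *)(data - offsetof(toy_bin, data));
}

/* Free through the interior pointer by stepping back to the real start. */
void toy_bin_free(char *data) {
    free(toy_bin_from_data(data));
}
```

Between toy_bin_new and toy_bin_free nothing holds the address that malloc
returned, which is the situation Valgrind classifies as possibly lost.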
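Another possibility is to compile all C code with -D_FORTIFY_SOURCE (e.g.
-D_FORTIFY_SOURCE=2, with optimization enabled, which glibc needs for the
checks to take effect), which may find faulty memory accesses too.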
You say this is frequent. Is it in any way manually reproducible? Have you
got any idea of which Erlang code is run when this happens (i.e. during
some special kind of workload)? One possibility is that this is a compiler
error (in our compiler, that is), so a module triggering the problem would
also be interesting.
Please make sure to run R14B02 and recompile all Erlang code with the
latest Erlang version to rule out any bug that's already corrected :)
Sorry for the big fluffy list of options, but as I said, this is a kind of
error that is really hard to track down...
Cheers,
/Patrik
On Mon, 14 Mar 2011, Igor Goryachev wrote:
> Hello.
>
> We are suffering of quite frequent segmentation faults on our erlangish
> environment. We run r14b01 node with a very small load on linux 2.6.32
> (Debian GNU/Linux Squeeze 6.0), which is virtual machine hosted under
> OpenVZ hypervisor (16 cores, Xeon 2.40GHz).
>
> I've tried to rebuild erlang with and without smp and threads, but in any
> case I'm getting the same behaviour.
>
> What additional helpful information should I provide?
>
>
> Core was generated by `/usr/lib/erlang/erts-5.8.2/bin/beam -K true -- -root /usr/lib/erlang -progname'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000000000437e83 in tree_delete (allctr=0x7cbf20, del=0x7f93a8267460) at beam/erl_bestfit_alloc.c:431
> 431 beam/erl_bestfit_alloc.c: No such file or directory.
> in beam/erl_bestfit_alloc.c
> (gdb) where
> #0 0x0000000000437e83 in tree_delete (allctr=0x7cbf20, del=0x7f93a8267460) at beam/erl_bestfit_alloc.c:431
> #1 0x0000000000438bb2 in bf_unlink_free_block (allctr=0x7cbf20, size=<value optimized out>, cand_blk=<value optimized out>,
> cand_size=0) at beam/erl_bestfit_alloc.c:791
> #2 bf_get_free_block (allctr=0x7cbf20, size=<value optimized out>, cand_blk=<value optimized out>, cand_size=0)
> at beam/erl_bestfit_alloc.c:842
> #3 0x0000000000433506 in mbc_alloc_block (allctr=0x7cbf20, size=287) at beam/erl_alloc_util.c:631
> #4 mbc_alloc (allctr=0x7cbf20, size=287) at beam/erl_alloc_util.c:764
> #5 0x00000000004b8118 in erts_alloc (c_p=0x7f93a70a90e0, reg=<value optimized out>, live=<value optimized out>,
> build_size_term=<value optimized out>, extra_words=140272158101112, unit=8) at beam/erl_alloc.h:184
> #6 erts_bin_nrml_alloc (c_p=0x7f93a70a90e0, reg=<value optimized out>, live=<value optimized out>,
> build_size_term=<value optimized out>, extra_words=140272158101112, unit=8) at beam/erl_binary.h:253
> #7 erts_bs_append (c_p=0x7f93a70a90e0, reg=<value optimized out>, live=<value optimized out>, build_size_term=<value optimized out>,
> extra_words=140272158101112, unit=8) at beam/erl_bits.c:1325
> #8 0x00000000004e0a02 in process_main () at beam/beam_emu.c:3624
> #9 0x000000000043c5eb in erl_start (argc=33, argv=<value optimized out>) at beam/erl_init.c:1443
> #10 0x0000000000427ac9 in main (argc=8175392, argv=0x7f93a8267460) at sys/unix/erl_main.c:29
>
>
> --
> Igor Goryachev
>
> ________________________________________________________________
> erlang-bugs (at) erlang.org mailing list.
> See http://www.erlang.org/faq.html
> To unsubscribe; mailto:erlang-bugs-unsubscribe@REDACTED
>