[erlang-bugs] [erlang-questions] beam.smp crash reproducibly

Sverker Eriksson sverker.eriksson@REDACTED
Thu Nov 13 16:32:54 CET 2014


On 11/12/2014 11:16 PM, Maxim Sokhatsky wrote:
> Hello!
>
> tl;dr We’ve managed to crash beam.smp reproducibly.
>        It reproduces consistently on both 17.3 and R16B02.
>        This happens under heavy load. No HiPE. No NIFs. ulimit is ok.
>
> We have a very simple application that consumes a RabbitMQ queue and stores data in mnesia's disc_copies. For that purpose we use the RabbitMQ client stack amqp_client/rabbit_common wrapped in our simple library synrc/mqs (300 LOC), along with a very simple wrapper over mnesia, synrc/kvs (200 LOC). We have 16GB RAM on a powerful machine and performance is good. However, once memory consumption reaches about 10GB, the system dumps core. We first used the original Ubuntu 12.04 package R16B02, which lacks the symbol information needed for a bug report. So we built Erlang 17.3 from source with kerl, and the situation did not change.
>
> Here are the GDB sessions we retrieved from the core files, along with detailed information about the application, build procedure, etc.:
>
>           1. https://gist.github.com/5HT/e35d58b76bc25680e17b
>           2. https://gist.github.com/5HT/224c569df807f1e337aa
#1 Seems to be corruption in process heap memory, found by gc.
#2 Seems to be corruption in ets memory, found by ets:insert.

> We heard that 17.3 has some unstable memory allocators.

Not sure what allocator problem you are referring to. There is one known
race bug in the carrier migration logic for 17. It also exists in R16, but
only if you run with carrier migration enabled (+Muacul with a value
other than zero). This bug is fixed in tag OTP-17.3.4 at github and will
be released in 17.4.
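
To rule out that known race without upgrading, carrier migration can be
turned off by setting the utilization limit to zero. A minimal sketch,
assuming the node is started through `erl` (which reads extra emulator
flags from the ERL_FLAGS environment variable). Whether this affects the
crashes reported here is unknown.

```shell
# Assumption: the node is launched via `erl`, which picks up additional
# emulator flags from the ERL_FLAGS environment variable.
# "+Muacul 0" sets the abandon-carrier utilization limit to 0 for all
# allocators, i.e. carrier migration is disabled.
# Any flags already present in ERL_FLAGS are kept in front of the new one.
export ERL_FLAGS="${ERL_FLAGS:+$ERL_FLAGS }+Muacul 0"
echo "ERL_FLAGS=${ERL_FLAGS}"
```

The same flag can be passed directly on the command line (erl +Muacul 0)
instead of going through the environment.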


> But the crashes were also reproducible on R16B02. So we decided not to panic and to ask the community for a recipe for further checks, and to calmly plan a regression test kit.
>
> As you can see, the crash core files contain information about allocators and gc. We think the problem is there. That leads us to the following questions for the community:
>
>           1. Which memory allocators do you suggest we try first?
A gdb backtrace including allocator calls does not mean that the 
allocator code is to blame. Very often the allocator just happens to be 
the one that discovers a corruption that has already happened. The same 
goes for the GC.
So I do not see much use of trying different allocators other than maybe 
to find a workaround.

>           2. What other steps should we perform?
You could try running with a debug-compiled beam.smp. It will be much slower
but may catch the bug earlier and give a much nicer core dump.

$> cd $ERL_TOP/erts/emulator
$> make TYPE=debug smp

run with

$> $ERL_TOP/bin/cerl -debug

OR replace your beam.smp and copy child_setup.debug to the same place:

$> cp <erts-install-dir>/bin/beam.smp <erts-install-dir>/bin/beam.smp.saved
$> cp $ERL_TOP/bin/<target>/beam.debug.smp <erts-install-dir>/bin/beam.smp
$> cp $ERL_TOP/bin/<target>/child_setup.debug <erts-install-dir>/bin/

and run as you normally do
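
The replacement steps above can be wrapped so that the original emulator
is never lost and can be put back afterwards. A rough sketch only: the
function names and their two directory arguments are hypothetical, and
stand in for <erts-install-dir>/bin and $ERL_TOP/bin/<target>.

```shell
# Hypothetical helpers, not part of OTP.
#   $1 = the <erts-install-dir>/bin directory
#   $2 = the $ERL_TOP/bin/<target> directory holding the debug build

install_debug_beam() {
    erts_bin=$1
    debug_bin=$2
    # Refuse to run twice: a second run would overwrite the saved
    # original with the debug emulator.
    if [ -e "$erts_bin/beam.smp.saved" ]; then
        echo "backup already exists: $erts_bin/beam.smp.saved" >&2
        return 1
    fi
    cp "$erts_bin/beam.smp" "$erts_bin/beam.smp.saved" &&
    cp "$debug_bin/beam.debug.smp" "$erts_bin/beam.smp" &&
    cp "$debug_bin/child_setup.debug" "$erts_bin/"
}

restore_beam() {
    erts_bin=$1
    if [ ! -e "$erts_bin/beam.smp.saved" ]; then
        echo "no backup found: $erts_bin/beam.smp.saved" >&2
        return 1
    fi
    # Put the original emulator back; child_setup.debug is left in place.
    mv "$erts_bin/beam.smp.saved" "$erts_bin/beam.smp"
}
```

Call install_debug_beam with the two directories before the test run and
restore_beam with the first one when done.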


And lastly, you can send us the source and a description of how to
reproduce, including info about hardware and OS.


/Sverker, Erlang/OTP Ericsson




