[erlang-questions] segfault erts-5.10.4 (R16B03-1)

Ahmed Omar spawn.think@REDACTED
Tue Sep 8 14:54:11 CEST 2015


Hi Lukas,

Thanks a lot for the pointers, I'll see if I can find anything.

Best Regards,
- Ahmed Omar
http://about.me/spawn.think/

2015-09-08 12:11 GMT+02:00 Lukas Larsson <garazdawi@REDACTED>:

> Hello,
>
> Unfortunately that did not help; it just made some more arguments
> available. I was hoping that it would give a full stack. If you do
> "p allctr->name_prefix" you can find out which allocator is misbehaving.
> I'm guessing that it will be "driver_", which covers any allocations done
> in NIFs and linked-in drivers. Without a full stack trace it will be hard
> to figure out what is wrong.
>
> One thing that you could do is to use the etp gdb macros distributed with
> the Erlang/OTP source code. If you do "source
> $ERL_TOP/erts/etc/unix/etp-commands" in gdb (replacing $ERL_TOP with the
> path to the R16B03-1 Erlang/OTP source from GitHub), you will get access
> to a lot of helpful gdb macros. Only this one file is needed, so if you
> have to copy something onto a server somewhere, that file is enough.
>
> If you then do "etp-ports", gdb will print all ports that were alive at
> the moment of the crash. Look for any ports with a state that looks
> different. That will most likely be the port that was executing. E.g. for
> me this is a currently running port:
>
>   Pix: 2576
>   Port: #Port<0.322>
>   Name: tty_sl -c -e
>   State: connected soft-eof
>   Scheduler flags: GARBAGE
>   Connected: <0.25.0>
>   Pointer: (Port *) 0x7ffff54809d8
>
> To get the name of the currently running driver, do
> "p ((Port *)0x7ffff54809d8)->drv_ptr->name".
>
> If no port is executing, it might be a NIF. In that case you can do
> "etp-processes" to get a list of all processes in the system. Again, look
> for any state that looks different, e.g.
>
>   Pix: 200
>   Pid: <0.25.0>
>   State: trapping-exit | running | active | prq-prio-normal |
>          usr-prio-normal | act-prio-normal
>   Registered name: user_drv
>   I: #Cp<user_drv:io_command/1+0x520>
>   Heap size: 610
>   Old-heap size: 987
>   Mbuf size: 0
>   Msgq len: 0 (inner=0, outer=0)
>   Parent: <0.24.0>
>   Pointer: (Process *) 0x7ffff51c4df0
>
> This is a currently running process (State: running). You can get a
> stackdump of the process by doing: etp-stackdump ((Process*)0x7ffff51c4df0)
>
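
Scanning a long etp-processes listing by eye for the odd state can be tedious, so the check described above could be automated. Below is a minimal Python sketch, not part of the etp macros. It assumes the listing was saved as plain text in the "Key: value" layout shown above, with each entry starting at a "Pix:" line. The function names and the second sample entry (<0.26.0> and its pointer) are made up for illustration.

```python
# Sketch: find entries whose State contains "running" in saved
# etp-processes output. Assumes "Key: value" lines, one entry per "Pix:".

SAMPLE = """\
  Pix: 200
  Pid: <0.25.0>
  State: trapping-exit | running | active | prq-prio-normal |
usr-prio-normal | act-prio-normal
  Registered name: user_drv
  Pointer: (Process *) 0x7ffff51c4df0

  Pix: 201
  Pid: <0.26.0>
  State: trapping-exit | active | prq-prio-normal
  Pointer: (Process *) 0x7ffff51c5000
"""  # the second entry is hypothetical, added only for contrast


def parse_etp_entries(text):
    """Split the listing into one dict per entry."""
    entries = []
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key == "Pix":
            entries.append({})  # a "Pix:" line starts a new entry
        if not entries:
            continue  # ignore anything printed before the first entry
        if sep:
            last_key = key.strip()
            entries[-1][last_key] = value.strip()
        elif last_key is not None:
            # A line without a colon is treated as a wrapped continuation.
            entries[-1][last_key] += " " + line
    return entries


def state_flags(entry):
    """Return the set of '|'-separated flags in the State field."""
    return {f.strip() for f in entry.get("State", "").split("|") if f.strip()}


def running_processes(entries):
    """Keep only the entries whose State includes the 'running' flag."""
    return [e for e in entries if "running" in state_flags(e)]


if __name__ == "__main__":
    for entry in running_processes(parse_etp_entries(SAMPLE)):
        print(entry["Pid"], entry["Pointer"])
```

The Pointer value of each hit is what you would then pass to etp-stackdump.
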
> Note that etp-processes and etp-ports will take quite some time to
> finish. They need to iterate over all possible processes/ports, and gdb
> is not the fastest scripting language in the world.
>
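
Since the macros are slow, one option is to run them unattended in gdb's batch mode and read the output afterwards. Below is a minimal Python sketch that only builds the command line; it relies on gdb's standard -batch and -ex options. The core file path and the source checkout path are placeholders, not values from this thread, and the function name is made up.

```python
# Sketch: build a gdb batch command line that loads the etp macros and
# runs the slow listing macros without an interactive session.
import shlex


def build_gdb_argv(beam_path, core_path, etp_commands_path,
                   macros=("etp-ports", "etp-processes")):
    """Return the argv list for a non-interactive gdb run."""
    argv = ["gdb", "-batch", "-ex", "source " + etp_commands_path]
    for macro in macros:
        argv += ["-ex", macro]  # one -ex per macro, executed in order
    argv += [beam_path, core_path]  # executable first, then the core file
    return argv


if __name__ == "__main__":
    argv = build_gdb_argv(
        "/var/lib/ejabberd/erts-5.10.4/bin/beam.smp",
        "/tmp/core",  # placeholder core file path
        "/path/to/otp_src/erts/etc/unix/etp-commands",  # placeholder checkout
    )
    # Print a copy-pasteable shell command; redirect its output to a file.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Redirecting the output of that command to a file lets the listing be searched later without keeping a gdb session open.
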
> For some help with the etp gdb commands you can issue "etp-help".
>
> Happy hunting!
> Lukas
>
> On Tue, Sep 8, 2015 at 11:45 AM, Ahmed Omar <spawn.think@REDACTED> wrote:
>
>> Hi Lukas,
>> Thanks for your reply. I tried with the latest version of gdb (7.10):
>>
>> ###
>> (gdb) bt full
>> #0  0x000000000044d299 in link_free_block (allctr=0x15e32c0, block=0x128)
>> at beam/erl_goodfit_alloc.c:439
>>         gfallctr = 0x15e32c0
>>         blk = 0x128
>>         sz = 0
>>         i = <optimized out>
>> #1  0x00000000015e32c0 in ?? ()
>> No symbol table info available.
>> #2  0x0000000000442fa6 in mbc_realloc (allctr=0x7fe0848807a8, p=0x11f,
>> size=<optimized out>, busy_pcrr_pp=0x8, alcu_flgs=0) at
>> beam/erl_alloc_util.c:2370
>>         crr = 0x128
>>         new_p = <optimized out>
>>         old_blk_sz = 287
>>         blk = 0x117
>>         new_blk = <optimized out>
>>         cand_blk = <optimized out>
>>         cand_blk_sz = <optimized out>
>>         blk_sz = 3748409
>>         nxt_blk = 0x236
>>         nxt_blk_sz = 22950592
>>         is_last_blk = 296
>>         get_blk_sz = 140602277246336
>> #3  0x0000000000000000 in ?? ()
>> No symbol table info available.
>> ###
>>
>> Best Regards,
>> - Ahmed Omar
>> http://about.me/spawn.think/
>>
>> 2015-09-08 10:51 GMT+02:00 Lukas Larsson <garazdawi@REDACTED>:
>>
>>> Hello,
>>>
>>> On Tue, Sep 8, 2015 at 10:33 AM, Ahmed Omar <spawn.think@REDACTED>
>>> wrote:
>>>
>>>> Hi,
>>>> We have been experiencing a segfault on our servers running a custom
>>>> version of ejabberd. We managed to get a core file from the last crash.
>>>> This is what we see when running gdb on it:
>>>> ######
>>>> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
>>>> symbols found)...done.
>>>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>>>> Core was generated by `/var/lib/ejabberd/erts-5.10.4/bin/beam.smp -K
>>>> true -A 128 -P 2500000 -Q 500000'.
>>>> Program terminated with signal 11, Segmentation fault.
>>>> #0  0x000000000044d299 in link_free_block (allctr=0x15e32c0,
>>>> block=0x128) at beam/erl_goodfit_alloc.c:439
>>>> 439 beam/erl_goodfit_alloc.c: No such file or directory.
>>>> in beam/erl_goodfit_alloc.c
>>>> ######
>>>>
>>>> If we run bt full in gdb we get:
>>>> ######
>>>> (gdb) bt full
>>>> #0  0x000000000044d299 in link_free_block (allctr=0x15e32c0,
>>>> block=0x128) at beam/erl_goodfit_alloc.c:439
>>>>         gfallctr = 0x15e32c0
>>>>         blk = 0x128
>>>>         sz = 0
>>>>         i = <value optimized out>
>>>> #1  0x00000000015e32c0 in ?? ()
>>>> No symbol table info available.
>>>> #2  0x0000000000442fa6 in mbc_realloc (allctr=0x7fe0848807a8, p=0x11f,
>>>> size=Unhandled dwarf expression opcode 0xf3
>>>> ) at beam/erl_alloc_util.c:2370
>>>>         crr = 0x128
>>>>         new_p = <value optimized out>
>>>>         old_blk_sz = 287
>>>>         blk = 0x117
>>>>         new_blk = <value optimized out>
>>>>         cand_blk = <value optimized out>
>>>>         cand_blk_sz = <value optimized out>
>>>>         blk_sz = 3748409
>>>>         nxt_blk = 0x236
>>>>         nxt_blk_sz = 22950592
>>>>         is_last_blk = 296
>>>>         get_blk_sz = 140602277246336
>>>> #3  0x0000000000000000 in ?? ()
>>>> No symbol table info available.
>>>> #######
>>>>
>>>> Is there a way to get more information? Maybe which driver made the
>>>> realloc call?
>>>>
>>>
>>> Something is wrong/missing from this stacktrace. The gdb that you are
>>> using does not seem to understand the dwarf2 extension (at least that's
>>> what I guess after googling "Unhandled dwarf expression opcode 0xf3"), and
>>> can only find two of the frames. Try to install a later version of gdb and
>>> then do a bt full.
>>>
>>>
>>>>
>>>> Best Regards,
>>>> - Ahmed Omar
>>>> http://about.me/spawn.think/
>>>>
>>>> _______________________________________________
>>>> erlang-questions mailing list
>>>> erlang-questions@REDACTED
>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>
>>>>
>>>
>>
>