[erlang-questions] segfault erts-5.10.4 (R16B03-1)
Lukas Larsson
garazdawi@REDACTED
Tue Sep 8 12:11:38 CEST 2015
Hello,
Unfortunately that did not help; it only made a few more arguments
available. I was hoping that it would give a full stack. If you do "p
allctr->name_prefix" you can find out which allocator is misbehaving.
I'm guessing that it will be "driver_", which covers any allocations
done in NIFs and linked-in drivers. Without a full stack trace it will be
hard to figure out what is wrong.
One thing that you could do is to use the etp gdb macros distributed with
the Erlang/OTP source code. If you do "source
$ERL_TOP/erts/etc/unix/etp-commands" in gdb (replacing $ERL_TOP with the
path to the R16B03-1 Erlang/OTP source from github), you will get access to
a lot of helpful gdb macros. Only this one file is needed, so if you have
to copy something onto a server, it is enough to copy this file.
If you then do "etp-ports", all ports that were alive at the moment of the
crash are printed to the shell. Look for any port with a state that
looks different. That will most likely be the port that was executing.
E.g. for me this is a currently running port:
Pix: 2576
Port: #Port<0.322>
Name: tty_sl -c -e
State: connected soft-eof
Scheduler flags: GARBAGE
Connected: <0.25.0>
Pointer: (Port *) 0x7ffff54809d8
To get the name of the currently running driver, do "p ((Port
*)0x7ffff54809d8)->drv_ptr->name".
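With many ports, spotting the one whose state "looks different" by eye can be tedious. The following is only a rough sketch, assuming the etp-ports output has been saved to a text file (for example through gdb's logging) and that each entry has the Port/State/Pointer lines shown above. The function names are made up for illustration, and treating a rarely occurring state as "unusual" is a heuristic of this sketch, not something the etp macros provide.

```python
from collections import Counter
import re


def port_states(text):
    """Collect (port, state, pointer) tuples from saved etp-ports output.

    Assumes every entry contains 'Port:', 'State:' and 'Pointer:' lines,
    in that order, as in the example entry in this message.
    """
    ports = []
    port = state = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Port:"):
            port, state = line.split(":", 1)[1].strip(), None
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Pointer:") and port is not None:
            m = re.search(r"0x[0-9a-fA-F]+", line)
            ports.append((port, state, m.group(0) if m else None))
            port = None
    return ports


def unusual_ports(ports, max_count=1):
    """Return the ports whose state string occurs at most max_count times."""
    counts = Counter(state for _, state, _ in ports)
    return [p for p in ports if counts[p[1]] <= max_count]


def driver_name_command(pointer):
    """Build the gdb command from this message that prints the driver name."""
    return f"p ((Port *){pointer})->drv_ptr->name"
```

Each port returned by unusual_ports can then be passed to driver_name_command, and the resulting line pasted into gdb.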
If no port is executing, it might be a NIF. In that case you can do
"etp-processes" to get a list of all processes in the system. Again, look
for any state that looks different, e.g.
Pix: 200
Pid: <0.25.0>
State: trapping-exit | running | active | prq-prio-normal |
usr-prio-normal | act-prio-normal
Registered name: user_drv
I: #Cp<user_drv:io_command/1+0x520>
Heap size: 610
Old-heap size: 987
Mbuf size: 0
Msgq len: 0 (inner=0, outer=0)
Parent: <0.24.0>
Pointer: (Process *) 0x7ffff51c4df0
This is a currently running process (State: running). You can get a
stack dump of the process by doing: etp-stackdump ((Process*)0x7ffff51c4df0)
Note that etp-processes and etp-ports will take quite some time to
finish. They need to iterate over all possible processes/ports,
and gdb is not the fastest scripting language in the world.
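Since the etp-processes listing can be very long, it may be easier to save it to a file once and search it outside gdb. Below is a minimal sketch, assuming the saved output uses the "Key: value" layout shown in the example above, with one entry per "Pix:" line. The helper names are hypothetical; only the etp-stackdump command itself comes from the etp macros.

```python
import re

# A field line looks like "State: ..." or "Old-heap size: 987".
FIELD = re.compile(r"^([A-Z][A-Za-z\- ]*):\s*(.*)$")


def parse_records(text):
    """Split saved etp-processes output into one dict per 'Pix:' entry."""
    records, current, last_key = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and separator lines made only of dashes.
        if not line or set(line) <= {"-"}:
            continue
        m = FIELD.match(line)
        if m:
            key, value = m.group(1), m.group(2)
            if key == "Pix":
                current = {}
                records.append(current)
            if current is not None:
                current[key] = value
                last_key = key
        elif current is not None and last_key is not None:
            # Wrapped continuation of the previous field, e.g. a long State.
            current[last_key] += " " + line
    return records


def running_processes(records):
    """Return the entries whose State flags include 'running'."""
    found = []
    for rec in records:
        flags = [f.strip() for f in rec.get("State", "").split("|")]
        if "running" in flags:
            found.append(rec)
    return found


def stackdump_command(record):
    """Build the etp-stackdump command for one entry, or None if no pointer."""
    m = re.search(r"0x[0-9a-fA-F]+", record.get("Pointer", ""))
    return f"etp-stackdump ((Process*){m.group(0)})" if m else None
```

Feeding the saved output to parse_records and then running_processes gives the candidates; stackdump_command turns each one into a line that can be pasted back into gdb.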
For some help with the etp gdb commands you can issue "etp-help".
Happy hunting!
Lukas
On Tue, Sep 8, 2015 at 11:45 AM, Ahmed Omar <spawn.think@REDACTED> wrote:
> Hi Lukas,
> Thanks for your reply. I tried with the latest version of gdb (7.10) :
>
> ###
> (gdb) bt full
> #0 0x000000000044d299 in link_free_block (allctr=0x15e32c0, block=0x128)
> at beam/erl_goodfit_alloc.c:439
> gfallctr = 0x15e32c0
> blk = 0x128
> sz = 0
> i = <optimized out>
> #1 0x00000000015e32c0 in ?? ()
> No symbol table info available.
> #2 0x0000000000442fa6 in mbc_realloc (allctr=0x7fe0848807a8, p=0x11f,
> size=<optimized out>, busy_pcrr_pp=0x8, alcu_flgs=0) at
> beam/erl_alloc_util.c:2370
> crr = 0x128
> new_p = <optimized out>
> old_blk_sz = 287
> blk = 0x117
> new_blk = <optimized out>
> cand_blk = <optimized out>
> cand_blk_sz = <optimized out>
> blk_sz = 3748409
> nxt_blk = 0x236
> nxt_blk_sz = 22950592
> is_last_blk = 296
> get_blk_sz = 140602277246336
> #3 0x0000000000000000 in ?? ()
> No symbol table info available.
> ###
>
> Best Regards,
> - Ahmed Omar
> http://about.me/spawn.think/
>
> 2015-09-08 10:51 GMT+02:00 Lukas Larsson <garazdawi@REDACTED>:
>
>> Hello,
>>
>> On Tue, Sep 8, 2015 at 10:33 AM, Ahmed Omar <spawn.think@REDACTED>
>> wrote:
>>
>>> Hi,
>>> We have been experiencing a segfault on our servers running a custom
>>> version of Ejabberd. We managed to get a core file from the last crash
>>> This is what we see running gdb on it:
>>> ######
>>> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>>> Core was generated by `/var/lib/ejabberd/erts-5.10.4/bin/beam.smp -K
>>> true -A 128 -P 2500000 -Q 500000'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0 0x000000000044d299 in link_free_block (allctr=0x15e32c0,
>>> block=0x128) at beam/erl_goodfit_alloc.c:439
>>> 439 beam/erl_goodfit_alloc.c: No such file or directory.
>>> in beam/erl_goodfit_alloc.c
>>> ######
>>>
>>> If we run bt full in gdb we get:
>>> ######
>>> (gdb) bt full
>>> #0 0x000000000044d299 in link_free_block (allctr=0x15e32c0,
>>> block=0x128) at beam/erl_goodfit_alloc.c:439
>>> gfallctr = 0x15e32c0
>>> blk = 0x128
>>> sz = 0
>>> i = <value optimized out>
>>> #1 0x00000000015e32c0 in ?? ()
>>> No symbol table info available.
>>> #2 0x0000000000442fa6 in mbc_realloc (allctr=0x7fe0848807a8, p=0x11f,
>>> size=Unhandled dwarf expression opcode 0xf3
>>> ) at beam/erl_alloc_util.c:2370
>>> crr = 0x128
>>> new_p = <value optimized out>
>>> old_blk_sz = 287
>>> blk = 0x117
>>> new_blk = <value optimized out>
>>> cand_blk = <value optimized out>
>>> cand_blk_sz = <value optimized out>
>>> blk_sz = 3748409
>>> nxt_blk = 0x236
>>> nxt_blk_sz = 22950592
>>> is_last_blk = 296
>>> get_blk_sz = 140602277246336
>>> #3 0x0000000000000000 in ?? ()
>>> No symbol table info available.
>>> #######
>>>
>>> Is there a way to get more information? maybe which driver made the
>>> realloc call?
>>>
>>
>> Something is wrong/missing from this stacktrace. The gdb that you are
>> using does not seem to understand the dwarf2 extension (at least that's
>> what I guess after googling "Unhandled dwarf expression opcode 0xf3"), and
>> can only find two of the frames. Try to install a later version of gdb and
>> then do a bt full.
>>
>>
>>>
>>> Best Regards,
>>> - Ahmed Omar
>>> http://about.me/spawn.think/
>>>
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>
>>>
>>
>