[erlang-questions] segfault erts-5.10.4 (R16B03-1)

Tue Sep 8 16:20:17 CEST 2015

Hi Lukas,
Is it possible to use the etp-commands on a core file, or does gdb have to be
attached to a live node?

Best Regards,
- Ahmed Omar
http://about.me/spawn.think/

2015-09-08 14:54 GMT+02:00 Ahmed Omar <spawn.think@REDACTED>:

> Hi Lukas,
>
> Thanks a lot for the pointers, I'll see if I can find anything.
>
> Best Regards,
> - Ahmed Omar
> http://about.me/spawn.think/
>
> 2015-09-08 12:11 GMT+02:00 Lukas Larsson <garazdawi@REDACTED>:
>
>> Hello,
>>
>> Unfortunately that did not help; it just made some more arguments
>> available. I was hoping that it would give a full stack. If you do "p
>> allctr->name_prefix" you can find out which allocator is misbehaving.
>> I'm guessing that it will be "driver_", which covers any allocations
>> done in NIFs and linked-in drivers. Without a full stacktrace it will be
>> hard to figure out what is wrong.
>>
>> One thing that you could do is to use the etp gdb macros distributed with
>> the Erlang/OTP source code. If you do "source
>> $ERL_TOP/erts/etc/unix/etp-commands" in gdb (replacing $ERL_TOP with the
>> path to the source of R16B03-1 Erlang/OTP from github), you will get
>> access to a lot of helpful gdb macros. Only this one file is needed, so if
>> you have to copy it onto a server somewhere, that is all you need to copy.
>>
>> If you then do "etp-ports", all ports that were alive at the moment of
>> the crash will be printed to the shell. Look for any ports with a state
>> that looks different. That will most likely be the port that was
>> executing. E.g. for me this is a currently running port:
>>
>>   Pix: 2576
>>   Port: #Port<0.322>
>>   Name: tty_sl -c -e
>>   State: connected soft-eof
>>   Scheduler flags: GARBAGE
>>   Connected: <0.25.0>
>>   Pointer: (Port *) 0x7ffff54809d8
>>
>> To get the name of the currently running driver, do "p ((Port
>> *)0x7ffff54809d8)->drv_ptr->name".
>>
>> If no port is executing, it might be a NIF. In that case you can do
>> "etp-processes" to get a list of all processes in the system. Again, look
>> for any state that looks different, e.g.
>>
>>   Pix: 200
>>   Pid: <0.25.0>
>>   State: trapping-exit | running | active | prq-prio-normal | usr-prio-normal | act-prio-normal
>>   Registered name: user_drv
>>   I: #Cp<user_drv:io_command/1+0x520>
>>   Heap size: 610
>>   Old-heap size: 987
>>   Mbuf size: 0
>>   Msgq len: 0 (inner=0, outer=0)
>>   Parent: <0.24.0>
>>   Pointer: (Process *) 0x7ffff51c4df0
>>
>> This is a currently running process (State: running). You can get a
>> stackdump of the process by doing: etp-stackdump ((Process*)0x7ffff51c4df0)
>>
>> Note that etp-processes and etp-ports will take quite some time to
>> finish. They need to iterate over all possible processes/ports,
>> and gdb is not the fastest scripting language in the world.
>>
>> For some help with the etp gdb commands you can issue "etp-help".
>>
>> Happy hunting!
>> Lukas
>>
>> On Tue, Sep 8, 2015 at 11:45 AM, Ahmed Omar <spawn.think@REDACTED>
>> wrote:
>>
>>> Hi Lukas,
>>> Thanks for your reply. I tried with the latest version of gdb (7.10):
>>>
>>> ###
>>> (gdb) bt full
>>> #0  0x000000000044d299 in link_free_block (allctr=0x15e32c0,
>>> block=0x128) at beam/erl_goodfit_alloc.c:439
>>>         gfallctr = 0x15e32c0
>>>         blk = 0x128
>>>         sz = 0
>>>         i = <optimized out>
>>> #1  0x00000000015e32c0 in ?? ()
>>> No symbol table info available.
>>> #2  0x0000000000442fa6 in mbc_realloc (allctr=0x7fe0848807a8, p=0x11f,
>>> size=<optimized out>, busy_pcrr_pp=0x8, alcu_flgs=0) at
>>> beam/erl_alloc_util.c:2370
>>>         crr = 0x128
>>>         new_p = <optimized out>
>>>         old_blk_sz = 287
>>>         blk = 0x117
>>>         new_blk = <optimized out>
>>>         cand_blk = <optimized out>
>>>         cand_blk_sz = <optimized out>
>>>         blk_sz = 3748409
>>>         nxt_blk = 0x236
>>>         nxt_blk_sz = 22950592
>>>         is_last_blk = 296
>>>         get_blk_sz = 140602277246336
>>> #3  0x0000000000000000 in ?? ()
>>> No symbol table info available.
>>> ###
>>>
>>> Best Regards,
>>> - Ahmed Omar
>>> http://about.me/spawn.think/
>>>
>>> 2015-09-08 10:51 GMT+02:00 Lukas Larsson <garazdawi@REDACTED>:
>>>
>>>> Hello,
>>>>
>>>> On Tue, Sep 8, 2015 at 10:33 AM, Ahmed Omar <spawn.think@REDACTED>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> We have been experiencing a segfault on our servers running a custom
>>>>> version of Ejabberd. We managed to get a core file from the last crash.
>>>>> This is what we see when running gdb on it:
>>>>> ######
>>>>> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
>>>>> symbols found)...done.
>>>>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>>>>> Core was generated by `/var/lib/ejabberd/erts-5.10.4/bin/beam.smp -K
>>>>> true -A 128 -P 2500000 -Q 500000'.
>>>>> Program terminated with signal 11, Segmentation fault.
>>>>> #0  0x000000000044d299 in link_free_block (allctr=0x15e32c0,
>>>>> block=0x128) at beam/erl_goodfit_alloc.c:439
>>>>> 439 beam/erl_goodfit_alloc.c: No such file or directory.
>>>>> in beam/erl_goodfit_alloc.c
>>>>> ######
>>>>>
>>>>> If we run bt full in gdb we get:
>>>>> ######
>>>>> (gdb) bt full
>>>>> #0  0x000000000044d299 in link_free_block (allctr=0x15e32c0,
>>>>> block=0x128) at beam/erl_goodfit_alloc.c:439
>>>>>         gfallctr = 0x15e32c0
>>>>>         blk = 0x128
>>>>>         sz = 0
>>>>>         i = <value optimized out>
>>>>> #1  0x00000000015e32c0 in ?? ()
>>>>> No symbol table info available.
>>>>> #2  0x0000000000442fa6 in mbc_realloc (allctr=0x7fe0848807a8, p=0x11f,
>>>>> size=Unhandled dwarf expression opcode 0xf3
>>>>> ) at beam/erl_alloc_util.c:2370
>>>>>         crr = 0x128
>>>>>         new_p = <value optimized out>
>>>>>         old_blk_sz = 287
>>>>>         blk = 0x117
>>>>>         new_blk = <value optimized out>
>>>>>         cand_blk = <value optimized out>
>>>>>         cand_blk_sz = <value optimized out>
>>>>>         blk_sz = 3748409
>>>>>         nxt_blk = 0x236
>>>>>         nxt_blk_sz = 22950592
>>>>>         is_last_blk = 296
>>>>>         get_blk_sz = 140602277246336
>>>>> #3  0x0000000000000000 in ?? ()
>>>>> No symbol table info available.
>>>>> #######
>>>>>
>>>>> Is there a way to get more information? Maybe which driver made the
>>>>> realloc call?
>>>>>
>>>>
>>>> Something is wrong/missing from this stacktrace. The gdb that you are
>>>> using does not seem to understand the dwarf2 extension (at least that's
>>>> what I guess after googling "Unhandled dwarf expression opcode 0xf3"), and
>>>> can only find two of the frames. Try to install a later version of gdb and
>>>> then do a bt full.
>>>>
>>>>
>>>>>
>>>>> Best Regards,
>>>>> - Ahmed Omar
>>>>> http://about.me/spawn.think/
>>>>>
>>>>> _______________________________________________
>>>>> erlang-questions mailing list
>>>>> erlang-questions@REDACTED
>>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>>>
>>>>>
>>>>
>>>
>>
>
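
The "look for any state that looks different" step in Lukas's mail above
can be tedious on a node with many processes and ports. Below is a minimal
sketch, not part of the etp macros, that assumes the output of
etp-processes / etp-ports has been saved as text in the layout shown in the
thread (one "Pix:" line starting each entry, "Key: value" lines after it).
The function names and the SAMPLE text are made up for illustration; the
two gdb commands it suggests are the ones quoted in the mail.

```python
import re
from collections import Counter

# "Key: value" lines as printed in the thread, e.g. "Pix: 200" or "Msgq len: 0 ...".
FIELD = re.compile(r"^([A-Z][A-Za-z\- ]*?):\s*(.*)$")
# Pointer lines such as "(Process *) 0x7ffff51c4df0" or "(Port *) 0x7ffff54809d8".
POINTER = re.compile(r"\((\w+)\s*\*\)\s*(0x[0-9a-fA-F]+)")

# Illustrative input only, trimmed from the two entries quoted in the thread.
SAMPLE = """
  Pix: 200
  Pid: <0.25.0>
  State: trapping-exit | running | active | prq-prio-normal |
usr-prio-normal | act-prio-normal
  Registered name: user_drv
  Pointer: (Process *) 0x7ffff51c4df0

  Pix: 2576
  Port: #Port<0.322>
  Name: tty_sl -c -e
  State: connected soft-eof
  Pointer: (Port *) 0x7ffff54809d8
"""


def parse_records(text):
    """Split saved etp output into one dict per 'Pix:' entry."""
    records, current, last_key = [], None, None
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # also tolerate mail quote prefixes
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            key, value = match.group(1), match.group(2)
            if key == "Pix":  # a new entry starts here
                current = {}
                records.append(current)
            if current is not None:
                current[key] = value
                last_key = key
        elif current is not None and last_key is not None:
            # A wrapped line (e.g. a long State) continues the previous field.
            current[last_key] += " " + line
    return records


def state_flags(record):
    """Return the '|'-separated State flags of an entry as a list."""
    return [f.strip() for f in record.get("State", "").split("|") if f.strip()]


def running_processes(records):
    """Process entries whose State contains the 'running' flag."""
    return [r for r in records if "Pid" in r and "running" in state_flags(r)]


def unusual_states(records, limit=5):
    """Entries ordered so that the rarest State strings come first."""
    counts = Counter(r.get("State", "") for r in records)
    return sorted(records, key=lambda r: counts[r.get("State", "")])[:limit]


def next_gdb_command(record):
    """Suggest the follow-up gdb command from the thread for an entry."""
    match = POINTER.search(record.get("Pointer", ""))
    if not match:
        return None
    kind, addr = match.groups()
    if kind == "Process":
        return f"etp-stackdump (({kind}*){addr})"
    if kind == "Port":
        return f"p ((Port *){addr})->drv_ptr->name"
    return None


if __name__ == "__main__":
    for rec in running_processes(parse_records(SAMPLE)):
        print(rec["Pid"], next_gdb_command(rec))
```

Only the "running" flag for processes comes from the mail; for ports the
thread does not say which state marks an executing port, so the sketch
only sorts entries by how rare their State string is and leaves the
judgement to the reader.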