[erlang-questions] beam[8449]: segfault at 0 ip 0000000000437e10 sp 00007fffce250948 error 4 in beam[400000+174000]

Eric Liang eric.l.2046@REDACTED
Sat Jun 5 11:39:17 CEST 2010


On 05/27/2010 02:14 AM, Mikael Pettersson wrote:
> Eric Liang wrote:
>   
>> I've done a build of the source, but it just can't match the object. How
>> do you make it? I use the command: apt-get source to get the source, so
>> it does have the same version with the object.
>>     
> I did:
>
>   
>> tar zxvf otp_src_R13B03.tar.gz
>> cd otp_src_R13B03
>> ./configure; make
>>     
> The binary files of interest are bin/x86_64-unknown-linux-gnu/beam and 
> erts/emulator/obj/x86_64-unknown-linux-gnu/opt/plain/erl_goodfit_alloc.o.
>
>   
Thanks Mikael, and sorry for the late reply; the segfault does not
occur every time.

I got the debug symbols by following this:

    http://forum.nginx.org/read.php?26,93440,94735

>>> You can get a stack dump from the crash by attaching gdb to the
>>> soon-to-crash beam process. Now instead of being terminated gdb will
>>> get control of the process and you should be able to print a stack
>>> trace with bt or where. (This does require that there's a sufficient
>>> time window from the start of the application to the crash.)
>>>
>> I made a core dump 4 seconds before it crashed, as mentioned above.
>> Because I don't have the right symbols, it just shows question marks:
>>
>>     Core was generated by `/usr/lib/erlang/erts-5.7.2/bin/beam'.
>>     #0  0x00007f0a28ecd5a9 in ?? ()
>>     (gdb) whe
>>     #0  0x00007f0a28ecd5a9 in ?? ()
>>     #1  0x0000000000000000 in ?? ()
>>     (gdb)
>>     
> A core dump from a time point before the crash is useless. Either get a
> core dump from the crash itself (execute `ulimit -c unlimited' in bash
> before running the test), or attach gdb, continue the process, and wait
> for gdb to receive control when the crash occurs.
>   
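The attach-and-wait approach can also be scripted, so nobody has to sit
at the gdb prompt. A minimal sketch, assuming gdb is on the PATH and the
caller is allowed to ptrace the target pid; the helper names
gdb_backtrace_command and wait_for_crash are made up for illustration:

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, resumes it,
    and prints a backtrace once the process stops again."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "continue", "-ex", "bt"]


def wait_for_crash(pid):
    """Run gdb in batch mode and return everything it printed.

    This blocks until the process stops on a signal (for example
    SIGSEGV) or exits.
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

Calling wait_for_crash(8449) would attach to the beam pid from the
subject line. If the process exits normally, gdb has no stack to print.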

I did set ulimit -c in /etc/profile, and after a reboot:

    sunny@REDACTED:~$ ulimit -c
    unlimited
    sunny@REDACTED:~/commands$ cat /proc/sys/kernel/core_pattern
    /tmp/core.%t.%e.%p

And the test works:

    sunny@REDACTED:~$ kill -s SIGSEGV $$
    Connection to dev-2 closed.
    sunny@REDACTED:~$ ls /tmp/
    core.1275730620.bash.12566

But still no core file is generated when the error occurs.
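One thing worth checking: /etc/profile is only read by login shells, so
a beam started some other way (init script, cron, a daemon) may still
run with a core limit of 0. A minimal sketch, assuming Linux and a
readable /proc/<pid>/limits; the helper name core_limit_from_limits is
made up for illustration:

```python
import os
import sys


def core_limit_from_limits(text):
    """Return (soft, hard) for 'Max core file size' from the text of
    /proc/<pid>/limits, or None if the line is missing."""
    label = "Max core file size"
    for line in text.splitlines():
        if line.startswith(label):
            # Remaining columns: soft limit, hard limit, units
            fields = line[len(label):].split()
            return fields[0], fields[1]
    return None


if __name__ == "__main__":
    # Pass the pid of the running beam; defaults to this process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    path = "/proc/%s/limits" % pid
    if os.path.exists(path):
        with open(path) as f:
            print(core_limit_from_limits(f.read()))
```

A soft limit of 0 for the beam pid would explain the missing core file
even though the interactive shell reports unlimited.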

Anyway, I attached gdb to the running process, and here is the result:

    Program received signal SIGSEGV, Segmentation fault.
    unlink_free_block (allctr=0x7ad480, block=0x0) at beam/erl_goodfit_alloc.c:453
    453        Uint sz = BLK_SZ(blk);
    (gdb) whe
    #0  unlink_free_block (allctr=0x7ad480, block=0x0) at beam/erl_goodfit_alloc.c:453
    #1  0x0000000000437fd6 in get_free_block (allctr=0x7ad480, size=<value optimized out>, cand_blk=0x0, cand_size=0)
        at beam/erl_goodfit_alloc.c:421
    #2  0x00000000004322c6 in mbc_alloc_block (allctr=0x7ad480, size=72) at beam/erl_alloc_util.c:631
    #3  mbc_alloc (allctr=0x7ad480, size=72) at beam/erl_alloc_util.c:758
    #4  0x00000000004b1697 in erts_alloc () at beam/erl_alloc.h:179
    #5  exit_async () at beam/erl_async.c:132
    #6  0x000000000043c13d in system_cleanup (exit_code=<value optimized out>) at beam/erl_init.c:1306
    #7  0x000000000043c443 in erl_exit (n=0, fmt=0x54649c "") at beam/erl_init.c:1380
    #8  0x000000000045d042 in halt_0 (A__p=<value optimized out>) at beam/bif.c:3319
    #9  0x00000000004d081f in process_main () at beam/beam_emu.c:2008
    #10 0x000000000043d56c in erl_start (argc=34, argv=<value optimized out>) at beam/erl_init.c:1233
    #11 0x00000000004269b9 in main (argc=8049792, argv=0x0) at sys/unix/erl_main.c:29
    (gdb) f 1
    #1  0x0000000000437fd6 in get_free_block (allctr=0x7ad480, size=<value optimized out>, cand_blk=0x0, cand_size=0)
        at beam/erl_goodfit_alloc.c:421
    421        unlink_free_block(allctr, blk);
    (gdb) l 421
    416        /* We are guaranteed to find a block that fits in this bucket */
    417        blk = search_bucket(allctr, min_bi, size);
    418        ASSERT(blk);
    419        if (cand_blk && cand_size <= BLK_SZ(blk))
    420            return NULL; /* cand_blk was better */
    421        unlink_free_block(allctr, blk);
    422        return blk;
    423    }
    424
    425
    (gdb)

As the running process uses the non-debug build of beam, I guess the
ASSERT on line 418 is compiled out, so the NULL returned by
search_bucket is passed straight on to unlink_free_block. So I dug in:

    (gdb) p allctr
    $1 = (Allctr_t *) 0x7ad480
    (gdb) p min_bi
    $2 = <value optimized out>
    (gdb) p size
    $3 = <value optimized out>
    (gdb) p *allctr
    $4 = {name_prefix = 0x534227 "sl_", alloc_no = 3, name = {alloc = 0, realloc = 0, free = 0},
      vsn_str = 0x53602f "2.1", t = 0, ramv = 0, sbc_threshold = 524288, sbc_move_threshold = 80,
      mbc_move_threshold = 50, main_carrier_size = 131072, max_mseg_sbcs = 256, max_mseg_mbcs = 5,
      largest_mbc_size = 10485760, smallest_mbc_size = 1048576, mbc_growth_stages = 10, mseg_opt = {cache = 1,
        preserv = 1, abs_shrink_th = 4145152, rel_shrink_th = 80}, mbc_header_size = 32, sbc_header_size = 32,
      min_mbc_size = 16384, min_mbc_first_free_size = 4096, min_block_size = 32, mbc_list = {first = 0x7f4f93a5d010,
        last = 0x7f4f93a5d010}, sbc_list = {first = 0x0, last = 0x0}, main_carrier = 0x7f4f93a5d010,
      get_free_block = 0x437f40 <get_free_block>, link_free_block = 0x437d00 <link_free_block>,
      unlink_free_block = 0x437e10 <unlink_free_block>, info_options = 0x438480 <info_options>,
      get_next_mbc_size = 0x430e40 <get_next_mbc_size>, creating_mbc = 0x438100 <update_last_aux_mbc>,
      destroying_mbc = 0x438100 <update_last_aux_mbc>, init_atoms = 0x4385c0 <init_atoms>, mutex = {mtx = {pt_mtx = {
            __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {
                __prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, is_rec_mtx = 0,
          prev = 0x0, next = 0x0}}, thread_safe = 0, ts_list = {prev = 0x0, next = 0x0}, atoms_initialized = 0,
      stopped = 0, calls = {this_alloc = {giga_no = 0, no = 2460}, this_free = {giga_no = 0, no = 2458},
        this_realloc = {giga_no = 0, no = 0}, mseg_alloc = {giga_no = 0, no = 0}, mseg_dealloc = {giga_no = 0, no = 0},
        mseg_realloc = {giga_no = 0, no = 0}, sys_alloc = {giga_no = 0, no = 1}, sys_free = {giga_no = 0, no = 0},
        sys_realloc = {giga_no = 0, no = 0}}, sbcs = {curr_mseg = {no = 0, size = 0}, curr_sys_alloc = {no = 0,
          size = 0}, max = {no = 0, size = 0}, max_ever = {no = 0, size = 0}, blocks = {curr = {no = 0, size = 0},
          max = {no = 0, size = 0}, max_ever = {no = 0, size = 0}}}, mbcs = {curr_mseg = {no = 0, size = 0},
        curr_sys_alloc = {no = 1, size = 131112}, max = {no = 1, size = 131112}, max_ever = {no = 0, size = 0},
        blocks = {curr = {no = 4, size = 384}, max = {no = 144, size = 13848}, max_ever = {no = 0, size = 0}}}}
    (gdb)

And I am stuck here. Do you have any advice? Any other suggestions
would also be appreciated. TIA.

>   
>> Look back the initial error:
>>
>>     segfault at 0 ip 0000000000437e10 sp 00007fffce250948 error 4 in
>>     beam[400000+174000]
>>
>> Would you mind telling me what -ip- and -sp- mean, and what
>> -error 4- means?
>>     
> IP is the Instruction Pointer aka Program Counter. SP is the stack pointer.
> The 'error 4' is a low-level error code which is irrelevant for us.
>   
Got it, thank you.
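
For the archive, the fields of that kernel line can be decoded
mechanically. A rough sketch, assuming the x86 page-fault error-code
layout (bit 0 = protection violation, bit 1 = write access, bit 2 =
user mode) and that the kernel prints all the numbers in hex;
decode_segfault is just an illustrative name:

```python
import re

# Bits of the x86 page-fault error code shown as "error N".
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: fault in kernel mode, 1: fault in user mode

SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel 'segfault at ...' log line into its fields.

    Returns a dict, or None if the line does not match.
    """
    m = SEGV_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    return {
        "fault_addr": int(m.group("addr"), 16),
        "ip": ip,
        "sp": int(m.group("sp"), 16),
        "object": m.group("obj"),
        # True when ip lies inside the mapping printed as [base+size]
        "ip_in_mapping": base <= ip < base + size,
        "cause": "protection violation" if err & PF_PROT else "page not present",
        "access": "write" if err & PF_WRITE else "read",
        "mode": "user" if err & PF_USER else "kernel",
    }


if __name__ == "__main__":
    print(decode_segfault(
        "segfault at 0 ip 0000000000437e10 sp 00007fffce250948 "
        "error 4 in beam[400000+174000]"
    ))
```

For the line above this reports a user-mode read of a not-present page
at address 0, with ip 0x437e10 inside the beam mapping. That is the
same address gdb prints for unlink_free_block in the allctr dump, and
address 0 agrees with block=0x0 in frame #0.
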

Eric