[erlang-questions] beam[8449]: segfault at 0 ip 0000000000437e10 sp 00007fffce250948 error 4 in beam[400000+174000]

Eric Liang eric.l.2046@REDACTED
Sun Jun 13 05:11:52 CEST 2010


On 06/12/2010 05:45 PM, Mikael Pettersson wrote:
> Eric Liang wrote:
>   
>> On 05/27/2010 02:14 AM, Mikael Pettersson wrote:
>>     
>>> Eric Liang wrote:
>>>
>>>> I've done a build of the source, but it just can't match the object. How
>>>> did you do it? I used the command apt-get source to get the source, so
>>>> it is the same version as the object.
>>>>
>>> I did:
>>>
>>>> tar zxvf otp_src_R13B03.tar.gz
>>>> cd otp_src_R13B03
>>>> ./configure; make
>>>
>>> The binary files of interest are bin/x86_64-unknown-linux-gnu/beam and
>>> erts/emulator/obj/x86_64-unknown-linux-gnu/opt/plain/erl_goodfit_alloc.o.
>>>
>> Thanks Mikael, and sorry for the late reply; the segfault does not occur
>> every time.
>>
>> I got the debug symbols by following this:
>>
>>     http://forum.nginx.org/read.php?26,93440,94735
>>
>>     
>>>>> You can get a stack dump from the crash by attaching gdb to the
>>>>> soon-to-crash beam process. Now instead of being terminated gdb will
>>>>> get control of the process and you should be able to print a stack
>>>>> trace with bt or where. (This does require that there's a sufficient
>>>>> time window from the start of the application to the crash.)
>>>>>
>>>> I made a core dump 4 seconds before it crashed, as mentioned above, but
>>>> because I didn't get the right symbols, it only shows question marks:
>>>>
>>>>     Core was generated by `/usr/lib/erlang/erts-5.7.2/bin/beam'.
>>>>     #0  0x00007f0a28ecd5a9 in ?? ()
>>>>     (gdb) whe
>>>>     #0  0x00007f0a28ecd5a9 in ?? ()
>>>>     #1  0x0000000000000000 in ?? ()
>>>>     (gdb)
>>>>
>>> A core dump from a time point before the crash is useless. Either get a
>>> core dump from the crash itself (execute `ulimit -c unlimited' in bash
>>> before running the test), or attach gdb, continue the process, and wait
>>> for gdb to receive control when the crash occurs.
>>>
>> I did set ulimit -c in /etc/profile, and after rebooting:
>>
>>     sunny@REDACTED:~$ ulimit -c
>>     unlimited
>>     sunny@REDACTED:~/commands$ cat /proc/sys/kernel/core_pattern
>>     /tmp/core.%t.%e.%p
>>
>> And the test works:
>>
>>     sunny@REDACTED:~$ kill -s SIGSEGV $$
>>     Connection to dev-2 closed.
>>     sunny@REDACTED:~$ ls /tmp/
>>     core.1275730620.bash.12566
>>
>> But still no core file is generated when the error occurs.
>>
>> Anyway, I attached gdb to the running process, and here is the result:
>>
>>     Program received signal SIGSEGV, Segmentation fault.
>>     unlink_free_block (allctr=0x7ad480, block=0x0) at
>>     beam/erl_goodfit_alloc.c:453
>>     453        Uint sz = BLK_SZ(blk);
>>     (gdb) whe
>>     #0  unlink_free_block (allctr=0x7ad480, block=0x0) at
>>     beam/erl_goodfit_alloc.c:453
>>     #1  0x0000000000437fd6 in get_free_block (allctr=0x7ad480,
>>     size=<value optimized out>, cand_blk=0x0, cand_size=0)
>>         at beam/erl_goodfit_alloc.c:421
>>     #2  0x00000000004322c6 in mbc_alloc_block (allctr=0x7ad480, size=72)
>>     at beam/erl_alloc_util.c:631
>>     #3  mbc_alloc (allctr=0x7ad480, size=72) at beam/erl_alloc_util.c:758
>>     #4  0x00000000004b1697 in erts_alloc () at beam/erl_alloc.h:179
>>     #5  exit_async () at beam/erl_async.c:132
>>     #6  0x000000000043c13d in system_cleanup (exit_code=<value optimized
>>     out>) at beam/erl_init.c:1306
>>     #7  0x000000000043c443 in erl_exit (n=0, fmt=0x54649c "") at
>>     beam/erl_init.c:1380
>>     #8  0x000000000045d042 in halt_0 (A__p=<value optimized out>) at
>>     beam/bif.c:3319
>>     #9  0x00000000004d081f in process_main () at beam/beam_emu.c:2008
>>     #10 0x000000000043d56c in erl_start (argc=34, argv=<value optimized
>>     out>) at beam/erl_init.c:1233
>>     #11 0x00000000004269b9 in main (argc=8049792, argv=0x0) at
>>     sys/unix/erl_main.c:29
>>     (gdb) f 1
>>     #1  0x0000000000437fd6 in get_free_block (allctr=0x7ad480,
>>     size=<value optimized out>, cand_blk=0x0, cand_size=0)
>>         at beam/erl_goodfit_alloc.c:421
>>     421        unlink_free_block(allctr, blk);
>>     (gdb) l 421
>>     416        /* We are guaranteed to find a block that fits in this bucket */
>>     417        blk = search_bucket(allctr, min_bi, size);
>>     418        ASSERT(blk);
>>     419        if (cand_blk && cand_size <= BLK_SZ(blk))
>>     420        return NULL; /* cand_blk was better */
>>     421        unlink_free_block(allctr, blk);
>>     422        return blk;
>>     423    }
>>     424
>>     425
>>     (gdb)
>>
>> As the running process uses the non-debug (optimized) build of beam, I
>> guess the ASSERT on line 418 has no effect. So I dug in:
>>
>>     (gdb) p allctr
>>     $1 = (Allctr_t *) 0x7ad480
>>     (gdb) p min_bi
>>     $2 = <value optimized out>
>>     (gdb) p size
>>     $3 = <value optimized out>
>>     (gdb) p *allctr
>>     $4 = {name_prefix = 0x534227 "sl_", alloc_no = 3, name = {alloc = 0,
>>         realloc = 0, free = 0},
>>       vsn_str = 0x53602f "2.1", t = 0, ramv = 0, sbc_threshold = 524288,
>>       sbc_move_threshold = 80,
>>       mbc_move_threshold = 50, main_carrier_size = 131072,
>>       max_mseg_sbcs = 256, max_mseg_mbcs = 5,
>>       largest_mbc_size = 10485760, smallest_mbc_size = 1048576,
>>       mbc_growth_stages = 10, mseg_opt = {cache = 1,
>>         preserv = 1, abs_shrink_th = 4145152, rel_shrink_th = 80},
>>       mbc_header_size = 32, sbc_header_size = 32,
>>       min_mbc_size = 16384, min_mbc_first_free_size = 4096,
>>       min_block_size = 32, mbc_list = {first = 0x7f4f93a5d010,
>>         last = 0x7f4f93a5d010}, sbc_list = {first = 0x0, last = 0x0},
>>       main_carrier = 0x7f4f93a5d010,
>>       get_free_block = 0x437f40 <get_free_block>,
>>       link_free_block = 0x437d00 <link_free_block>,
>>       unlink_free_block = 0x437e10 <unlink_free_block>,
>>       info_options = 0x438480 <info_options>,
>>       get_next_mbc_size = 0x430e40 <get_next_mbc_size>,
>>       creating_mbc = 0x438100 <update_last_aux_mbc>,
>>       destroying_mbc = 0x438100 <update_last_aux_mbc>,
>>       init_atoms = 0x4385c0 <init_atoms>, mutex = {mtx = {pt_mtx = {
>>             __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
>>               __kind = 0, __spins = 0, __list = {
>>                 __prev = 0x0, __next = 0x0}},
>>             __size = '\000' <repeats 39 times>, __align = 0}, is_rec_mtx = 0,
>>           prev = 0x0, next = 0x0}}, thread_safe = 0,
>>       ts_list = {prev = 0x0, next = 0x0}, atoms_initialized = 0,
>>       stopped = 0, calls = {this_alloc = {giga_no = 0, no = 2460},
>>         this_free = {giga_no = 0, no = 2458},
>>         this_realloc = {giga_no = 0, no = 0},
>>         mseg_alloc = {giga_no = 0, no = 0},
>>         mseg_dealloc = {giga_no = 0, no = 0},
>>         mseg_realloc = {giga_no = 0, no = 0},
>>         sys_alloc = {giga_no = 0, no = 1},
>>         sys_free = {giga_no = 0, no = 0},
>>         sys_realloc = {giga_no = 0, no = 0}},
>>       sbcs = {curr_mseg = {no = 0, size = 0},
>>         curr_sys_alloc = {no = 0, size = 0}, max = {no = 0, size = 0},
>>         max_ever = {no = 0, size = 0},
>>         blocks = {curr = {no = 0, size = 0},
>>           max = {no = 0, size = 0}, max_ever = {no = 0, size = 0}}},
>>       mbcs = {curr_mseg = {no = 0, size = 0},
>>         curr_sys_alloc = {no = 1, size = 131112},
>>         max = {no = 1, size = 131112}, max_ever = {no = 0, size = 0},
>>         blocks = {curr = {no = 4, size = 384},
>>           max = {no = 144, size = 13848}, max_ever = {no = 0, size = 0}}}}
>>     (gdb)
>>
>> And I'm stalled here. Do you have any advice? Any other suggestions
>> would also be appreciated. TIA.
>>     
> This shows that in erl_goodfit_alloc.c, the ASSERT(blk) at the
> end of get_free_block() is bogus and that unlink_free_block() can
> be invoked with a NULL blk, which will cause a crash.
>
> You should send this to the erlang-bugs mailing list.  It needs
> either the attention of someone who is intimately familiar with
> the logic of these allocators (I'm not), or for you to make a
> self-contained test case available (which you might not be able
> to do if it's proprietary).
>   
OK, I'll send it to the erlang-bugs mailing list.

Thanks a lot for your help, and I'll mail you if any progress is made. :)

Eric

