[erlang-questions] beam[8449]: segfault at 0 ip 0000000000437e10 sp 00007fffce250948 error 4 in beam[400000+174000]
Eric Liang
eric.l.2046@REDACTED
Sun Jun 13 05:11:52 CEST 2010
On 06/12/2010 05:45 PM, Mikael Pettersson wrote:
> Eric Liang wrote:
>
>> On 05/27/2010 02:14 AM, Mikael Pettersson wrote:
>>
>>> Eric Liang wrote:
>>>
>>>> I've done a build of the source, but it just can't match the object. How
>>>> do you make it? I use the command: apt-get source to get the source, so
>>>> it does have the same version with the object.
>>>>
>>> I did:
>>>
>>>> tar zxvf otp_src_R13B03.tar.gz
>>>> cd otp_src_R13B03
>>>> ./configure; make
>>>>
>>> The binary files of interest are bin/x86_64-unknown-linux-gnu/beam and
>>> erts/emulator/obj/x86_64-unknown-linux-gnu/opt/plain/erl_goodfit_alloc.o.
>>>
>> Thanks Mikael, and sorry for the late reply; the segfault does not
>> occur every time.
>>
>> I got the debug symbols by following this:
>>
>> http://forum.nginx.org/read.php?26,93440,94735
>>
>>
>>>>> You can get a stack dump from the crash by attaching gdb to the
>>>>> soon-to-crash beam process. Now instead of being terminated gdb will
>>>>> get control of the process and you should be able to print a stack
>>>>> trace with bt or where. (This does require that there's a sufficient
>>>>> time window from the start of the application to the crash.)
>>>>>
>>>> I made a core dump 4 seconds before it crashed, as mentioned above, but
>>>> because I don't have the right symbols, it just shows question marks:
>>>>
>>>> Core was generated by `/usr/lib/erlang/erts-5.7.2/bin/beam'.
>>>> #0 0x00007f0a28ecd5a9 in ?? ()
>>>> (gdb) whe
>>>> #0 0x00007f0a28ecd5a9 in ?? ()
>>>> #1 0x0000000000000000 in ?? ()
>>>> (gdb)
>>>>
>>> A core dump from a time point before the crash is useless. Either get a
>>> core dump from the crash itself (execute `ulimit -c unlimited' in bash
>>> before running the test), or attach gdb, continue the process, and wait
>>> for gdb to receive control when the crash occurs.
>>>
>> I did set ulimit -c in /etc/profile, and after a reboot I get:
>>
>> sunny@REDACTED:~$ ulimit -c
>> unlimited
>> sunny@REDACTED:~/commands$ cat /proc/sys/kernel/core_pattern
>> /tmp/core.%t.%e.%p
>>
>> And the test works:
>>
>> sunny@REDACTED:~$ kill -s SIGSEGV $$
>> Connection to dev-2 closed.
>> sunny@REDACTED:~$ ls /tmp/
>> core.1275730620.bash.12566
>>
>> But still no core file is generated when the error occurs.
>>
>> Anyway, I attached gdb to the running process, and here is the result:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> unlink_free_block (allctr=0x7ad480, block=0x0) at
>>     beam/erl_goodfit_alloc.c:453
>> 453 Uint sz = BLK_SZ(blk);
>> (gdb) whe
>> #0 unlink_free_block (allctr=0x7ad480, block=0x0) at
>>     beam/erl_goodfit_alloc.c:453
>> #1 0x0000000000437fd6 in get_free_block (allctr=0x7ad480,
>>     size=<value optimized out>, cand_blk=0x0, cand_size=0)
>>     at beam/erl_goodfit_alloc.c:421
>> #2 0x00000000004322c6 in mbc_alloc_block (allctr=0x7ad480, size=72)
>>     at beam/erl_alloc_util.c:631
>> #3 mbc_alloc (allctr=0x7ad480, size=72) at beam/erl_alloc_util.c:758
>> #4 0x00000000004b1697 in erts_alloc () at beam/erl_alloc.h:179
>> #5 exit_async () at beam/erl_async.c:132
>> #6 0x000000000043c13d in system_cleanup (exit_code=<value optimized out>)
>>     at beam/erl_init.c:1306
>> #7 0x000000000043c443 in erl_exit (n=0, fmt=0x54649c "") at
>>     beam/erl_init.c:1380
>> #8 0x000000000045d042 in halt_0 (A__p=<value optimized out>) at
>>     beam/bif.c:3319
>> #9 0x00000000004d081f in process_main () at beam/beam_emu.c:2008
>> #10 0x000000000043d56c in erl_start (argc=34, argv=<value optimized out>)
>>     at beam/erl_init.c:1233
>> #11 0x00000000004269b9 in main (argc=8049792, argv=0x0) at
>>     sys/unix/erl_main.c:29
>> (gdb) f 1
>> #1 0x0000000000437fd6 in get_free_block (allctr=0x7ad480,
>>     size=<value optimized out>, cand_blk=0x0, cand_size=0)
>>     at beam/erl_goodfit_alloc.c:421
>> 421 unlink_free_block(allctr, blk);
>> (gdb) l 421
>> 416 /* We are guaranteed to find a block that fits in this bucket */
>> 417 blk = search_bucket(allctr, min_bi, size);
>> 418 ASSERT(blk);
>> 419 if (cand_blk && cand_size <= BLK_SZ(blk))
>> 420 return NULL; /* cand_blk was better */
>> 421 unlink_free_block(allctr, blk);
>> 422 return blk;
>> 423 }
>> 424
>> 425
>> (gdb)
>>
>> Since the running process uses the non-debug build of beam, I guess the
>> ASSERT at line 418 has no effect. So I dug in:
>>
>> (gdb) p allctr
>> $1 = (Allctr_t *) 0x7ad480
>> (gdb) p min_bi
>> $2 = <value optimized out>
>> (gdb) p size
>> $3 = <value optimized out>
>> (gdb) p *allctr
>> $4 = {name_prefix = 0x534227 "sl_", alloc_no = 3,
>>   name = {alloc = 0, realloc = 0, free = 0},
>>   vsn_str = 0x53602f "2.1", t = 0, ramv = 0, sbc_threshold = 524288,
>>   sbc_move_threshold = 80, mbc_move_threshold = 50,
>>   main_carrier_size = 131072, max_mseg_sbcs = 256, max_mseg_mbcs = 5,
>>   largest_mbc_size = 10485760, smallest_mbc_size = 1048576,
>>   mbc_growth_stages = 10,
>>   mseg_opt = {cache = 1, preserv = 1, abs_shrink_th = 4145152,
>>     rel_shrink_th = 80},
>>   mbc_header_size = 32, sbc_header_size = 32,
>>   min_mbc_size = 16384, min_mbc_first_free_size = 4096,
>>   min_block_size = 32,
>>   mbc_list = {first = 0x7f4f93a5d010, last = 0x7f4f93a5d010},
>>   sbc_list = {first = 0x0, last = 0x0},
>>   main_carrier = 0x7f4f93a5d010,
>>   get_free_block = 0x437f40 <get_free_block>,
>>   link_free_block = 0x437d00 <link_free_block>,
>>   unlink_free_block = 0x437e10 <unlink_free_block>,
>>   info_options = 0x438480 <info_options>,
>>   get_next_mbc_size = 0x430e40 <get_next_mbc_size>,
>>   creating_mbc = 0x438100 <update_last_aux_mbc>,
>>   destroying_mbc = 0x438100 <update_last_aux_mbc>,
>>   init_atoms = 0x4385c0 <init_atoms>,
>>   mutex = {mtx = {pt_mtx = {
>>         __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
>>           __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>>         __size = '\000' <repeats 39 times>, __align = 0},
>>       is_rec_mtx = 0, prev = 0x0, next = 0x0}},
>>   thread_safe = 0, ts_list = {prev = 0x0, next = 0x0},
>>   atoms_initialized = 0, stopped = 0,
>>   calls = {this_alloc = {giga_no = 0, no = 2460},
>>     this_free = {giga_no = 0, no = 2458},
>>     this_realloc = {giga_no = 0, no = 0},
>>     mseg_alloc = {giga_no = 0, no = 0},
>>     mseg_dealloc = {giga_no = 0, no = 0},
>>     mseg_realloc = {giga_no = 0, no = 0},
>>     sys_alloc = {giga_no = 0, no = 1},
>>     sys_free = {giga_no = 0, no = 0},
>>     sys_realloc = {giga_no = 0, no = 0}},
>>   sbcs = {curr_mseg = {no = 0, size = 0},
>>     curr_sys_alloc = {no = 0, size = 0},
>>     max = {no = 0, size = 0}, max_ever = {no = 0, size = 0},
>>     blocks = {curr = {no = 0, size = 0}, max = {no = 0, size = 0},
>>       max_ever = {no = 0, size = 0}}},
>>   mbcs = {curr_mseg = {no = 0, size = 0},
>>     curr_sys_alloc = {no = 1, size = 131112},
>>     max = {no = 1, size = 131112}, max_ever = {no = 0, size = 0},
>>     blocks = {curr = {no = 4, size = 384}, max = {no = 144, size = 13848},
>>       max_ever = {no = 0, size = 0}}}}
>> (gdb)
>>
>> I'm stuck here. Do you have any advice? Any other suggestions would
>> also be appreciated. TIA.
>>
> This shows that in erl_goodfit_alloc.c, the ASSERT(blk) at the
> end of get_free_block() is bogus and that unlink_free_block() can
> be invoked with a NULL blk, which will cause a crash.
>
> You should send this to the erlang-bugs mailing list. It needs
> either the attention of someone who is intimately familiar with
> the logic of these allocators (I'm not), or for you to make a
> self-contained test case available (which you might not be able
> to do if it's proprietary).
>
OK, I'll send this to the erlang-bugs mailing list.
Thanks a lot for your help, and I'll mail you if any progress is made. :)
Eric
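
To illustrate the failure mode discussed above, here is a minimal C sketch of an assertion that disappears in a non-debug build and so lets a NULL result from a bucket search reach the unlink step. It is not the code from erl_goodfit_alloc.c: the names SKETCH_DEBUG, SKETCH_ASSERT, sketch_block, sketch_search_bucket, sketch_unlink and sketch_get_free_block are all hypothetical, and the free list is reduced to a singly linked list. The explicit NULL check only shows how the crash could be turned into a reportable failure; it is not a proposed fix, since it does not explain why the search came back empty.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a debug-only assertion macro: it aborts
 * when SKETCH_DEBUG is defined and expands to nothing otherwise. */
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(e) \
    do { \
        if (!(e)) { \
            fprintf(stderr, "assertion failed: %s\n", #e); \
            abort(); \
        } \
    } while (0)
#else
#define SKETCH_ASSERT(e) ((void) 0)
#endif

/* Hypothetical, heavily simplified free block. */
typedef struct sketch_block {
    size_t size;
    struct sketch_block *next;
} sketch_block;

/* Return the first block of at least `size` bytes, or NULL if none fits. */
static sketch_block *sketch_search_bucket(sketch_block *head, size_t size)
{
    for (sketch_block *b = head; b != NULL; b = b->next) {
        if (b->size >= size)
            return b;
    }
    return NULL;
}

/* Remove blk from the list. blk must be non-NULL and on the list;
 * a NULL blk would be dereferenced here, as in frame #0 of the trace. */
static void sketch_unlink(sketch_block **head, sketch_block *blk)
{
    sketch_block **pp = head;
    while (*pp != blk)
        pp = &(*pp)->next;
    *pp = blk->next;
    blk->next = NULL;
}

/* Search, then unlink. Without SKETCH_DEBUG the assertion is a no-op,
 * so only the explicit check keeps a NULL away from sketch_unlink(). */
static sketch_block *sketch_get_free_block(sketch_block **head, size_t size)
{
    sketch_block *blk = sketch_search_bucket(*head, size);
    SKETCH_ASSERT(blk);
    if (blk == NULL)   /* assumption of this sketch, not in the original code */
        return NULL;
    sketch_unlink(head, blk);
    return blk;
}
```

Compiled without -DSKETCH_DEBUG, a request larger than every block on the list returns NULL instead of crashing; compiled with it, the same request aborts at the assertion and names the failed expression.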