[erlang-questions] beam[8449]: segfault at 0 ip 0000000000437e10 sp 00007fffce250948 error 4 in beam[400000+174000]
Mikael Pettersson
mikpe@REDACTED
Sat Jun 12 11:45:26 CEST 2010
Eric Liang wrote:
> On 05/27/2010 02:14 AM, Mikael Pettersson wrote:
> > Eric Liang wrote:
> >
> >> I've done a build of the source, but it just can't match the object. How
> >> do you make it? I use the command: apt-get source to get the source, so
> >> it does have the same version with the object.
> >>
> > I did:
> >
> >
> >> tar zxvf otp_src_R13B03.tar.gz
> >> cd otp_src_R13B03
> >> ./configure; make
> >>
> > The binary files of interest are bin/x86_64-unknown-linux-gnu/beam and
> > erts/emulator/obj/x86_64-unknown-linux-gnu/opt/plain/erl_goodfit_alloc.o.
> >
> >
> Thanks Mikael, and sorry for the late reply; the segfault does not
> occur every time.
>
> I got the debug symbols by following this:
>
> http://forum.nginx.org/read.php?26,93440,94735
>
> >>> You can get a stack dump from the crash by attaching gdb to the
> >>> soon-to-crash beam process. Now instead of being terminated gdb will
> >>> get control of the process and you should be able to print a stack
> >>> trace with bt or where. (This does require that there's a sufficient
> >>> time window from the start of the application to the crash.)
> >>>
> >>>
> >> I've made a core dump 4 seconds before it crashes, as mentioned above,
> >> but because I don't have the right symbols, it just shows question marks:
> >>
> >> Core was generated by `/usr/lib/erlang/erts-5.7.2/bin/beam'.
> >> #0 0x00007f0a28ecd5a9 in ?? ()
> >> (gdb) whe
> >> #0 0x00007f0a28ecd5a9 in ?? ()
> >> #1 0x0000000000000000 in ?? ()
> >> (gdb)
> >>
> > A core dump from a time point before the crash is useless. Either get a
> > core dump from the crash itself (execute `ulimit -c unlimited' in bash
> > before running the test), or attach gdb, continue the process, and wait
> > for gdb to receive control when the crash occurs.
> >
>
> I did set ulimit -c in /etc/profile, and after a reboot:
>
> sunny@REDACTED:~$ ulimit -c
> unlimited
> sunny@REDACTED:~/commands$ cat /proc/sys/kernel/core_pattern
> /tmp/core.%t.%e.%p
>
> And the test is OK:
>
> sunny@REDACTED:~$ kill -s SIGSEGV $$
> Connection to dev-2 closed.
> sunny@REDACTED:~$ ls /tmp/
> core.1275730620.bash.12566
>
> But still no core file is generated when the error occurs.
>
> Anyway, I attached gdb to the running process, and here is the result:
>
> Program received signal SIGSEGV, Segmentation fault.
> unlink_free_block (allctr=0x7ad480, block=0x0) at
> beam/erl_goodfit_alloc.c:453
> 453 Uint sz = BLK_SZ(blk);
> (gdb) whe
> #0 unlink_free_block (allctr=0x7ad480, block=0x0) at
> beam/erl_goodfit_alloc.c:453
> #1 0x0000000000437fd6 in get_free_block (allctr=0x7ad480,
> size=<value optimized out>, cand_blk=0x0, cand_size=0)
> at beam/erl_goodfit_alloc.c:421
> #2 0x00000000004322c6 in mbc_alloc_block (allctr=0x7ad480, size=72)
> at beam/erl_alloc_util.c:631
> #3 mbc_alloc (allctr=0x7ad480, size=72) at beam/erl_alloc_util.c:758
> #4 0x00000000004b1697 in erts_alloc () at beam/erl_alloc.h:179
> #5 exit_async () at beam/erl_async.c:132
> #6 0x000000000043c13d in system_cleanup (exit_code=<value optimized
> out>) at beam/erl_init.c:1306
> #7 0x000000000043c443 in erl_exit (n=0, fmt=0x54649c "") at
> beam/erl_init.c:1380
> #8 0x000000000045d042 in halt_0 (A__p=<value optimized out>) at
> beam/bif.c:3319
> #9 0x00000000004d081f in process_main () at beam/beam_emu.c:2008
> #10 0x000000000043d56c in erl_start (argc=34, argv=<value optimized
> out>) at beam/erl_init.c:1233
> #11 0x00000000004269b9 in main (argc=8049792, argv=0x0) at
> sys/unix/erl_main.c:29
> (gdb) f 1
> #1 0x0000000000437fd6 in get_free_block (allctr=0x7ad480,
> size=<value optimized out>, cand_blk=0x0, cand_size=0)
> at beam/erl_goodfit_alloc.c:421
> 421 unlink_free_block(allctr, blk);
> (gdb) l 421
> 416 /* We are guaranteed to find a block that fits in this bucket */
> 417 blk = search_bucket(allctr, min_bi, size);
> 418 ASSERT(blk);
> 419 if (cand_blk && cand_size <= BLK_SZ(blk))
> 420 return NULL; /* cand_blk was better */
> 421 unlink_free_block(allctr, blk);
> 422 return blk;
> 423 }
> 424
> 425
> (gdb)
>
> As the running process uses the non-debug build of beam, I guess the
> ASSERT on line 418 has no effect. So I dug in:
>
> (gdb) p allctr
> $1 = (Allctr_t *) 0x7ad480
> (gdb) p min_bi
> $2 = <value optimized out>
> (gdb) p size
> $3 = <value optimized out>
> (gdb) p *allctr
> $4 = {name_prefix = 0x534227 "sl_", alloc_no = 3, name = {alloc = 0,
> realloc = 0, free = 0},
> vsn_str = 0x53602f "2.1", t = 0, ramv = 0, sbc_threshold = 524288,
> sbc_move_threshold = 80,
> mbc_move_threshold = 50, main_carrier_size = 131072, max_mseg_sbcs
> = 256, max_mseg_mbcs = 5,
> largest_mbc_size = 10485760, smallest_mbc_size = 1048576,
> mbc_growth_stages = 10, mseg_opt = {cache = 1,
> preserv = 1, abs_shrink_th = 4145152, rel_shrink_th = 80},
> mbc_header_size = 32, sbc_header_size = 32,
> min_mbc_size = 16384, min_mbc_first_free_size = 4096,
> min_block_size = 32, mbc_list = {first = 0x7f4f93a5d010,
> last = 0x7f4f93a5d010}, sbc_list = {first = 0x0, last = 0x0},
> main_carrier = 0x7f4f93a5d010,
> get_free_block = 0x437f40 <get_free_block>, link_free_block =
> 0x437d00 <link_free_block>,
> unlink_free_block = 0x437e10 <unlink_free_block>, info_options =
> 0x438480 <info_options>,
> get_next_mbc_size = 0x430e40 <get_next_mbc_size>, creating_mbc =
> 0x438100 <update_last_aux_mbc>,
> destroying_mbc = 0x438100 <update_last_aux_mbc>, init_atoms =
> 0x4385c0 <init_atoms>, mutex = {mtx = {pt_mtx = {
> __data = {__lock = 0, __count = 0, __owner = 0, __nusers =
> 0, __kind = 0, __spins = 0, __list = {
> __prev = 0x0, __next = 0x0}}, __size = '\000' <repeats
> 39 times>, __align = 0}, is_rec_mtx = 0,
> prev = 0x0, next = 0x0}}, thread_safe = 0, ts_list = {prev =
> 0x0, next = 0x0}, atoms_initialized = 0,
> stopped = 0, calls = {this_alloc = {giga_no = 0, no = 2460},
> this_free = {giga_no = 0, no = 2458},
> this_realloc = {giga_no = 0, no = 0}, mseg_alloc = {giga_no = 0,
> no = 0}, mseg_dealloc = {giga_no = 0, no = 0},
> mseg_realloc = {giga_no = 0, no = 0}, sys_alloc = {giga_no = 0,
> no = 1}, sys_free = {giga_no = 0, no = 0},
> sys_realloc = {giga_no = 0, no = 0}}, sbcs = {curr_mseg = {no =
> 0, size = 0}, curr_sys_alloc = {no = 0,
> size = 0}, max = {no = 0, size = 0}, max_ever = {no = 0, size
> = 0}, blocks = {curr = {no = 0, size = 0},
> max = {no = 0, size = 0}, max_ever = {no = 0, size = 0}}},
> mbcs = {curr_mseg = {no = 0, size = 0},
> curr_sys_alloc = {no = 1, size = 131112}, max = {no = 1, size =
> 131112}, max_ever = {no = 0, size = 0},
> blocks = {curr = {no = 4, size = 384}, max = {no = 144, size =
> 13848}, max_ever = {no = 0, size = 0}}}}
> (gdb)
>
> And I'm stalled here. Do you have any advice? Any other suggestions
> would also be appreciated. TIA.
This shows that in erl_goodfit_alloc.c, the ASSERT(blk) at the
end of get_free_block() is bogus and that unlink_free_block() can
be invoked with a NULL blk, which will cause a crash.
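
To illustrate the failure pattern only, here is a minimal sketch in C.
It is not the ERTS code: block_t, the simplified search_bucket(), and
get_free_block_checked() are hypothetical stand-ins, and the ASSERT
macro is assumed to expand to nothing unless DEBUG is defined. It shows
why an assertion that is absent from an optimized build gives no
protection, and what an explicit NULL check looks like in its place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an allocator ASSERT: active only when
 * DEBUG is defined, otherwise it expands to nothing. */
#ifdef DEBUG
#define ASSERT(e) assert(e)
#else
#define ASSERT(e) ((void) 0)
#endif

/* Hypothetical free-block type; not the ERTS block layout. */
typedef struct block {
    size_t size;
    struct block *next;
} block_t;

/* Return the first block of at least `size` bytes, or NULL if the
 * bucket holds no block that fits. */
static block_t *search_bucket(block_t *bucket, size_t size)
{
    for (block_t *b = bucket; b != NULL; b = b->next) {
        if (b->size >= size)
            return b;
    }
    return NULL;
}

/* Guarded lookup: the ASSERT documents the expected invariant, but the
 * explicit check is what prevents a NULL block from being passed on
 * when the assertion is compiled out. */
static block_t *get_free_block_checked(block_t *bucket, size_t size)
{
    block_t *blk = search_bucket(bucket, size);
    ASSERT(blk);            /* no-op unless DEBUG is defined */
    if (blk == NULL)
        return NULL;        /* report failure instead of dereferencing */
    return blk;
}
```

Built without DEBUG, get_free_block_checked() returns NULL for an empty
bucket instead of crashing. Whether returning NULL is the right fix for
the real allocator depends on invariants this sketch does not model.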
You should send this to the erlang-bugs mailing list. It needs
either the attention of someone who is intimately familiar with
the logic of these allocators (I'm not), or for you to make a
self-contained test case available (which you might not be able
to do if it's proprietary).
/Mikael