[erlang-bugs] SIGSEGV in SMP R11B-5 in bf_unlink_free_block() at beam/erl_bestfit_alloc.c:755

Rickard Green rickard.s.green@REDACTED
Tue Sep 25 10:02:53 CEST 2007


We've had a brief look at it without finding anything yet, but we'll dig 
deeper. As you suggested, it would be interesting to see whether you are 
able to trigger it without SMP support.

BR,
Rickard Green, Erlang/OTP, Ericsson AB.

Scott Lystig Fritchie wrote:
> Greetings.  It looks like we've hit another memory-related ETS bug,
> on the same machine and using the same workload as I'd reported on 
> 23 September 2007:
> 
>     http://www.erlang.org/pipermail/erlang-bugs/2007-September/000444.html
> 
> The only extra platform info that I should've added to that report is
> the data from /proc/cpuinfo.  I'll include that at the end of this
> message.  (It's a 2x Opteron 2218, 4 cores total.)
> 
> The "left" pointer in the RBTree_t struct (see the "p *(RBTree_t *)block"
> output below) looks suspiciously un-pointer-like.
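
As a rough illustration of what "un-pointer-like" means here, the sketch below applies two plausibility checks to the parent/left/right values copied from the `p *(RBTree_t *)block` output in the gdb session. It is not part of the original report. The helper names (`pointer_problems`, `check_fields`) and both thresholds are assumptions made for the example: a 32-bit build with word-aligned blocks, and data addresses at or above 0x08000000, which matches the allocator and block addresses visible in this dump but is not a general rule.

```python
# Field values copied from the gdb output of `p *(RBTree_t *)block`.
FIELDS = {"parent": 0xB2CB3, "left": 0x88, "right": 0x680489E9}

# Assumption: 32-bit build, so node pointers should be 4-byte aligned.
WORD_SIZE = 4
# Assumption: data addresses in this dump (allctr, blocks) are all at or
# above this value; shared-library code is mapped lower, tree nodes are not.
MIN_DATA_ADDRESS = 0x08000000


def pointer_problems(value):
    """Return the reasons why `value` is implausible as a tree-node pointer."""
    if value == 0:
        return []  # NULL is a legitimate "no parent/child" value
    problems = []
    if value % WORD_SIZE != 0:
        problems.append("not word-aligned")
    if value < MIN_DATA_ADDRESS:
        problems.append("below expected data range")
    return problems


def check_fields(fields):
    """Map each field name to its list of plausibility problems."""
    return {name: pointer_problems(value) for name, value in fields.items()}


if __name__ == "__main__":
    for name, problems in check_fields(FIELDS).items():
        verdict = ", ".join(problems) if problems else "plausible"
        print(f"{name:6s} = {FIELDS[name]:#010x}: {verdict}")
```

Under these assumptions all three fields are flagged: parent and right are odd values, and left (0x88) is word-aligned but far too small to be a block address. The block address itself (0x9799c4e4) passes both checks, which is consistent with the node contents, rather than the pointer to the node, having been overwritten.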
> 
> I'm going to guess that our next step will involve disabling the SMP
> scheduler and seeing whether we can trigger one of these bugs (or
> another) during a 24-hour stress test.  If the bug appears without SMP
> support, then that tells us something quite interesting.
> 
> For what it's worth ... this bug, together with my 23 September
> report, is less than 2 weeks away from causing us some big headaches
> with customer acceptance testing.
> 
> Thanks again for looking into this matter.  If there's other data I
> can provide, please let me know.
> 
> -Scott
> 
> # gdb /.../R11B-5/lib/erlang/erts-5.5.5/bin/beam.smp /var/cores/beam.smp.core.6668
> [...]
> Core was generated by `/.../R11B-5/lib/erlang/erts-5.5.5/bin/beam.smp -A 64 -K true -'.
> Program terminated with signal 11, Segmentation fault.
> [...]
> #0  0x08078aec in bf_unlink_free_block (allctr=0x818a580, block=0x9799c4e4)
>     at beam/erl_bestfit_alloc.c:755
> 755     beam/erl_bestfit_alloc.c: No such file or directory.
>         in beam/erl_bestfit_alloc.c
> (gdb) where
> #0  0x08078aec in bf_unlink_free_block (allctr=0x818a580, block=0x9799c4e4)
>     at beam/erl_bestfit_alloc.c:755
> #1  0x080734cc in mbc_free (allctr=0x818a580, p=Variable "p" is not available.
> ) at beam/erl_alloc_util.c:747
> #2  0x08076680 in erts_alcu_free_ts (type=106, extra=0x818a580, p=0x9799c260)
>     at beam/erl_alloc_util.c:2221
> #3  0x080d781b in db_free_table_continue_hash (tbl=0xaa42958c, first=0)
>     at beam/erl_alloc.h:200
> #4  0x080c6202 in free_table_cont (p=0xb7e2c344, tb=0xaa42958c, first=0)
>     at beam/erl_db.c:2486
> #5  0x080c649b in ets_db_delete_1 (A__p=0x80c6424, A_1=2856490380)
>     at beam/erl_db.c:1190
> #6  0x08102fc6 in process_main () at beam/beam_emu.c:3409
> #7  0x080b00cf in sched_thread_func (vesdp=0xb6e2c968)
>     at beam/erl_process.c:947
> #8  0x08147f1a in thr_wrapper (vtwd=0xbfffef00) at common/ethread.c:503
> #9  0x00444371 in start_thread () from /lib/tls/libpthread.so.0
> #10 0x001f0ffe in clone () from /lib/tls/libc.so.6
> 
> (gdb) p *allctr
> $1 = {name_prefix = 0x814c232 "ets_", alloc_no = 8, name = {alloc = 0, 
>     realloc = 0, free = 0}, vsn_str = 0x814d0f9 "0.9", sbc_threshold = 524288, 
>   sbc_move_threshold = 80, main_carrier_size = 131072, max_mseg_sbcs = 256, 
>   max_mseg_mbcs = 10, largest_mbc_size = 5242880, smallest_mbc_size = 1048576, 
>   mbc_growth_stages = 10, mseg_opt = {cache = 1, preserv = 1, 
>     abs_shrink_th = 4145152, rel_shrink_th = 20}, mbc_header_size = 20, 
>   min_mbc_size = 16384, min_mbc_first_free_size = 4096, min_block_size = 32, 
>   mbc_list = {first = 0xb7ca6008, last = 0x8a28a008}, sbc_list = {first = 0x0, 
>     last = 0x0}, main_carrier = 0xb7ca6008, 
>   get_free_block = 0x8078b60 <bf_get_free_block>, 
>   link_free_block = 0x8078a20 <bf_link_free_block>, 
>   unlink_free_block = 0x8078ad0 <bf_unlink_free_block>, 
>   info_options = 0x8078cc4 <info_options>, 
>   get_next_mbc_size = 0x80731c0 <get_next_mbc_size>, creating_mbc = 0, 
>   destroying_mbc = 0, init_atoms = 0x8078c2c <init_atoms>, mutex = {mtx = {
>       pt_mtx = {__data = {__lock = 1, __count = 0, __owner = 6745, __kind = 0, 
>           __nusers = 1, __spins = 0}, 
>         __size = "\001\000\000\000\000\000\000\000Y\032\000\000\000\000\000\000\
> 001\000\000\000\000\000\000", __align = 1}, is_rec_mtx = 0, prev = 0x81c17a0, 
>       next = 0x8189a8c}}, thread_safe = 1, ts_list = {prev = 0x0, next = 0x0}, 
>   atoms_initialized = 0, stopped = 0, calls = {this_alloc = {giga_no = 1, 
>       no = 641792639}, this_free = {giga_no = 1, no = 640206088}, 
>     this_realloc = {giga_no = 0, no = 15206700}, mseg_alloc = {giga_no = 0, 
>       no = 21}, mseg_dealloc = {giga_no = 0, no = 11}, mseg_realloc = {
>       giga_no = 0, no = 0}, sys_alloc = {giga_no = 0, no = 345}, sys_free = {
>       giga_no = 0, no = 189}, sys_realloc = {giga_no = 0, no = 0}}, sbcs = {
>     curr_mseg = {no = 0, size = 0}, curr_sys_alloc = {no = 0, size = 0}, 
>     max = {no = 0, size = 0}, max_ever = {no = 0, size = 0}, blocks = {curr = {
>         no = 0, size = 0}, max = {no = 0, size = 0}, max_ever = {no = 0, 
>         size = 0}}}, mbcs = {curr_mseg = {no = 10, size = 29376512}, 
>     curr_sys_alloc = {no = 156, size = 162660384}, max = {no = 166, 
>       size = 192036896}, max_ever = {no = 0, size = 0}, blocks = {curr = {
>         no = 1586551, size = 144248760}, max = {no = 1892770, 
>         size = 167527120}, max_ever = {no = 0, size = 0}}}}
> (gdb) p *block
> $2 = 332427
> (gdb) p x
> No symbol "x" in current context.
> (gdb) p *(RBTree_t *)block
> $3 = {hdr = 332427, flags = 2543437042, parent = 0xb2cb3, left = 0x88, 
>   right = 0x680489e9}
> (gdb) up
> #1  0x080734cc in mbc_free (allctr=0x818a580, p=Variable "p" is not available.
> ) at beam/erl_alloc_util.c:747
> 747     beam/erl_alloc_util.c: No such file or directory.
>         in beam/erl_alloc_util.c
> (gdb) p allctr->unlink_free_block
> $4 = (void (*)(Allctr_t *, Block_t *)) 0x8078ad0 <bf_unlink_free_block>
> (gdb) p nxt_blk
> $5 = (Block_t *) 0x9799c4e4
> (gdb) p *nxt_blk
> $6 = 332427
> (gdb) p is_first_blk
> $7 = 135833100
> (gdb) p is_last_blk
> $8 = 0
> (gdb) up
> #2  0x08076680 in erts_alcu_free_ts (type=106, extra=0x818a580, p=0x9799c260)
>     at beam/erl_alloc_util.c:2221
> 2221    in beam/erl_alloc_util.c
> (gdb) up
> #3  0x080d781b in db_free_table_continue_hash (tbl=0xaa42958c, first=0)
>     at beam/erl_alloc.h:200
> 200     beam/erl_alloc.h: No such file or directory.
>         in beam/erl_alloc.h
> (gdb) up
> #4  0x080c6202 in free_table_cont (p=0xb7e2c344, tb=0xaa42958c, first=0)
>     at beam/erl_db.c:2486
> 2486    beam/erl_db.c: No such file or directory.
>         in beam/erl_db.c
> (gdb) p *p
> $9 = {htop = 0xa4a31af0, stop = 0xa4a31ea0, heap = 0xa4a31900, 
>   hend = 0xa4a31ee4, heap_sz = 377, min_heap_size = 233, fp_exception = 0, 
>   hipe = {nsp = 0x0, nstack = 0x0, nstend = 0x0, ncallee = 0, closure = 0, 
>     nstgraylim = 0x0, nstblacklim = 0x0, ngra = 0, ncsp = 0x0, narity = 0}, 
>   arity = 0, arg_reg = 0xb7e2c394, max_arg_reg = 6, def_arg_reg = {2856490380, 
>     117899, 3084146865, 3084525588, 0, 2000}, cp = 0xb7c63d1c, i = 0xb7da203c, 
>   catches = 2, fcalls = 1999, status = 3, rstatus = 0, rcount = 0, id = 1059, 
>   prio = 2, skipped = 0, reds = 24967791, error_handler = 7819, 
>   tracer_proc = 4294967291, trace_flags = 0, group_leader = 931, flags = 33, 
>   fvalue = 4294967291, freason = 256, ftrace = 4294967291, dist_entry = 0x0, 
>   next = 0x0, reg = 0xb7c72f7c, nlinks = 0xb7ee994c, monitors = 0x0, 
>   nodes_monitors = 0x0, msg = {first = 0x819b8f0, last = 0x819b8f0, 
>     save = 0xb7e2c410, len = 1}, bif_timers = 0x0, dictionary = 0xb7f6cba8, 
>   debug_dictionary = 0x0, ct = 0x0, seq_trace_clock = 0, 
>   seq_trace_lastcnt = 0, seq_trace_token = 4294967291, initial = {99595, 
>     117899, 5}, current = 0xb7da2030, parent = 995, started = 1190516698, 
>   high_water = 0xa4a31940, old_hend = 0xb7d281bc, old_htop = 0xb7d27d24, 
>   old_heap = 0xb7d27bd8, gen_gcs = 715, max_gen_gcs = 65535, off_heap = {
>     mso = 0x0, funs = 0x0, externals = 0x0, overhead = 0}, mbuf = 0x0, 
>   mbuf_sz = 0, arith_heap = 0x0, arith_avail = 0, ptimer = 0x0, 
>   scheduler_data = 0xb6e2c968, is_exiting = 0, scheduler_flags = 1, 
>   status_flags = 12, lock_flags = 1, msg_inq = {first = 0x0, 
>     last = 0xb7e2c4a0, len = 0}, suspendee = 4294967291, 
>   pending_suspenders = 0x0, pending_exit = {reason = 0, bp = 0x0}, hipe_smp = {
>     have_receive_locks = 0}}
> (gdb) p *tb
> $10 = {common = {ref = {counter = 2}, rwlock = {rwmtx = {pt_rwlock = {
>           __data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, 
>             __writer_wakeup = 983, __nr_readers_queued = 0, 
>             __nr_writers_queued = 0, __flags = 0, __writer = 6745}, 
>           __size = '\0' <repeats 12 times>, "\327\003", '\0' <repeats 14 times>, "Y\
> 032\000", __align = 0}}}, type = 32, owner = 1059, the_name = 382219, 
>     id = 80127, meth = 0x81765c0, nitems = 385564, memory_size = {
>       counter = 29558512}, megasec = 0, sec = 0, microsec = 0, 
>     fixations = 0x0, status = 97, slot = 5007, keypos = 2, kept_items = 0}, 
>   hash = {common = {ref = {counter = 2}, rwlock = {rwmtx = {pt_rwlock = {
>             __data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, 
>               __writer_wakeup = 983, __nr_readers_queued = 0, 
>               __nr_writers_queued = 0, __flags = 0, __writer = 6745}, 
>             __size = '\0' <repeats 12 times>, "\327\003", '\0' <repeats 14 times>, "
> Y\032\000", __align = 0}}}, type = 32, owner = 1059, the_name = 382219, 
>       id = 80127, meth = 0x81765c0, nitems = 385564, memory_size = {
>         counter = 29558512}, megasec = 0, sec = 0, microsec = 0, 
>       fixations = 0x0, status = 97, slot = 5007, keypos = 2, kept_items = 0}, 
>     fixdel = 0x0, seg = 0x8de67038, szm = 65535, nactive = 85594, 
>     nslots = 85760, p = 126, nsegs = 384}, tree = {common = {ref = {
>         counter = 2}, rwlock = {rwmtx = {pt_rwlock = {__data = {__lock = 0, 
>               __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 983, 
>               __nr_readers_queued = 0, __nr_writers_queued = 0, __flags = 0, 
>               __writer = 6745}, 
>             __size = '\0' <repeats 12 times>, "\327\003", '\0' <repeats 14 times>, "
> Y\032\000", __align = 0}}}, type = 32, owner = 1059, the_name = 382219, 
>       id = 80127, meth = 0x81765c0, nitems = 385564, memory_size = {
>         counter = 29558512}, megasec = 0, sec = 0, microsec = 0, 
>       fixations = 0x0, status = 97, slot = 5007, keypos = 2, kept_items = 0}, 
>     root = 0x0, stack = 0x8de67038, stack_pos = 65535, slot_pos = 85594, 
>     deletion = 85760}}
> (gdb) p first
> $11 = 0
> (gdb) up
> #5  0x080c649b in ets_db_delete_1 (A__p=0x80c6424, A_1=2856490380)
>     at beam/erl_db.c:1190
> 1190    in beam/erl_db.c
> (gdb) q
> 
> ------------------------------------------------------------------
> 
> [root@REDACTED ~]$ cat /proc/cpuinfo 
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 15
> model           : 65
> model name      : Dual-Core AMD Opteron(tm) Processor 2218
> stepping        : 2
> cpu MHz         : 2601.145
> cache size      : 1024 KB
> physical id     : 0
> siblings        : 2
> core id         : 0
> cpu cores       : 2
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 1
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni
> bogomips        : 5203.55
> 
> processor       : 1
> vendor_id       : AuthenticAMD
> cpu family      : 15
> model           : 65
> model name      : Dual-Core AMD Opteron(tm) Processor 2218
> stepping        : 2
> cpu MHz         : 2601.145
> cache size      : 1024 KB
> physical id     : 0
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 1
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni
> bogomips        : 5200.11
> 
> processor       : 2
> vendor_id       : AuthenticAMD
> cpu family      : 15
> model           : 65
> model name      : Dual-Core AMD Opteron(tm) Processor 2218
> stepping        : 2
> cpu MHz         : 2601.145
> cache size      : 1024 KB
> physical id     : 1
> siblings        : 2
> core id         : 0
> cpu cores       : 2
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 1
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni
> bogomips        : 5200.46
> 
> processor       : 3
> vendor_id       : AuthenticAMD
> cpu family      : 15
> model           : 65
> model name      : Dual-Core AMD Opteron(tm) Processor 2218
> stepping        : 2
> cpu MHz         : 2601.145
> cache size      : 1024 KB
> physical id     : 1
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 1
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni
> bogomips        : 5200.21