[erlang-bugs] SIGSEGV in SMP R11B-5 in bf_unlink_free_block() at beam/erl_bestfit_alloc.c:755

Scott Lystig Fritchie fritchie@REDACTED
Tue Sep 25 03:59:41 CEST 2007


Greetings.  It looks like we've hit another memory-related ETS bug,
on the same machine and using the same workload as I reported on
23 September 2007:

    http://www.erlang.org/pipermail/erlang-bugs/2007-September/000444.html

The only extra platform info that I should've added to that report is
the data from /proc/cpuinfo.  I'll include that at the end of this
message.  (It's a 2x Opteron 2218, 4 cores total.)

The "left" pointer in the RBTree_t struct (see the output of
"p *(RBTree_t *)block" in the gdb session below) looks suspiciously
un-pointer-like.

I'm going to guess that our next step will involve disabling the SMP
scheduler and seeing whether we can trigger one of these bugs (or
another) during a 24-hour stress test.  If the bug appears without SMP
support, then that tells us something quite interesting.

For what it's worth ... this bug, together with my 23 September
report, is less than 2 weeks away from causing us some big headaches
with customer acceptance testing.

Thanks again for looking into this matter.  If there's other data I
can provide, please let me know.

-Scott

# gdb /.../R11B-5/lib/erlang/erts-5.5.5/bin/beam.smp /var/cores/beam.smp.core.6668
[...]
Core was generated by `/.../R11B-5/lib/erlang/erts-5.5.5/bin/beam.smp -A 64 -K true -'.
Program terminated with signal 11, Segmentation fault.
[...]
#0  0x08078aec in bf_unlink_free_block (allctr=0x818a580, block=0x9799c4e4)
    at beam/erl_bestfit_alloc.c:755
755     beam/erl_bestfit_alloc.c: No such file or directory.
        in beam/erl_bestfit_alloc.c
(gdb) where
#0  0x08078aec in bf_unlink_free_block (allctr=0x818a580, block=0x9799c4e4)
    at beam/erl_bestfit_alloc.c:755
#1  0x080734cc in mbc_free (allctr=0x818a580, p=Variable "p" is not available.
) at beam/erl_alloc_util.c:747
#2  0x08076680 in erts_alcu_free_ts (type=106, extra=0x818a580, p=0x9799c260)
    at beam/erl_alloc_util.c:2221
#3  0x080d781b in db_free_table_continue_hash (tbl=0xaa42958c, first=0)
    at beam/erl_alloc.h:200
#4  0x080c6202 in free_table_cont (p=0xb7e2c344, tb=0xaa42958c, first=0)
    at beam/erl_db.c:2486
#5  0x080c649b in ets_db_delete_1 (A__p=0x80c6424, A_1=2856490380)
    at beam/erl_db.c:1190
#6  0x08102fc6 in process_main () at beam/beam_emu.c:3409
#7  0x080b00cf in sched_thread_func (vesdp=0xb6e2c968)
    at beam/erl_process.c:947
#8  0x08147f1a in thr_wrapper (vtwd=0xbfffef00) at common/ethread.c:503
#9  0x00444371 in start_thread () from /lib/tls/libpthread.so.0
#10 0x001f0ffe in clone () from /lib/tls/libc.so.6

(gdb) p *allctr
$1 = {name_prefix = 0x814c232 "ets_", alloc_no = 8, name = {alloc = 0, 
    realloc = 0, free = 0}, vsn_str = 0x814d0f9 "0.9", sbc_threshold = 524288, 
  sbc_move_threshold = 80, main_carrier_size = 131072, max_mseg_sbcs = 256, 
  max_mseg_mbcs = 10, largest_mbc_size = 5242880, smallest_mbc_size = 1048576, 
  mbc_growth_stages = 10, mseg_opt = {cache = 1, preserv = 1, 
    abs_shrink_th = 4145152, rel_shrink_th = 20}, mbc_header_size = 20, 
  min_mbc_size = 16384, min_mbc_first_free_size = 4096, min_block_size = 32, 
  mbc_list = {first = 0xb7ca6008, last = 0x8a28a008}, sbc_list = {first = 0x0, 
    last = 0x0}, main_carrier = 0xb7ca6008, 
  get_free_block = 0x8078b60 <bf_get_free_block>, 
  link_free_block = 0x8078a20 <bf_link_free_block>, 
  unlink_free_block = 0x8078ad0 <bf_unlink_free_block>, 
  info_options = 0x8078cc4 <info_options>, 
  get_next_mbc_size = 0x80731c0 <get_next_mbc_size>, creating_mbc = 0, 
  destroying_mbc = 0, init_atoms = 0x8078c2c <init_atoms>, mutex = {mtx = {
      pt_mtx = {__data = {__lock = 1, __count = 0, __owner = 6745, __kind = 0, 
          __nusers = 1, __spins = 0}, 
        __size = "\001\000\000\000\000\000\000\000Y\032\000\000\000\000\000\000\
001\000\000\000\000\000\000", __align = 1}, is_rec_mtx = 0, prev = 0x81c17a0, 
      next = 0x8189a8c}}, thread_safe = 1, ts_list = {prev = 0x0, next = 0x0}, 
  atoms_initialized = 0, stopped = 0, calls = {this_alloc = {giga_no = 1, 
      no = 641792639}, this_free = {giga_no = 1, no = 640206088}, 
    this_realloc = {giga_no = 0, no = 15206700}, mseg_alloc = {giga_no = 0, 
      no = 21}, mseg_dealloc = {giga_no = 0, no = 11}, mseg_realloc = {
      giga_no = 0, no = 0}, sys_alloc = {giga_no = 0, no = 345}, sys_free = {
      giga_no = 0, no = 189}, sys_realloc = {giga_no = 0, no = 0}}, sbcs = {
    curr_mseg = {no = 0, size = 0}, curr_sys_alloc = {no = 0, size = 0}, 
    max = {no = 0, size = 0}, max_ever = {no = 0, size = 0}, blocks = {curr = {
        no = 0, size = 0}, max = {no = 0, size = 0}, max_ever = {no = 0, 
        size = 0}}}, mbcs = {curr_mseg = {no = 10, size = 29376512}, 
    curr_sys_alloc = {no = 156, size = 162660384}, max = {no = 166, 
      size = 192036896}, max_ever = {no = 0, size = 0}, blocks = {curr = {
        no = 1586551, size = 144248760}, max = {no = 1892770, 
        size = 167527120}, max_ever = {no = 0, size = 0}}}}
(gdb) p *block
$2 = 332427
(gdb) p x
No symbol "x" in current context.
(gdb) p *(RBTree_t *)block
$3 = {hdr = 332427, flags = 2543437042, parent = 0xb2cb3, left = 0x88, 
  right = 0x680489e9}
(gdb) up
#1  0x080734cc in mbc_free (allctr=0x818a580, p=Variable "p" is not available.
) at beam/erl_alloc_util.c:747
747     beam/erl_alloc_util.c: No such file or directory.
        in beam/erl_alloc_util.c
(gdb) p allctr->unlink_free_block
$4 = (void (*)(Allctr_t *, Block_t *)) 0x8078ad0 <bf_unlink_free_block>
(gdb) p nxt_blk
$5 = (Block_t *) 0x9799c4e4
(gdb) p *nxt_blk
$6 = 332427
(gdb) p is_first_blk
$7 = 135833100
(gdb) p is_last_blk
$8 = 0
(gdb) up
#2  0x08076680 in erts_alcu_free_ts (type=106, extra=0x818a580, p=0x9799c260)
    at beam/erl_alloc_util.c:2221
2221    in beam/erl_alloc_util.c
(gdb) up
#3  0x080d781b in db_free_table_continue_hash (tbl=0xaa42958c, first=0)
    at beam/erl_alloc.h:200
200     beam/erl_alloc.h: No such file or directory.
        in beam/erl_alloc.h
(gdb) up
#4  0x080c6202 in free_table_cont (p=0xb7e2c344, tb=0xaa42958c, first=0)
    at beam/erl_db.c:2486
2486    beam/erl_db.c: No such file or directory.
        in beam/erl_db.c
(gdb) p *p
$9 = {htop = 0xa4a31af0, stop = 0xa4a31ea0, heap = 0xa4a31900, 
  hend = 0xa4a31ee4, heap_sz = 377, min_heap_size = 233, fp_exception = 0, 
  hipe = {nsp = 0x0, nstack = 0x0, nstend = 0x0, ncallee = 0, closure = 0, 
    nstgraylim = 0x0, nstblacklim = 0x0, ngra = 0, ncsp = 0x0, narity = 0}, 
  arity = 0, arg_reg = 0xb7e2c394, max_arg_reg = 6, def_arg_reg = {2856490380, 
    117899, 3084146865, 3084525588, 0, 2000}, cp = 0xb7c63d1c, i = 0xb7da203c, 
  catches = 2, fcalls = 1999, status = 3, rstatus = 0, rcount = 0, id = 1059, 
  prio = 2, skipped = 0, reds = 24967791, error_handler = 7819, 
  tracer_proc = 4294967291, trace_flags = 0, group_leader = 931, flags = 33, 
  fvalue = 4294967291, freason = 256, ftrace = 4294967291, dist_entry = 0x0, 
  next = 0x0, reg = 0xb7c72f7c, nlinks = 0xb7ee994c, monitors = 0x0, 
  nodes_monitors = 0x0, msg = {first = 0x819b8f0, last = 0x819b8f0, 
    save = 0xb7e2c410, len = 1}, bif_timers = 0x0, dictionary = 0xb7f6cba8, 
  debug_dictionary = 0x0, ct = 0x0, seq_trace_clock = 0, 
  seq_trace_lastcnt = 0, seq_trace_token = 4294967291, initial = {99595, 
    117899, 5}, current = 0xb7da2030, parent = 995, started = 1190516698, 
  high_water = 0xa4a31940, old_hend = 0xb7d281bc, old_htop = 0xb7d27d24, 
  old_heap = 0xb7d27bd8, gen_gcs = 715, max_gen_gcs = 65535, off_heap = {
    mso = 0x0, funs = 0x0, externals = 0x0, overhead = 0}, mbuf = 0x0, 
  mbuf_sz = 0, arith_heap = 0x0, arith_avail = 0, ptimer = 0x0, 
  scheduler_data = 0xb6e2c968, is_exiting = 0, scheduler_flags = 1, 
  status_flags = 12, lock_flags = 1, msg_inq = {first = 0x0, 
    last = 0xb7e2c4a0, len = 0}, suspendee = 4294967291, 
  pending_suspenders = 0x0, pending_exit = {reason = 0, bp = 0x0}, hipe_smp = {
    have_receive_locks = 0}}
(gdb) p *tb
$10 = {common = {ref = {counter = 2}, rwlock = {rwmtx = {pt_rwlock = {
          __data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, 
            __writer_wakeup = 983, __nr_readers_queued = 0, 
            __nr_writers_queued = 0, __flags = 0, __writer = 6745}, 
          __size = '\0' <repeats 12 times>, "%G�%@003", '\0' <repeats 14 times>, "Y\
032\000", __align = 0}}}, type = 32, owner = 1059, the_name = 382219, 
    id = 80127, meth = 0x81765c0, nitems = 385564, memory_size = {
      counter = 29558512}, megasec = 0, sec = 0, microsec = 0, 
    fixations = 0x0, status = 97, slot = 5007, keypos = 2, kept_items = 0}, 
  hash = {common = {ref = {counter = 2}, rwlock = {rwmtx = {pt_rwlock = {
            __data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, 
              __writer_wakeup = 983, __nr_readers_queued = 0, 
              __nr_writers_queued = 0, __flags = 0, __writer = 6745}, 
            __size = '\0' <repeats 12 times>, "%G�%@003", '\0' <repeats 14 times>, "
Y\032\000", __align = 0}}}, type = 32, owner = 1059, the_name = 382219, 
      id = 80127, meth = 0x81765c0, nitems = 385564, memory_size = {
        counter = 29558512}, megasec = 0, sec = 0, microsec = 0, 
      fixations = 0x0, status = 97, slot = 5007, keypos = 2, kept_items = 0}, 
    fixdel = 0x0, seg = 0x8de67038, szm = 65535, nactive = 85594, 
    nslots = 85760, p = 126, nsegs = 384}, tree = {common = {ref = {
        counter = 2}, rwlock = {rwmtx = {pt_rwlock = {__data = {__lock = 0, 
              __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 983, 
              __nr_readers_queued = 0, __nr_writers_queued = 0, __flags = 0, 
              __writer = 6745}, 
            __size = '\0' <repeats 12 times>, "%G�%@003", '\0' <repeats 14 times>, "
Y\032\000", __align = 0}}}, type = 32, owner = 1059, the_name = 382219, 
      id = 80127, meth = 0x81765c0, nitems = 385564, memory_size = {
        counter = 29558512}, megasec = 0, sec = 0, microsec = 0, 
      fixations = 0x0, status = 97, slot = 5007, keypos = 2, kept_items = 0}, 
    root = 0x0, stack = 0x8de67038, stack_pos = 65535, slot_pos = 85594, 
    deletion = 85760}}
(gdb) p first
$11 = 0
(gdb) up
#5  0x080c649b in ets_db_delete_1 (A__p=0x80c6424, A_1=2856490380)
    at beam/erl_db.c:1190
1190    in beam/erl_db.c
(gdb) q

------------------------------------------------------------------

[root@REDACTED ~]$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 2218
stepping        : 2
cpu MHz         : 2601.145
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni
bogomips        : 5203.55

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 2218
stepping        : 2
cpu MHz         : 2601.145
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni
bogomips        : 5200.11

processor       : 2
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 2218
stepping        : 2
cpu MHz         : 2601.145
cache size      : 1024 KB
physical id     : 1
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni
bogomips        : 5200.46

processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 2218
stepping        : 2
cpu MHz         : 2601.145
cache size      : 1024 KB
physical id     : 1
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni
bogomips        : 5200.21






