[erlang-bugs] Strange bus error 7 with HiPE R12B-5 + patch

Paul Fisher pfisher@REDACTED
Tue Feb 24 02:56:22 CET 2009


We have been seeing occasional core dumps in our environment, most of 
which end with a stack trace like the following:

(gdb) where
#0  0x0000000045050b39 in ?? ()
#1  0x00002aaaabe25922 in ?? ()
#2  0x000000000000078f in ?? ()
#3  0x0000000000000000 in ?? ()

This happens on thread 1... the one that ends up running 
erts_sys_main_thread().  Pretty weird.

Today I got a core dump on the same thread, while it was running 
gensweep_nstack().  What follows is my brief trawl through the dump, 
following things until it was clear how we ended up at the problem.

I would love it if someone could help take the diagnosis further, 
because I'm not sure where to look next.  Without HiPE compilation 
(which we use for most of our modules) these problems do not occur, so 
this does seem to point to a HiPE-related issue.

The environment is a 4-core Core 2 Q6600 with 4 GB of ECC memory, with 
the emulator running SMP in 64-bit mode.

cluster-14:/var/alertlogic# uname -a
Linux cluster-14 2.6.24-etchnhalf.1-amd64 #1 SMP Fri Dec 26 03:26:12 UTC 2008 x86_64 GNU/Linux

Anyway, here is the gdb session:

Core was generated by `/usr/lib/erlang/erts-5.6.5/bin/beam.smp -Ktrue -W w -A 32 -a 128 -d -- -root /u'.
Program terminated with signal 7, Bus error.
#0  gensweep_nstack (p=0x2aaaade37808, ptr_old_htop=0x44048b28,
     ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
70	    if (likely(sdesc->bucket.hvalue == ra))
(gdb) where
#0  gensweep_nstack (p=0x2aaaade37808, ptr_old_htop=0x44048b28,
     ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
#1  0x00000000004bfc35 in minor_collection (p=0x2aaaade37808, need=2,
     objv=0x0, nobj=0, recl=0x44048e68) at beam/erl_gc.c:893
#2  0x00000000004c0761 in erts_garbage_collect (p=0x2aaaade37808, need=2,
     objv=0x0, nobj=0) at beam/erl_gc.c:374
#3  0x000000000050ae1f in hipe_gc (p=0x1b9b860, need=46912528116752)
     at hipe/hipe_native_bif.c:69
#4  0x000000000050be74 in nbif_gc_1 ()
     at x86_64-unknown-linux-gnu/opt/smp/hipe_amd64_bifs.S:540
#5  0x00002aaaade37808 in ?? ()
#6  0x00002aaaade37a80 in ?? ()
#7  0x0000000000000007 in ?? ()
#8  0x00002aaaaaaed9c0 in ?? ()
#9  0x00002aaaabe8f0c8 in ?? ()
#10 0x00002aaaabe937d8 in ?? ()
#11 0x00002aaaabe937d8 in ?? ()
#12 0x0000000000509ce4 in hipe_mode_switch (p=0x2aaaade37808, cmd=2895309840,
     reg=0x2aaaaaaed9c0) at hipe/hipe_x86_glue.h:196
#13 0x00000000004dd97b in process_main () at beam/beam_emu.c:4681
#14 0x000000000048100f in sched_thread_func (vesdp=<value optimized out>)
     at beam/erl_process.c:752
#15 0x0000000000549f24 in thr_wrapper (vtwd=<value optimized out>)
     at common/ethread.c:474
#16 0x00002afcecf9ef1a in start_thread () from /lib/libpthread.so.0
#17 0x00002afced2815d2 in sysctl () from /lib/libc.so.6
#18 0x0000000000000000 in ?? ()
(gdb) p ra
$1 = 1159839139
(gdb) p *sdesc
Cannot access memory at address 0x7d337b25097d327b
(gdb) list
65	
66	static __inline__ const struct sdesc *hipe_find_sdesc(unsigned long ra)
67	{
68	    unsigned int i = (ra >> HIPE_RA_LSR_COUNT) & hipe_sdesc_table.mask;
69	    const struct sdesc *sdesc = hipe_sdesc_table.bucket[i];
70	    if (likely(sdesc->bucket.hvalue == ra))
71		return sdesc;
72	    do {
73		sdesc = sdesc->bucket.next;
74	    } while (sdesc->bucket.hvalue != ra);
(gdb) p i
$2 = 1
(gdb) p hipe_sdesc_table
$3 = {
   log2size = 16,
   mask = 65535,
   used = 26215,
   bucket = 0x2aaaadc35010
}
(gdb) p hipe_sdesc_table.bucket[1]
$4 = (struct sdesc *) 0x8f2c50
(gdb) p *hipe_sdesc_table.bucket[1]
$5 = {
   bucket = {
     hvalue = 1159004161,
     next = 0x85ab30
   },
   summary = 2048,
   livebits = {14}
}
(gdb) p *hipe_sdesc_table.bucket[1]->bucket.next
$6 = {
   bucket = {
     hvalue = 1158348801,
     next = 0x819410
   },
   summary = 1536,
   livebits = {1}
}
(gdb) p *hipe_sdesc_table.bucket[1]->bucket.next->bucket.next
$7 = {
   bucket = {
     hvalue = 1158283265,
     next = 0x0
   },
   summary = 2048,
   livebits = {0}
}
(gdb) p *hipe_sdesc_table.bucket[1]->bucket.next->bucket.next->bucket.next
Cannot access memory at address 0x0
(gdb) i threads
   40 process 24151  0x00002afced27aa96 in getdomainname () from /lib/libc.so.6
   39 process 24153  0x00002afcecfa41bf in __read_nocancel ()
    from /lib/libpthread.so.0
   38 process 24156  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   37 process 24158  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   36 process 24159  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   35 process 24160  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   34 process 24161  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   33 process 24162  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   32 process 24163  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   31 process 24164  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   30 process 24165  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   29 process 24166  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   28 process 24167  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   27 process 24168  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   26 process 24169  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   25 process 24170  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   24 process 24171  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   23 process 24172  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   22 process 24173  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   21 process 24174  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   20 process 24175  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   19 process 24176  0x00002afcecfa412f in __write_nocancel ()
    from /lib/libpthread.so.0
   18 process 24177  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   17 process 24178  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   16 process 24179  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   15 process 24180  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   14 process 24181  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   13 process 24182  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   12 process 24183  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   11 process 24184  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   10 process 24185  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   9 process 24186  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   8 process 24187  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   7 process 24188  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   6 process 24189  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   5 process 24190  0x00002afcecfa500f in waitpid () from /lib/libpthread.so.0
   4 process 24197  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
    from /lib/libpthread.so.0
   3 process 24199  0x00002afced2819ac in capset () from /lib/libc.so.6
   2 process 24200  0x00002afced22b90d in memmove () from /lib/libc.so.6
* 1 process 24198  gensweep_nstack (p=0x2aaaade37808, ptr_old_htop=0x44048b28,
     ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
(gdb) t 19
[Switching to thread 19 (process 24176)]#0  0x00002afcecfa412f in __write_nocancel () from /lib/libpthread.so.0
(gdb) where
#0  0x00002afcecfa412f in __write_nocancel () from /lib/libpthread.so.0
#1  0x00000000004f7757 in efile_writev (errInfo=0x7d9e24,
     flags=<value optimized out>, fd=34, iov=0x7c2098, iovcnt=1, size=229146)
     at drivers/unix/unix_efile.c:1109
#2  0x000000000051769a in invoke_writev (data=<value optimized out>)
     at drivers/common/efile_drv.c:1175
#3  0x00000000004be255 in async_main (arg=<value optimized out>)
     at beam/erl_async.c:242
#4  0x0000000000549f24 in thr_wrapper (vtwd=<value optimized out>)
     at common/ethread.c:474
#5  0x00002afcecf9ef1a in start_thread () from /lib/libpthread.so.0
#6  0x00002afced2815d2 in sysctl () from /lib/libc.so.6
#7  0x0000000000000000 in ?? ()


--
paul


