[erlang-bugs] Strange bus error 7 with HiPE R12B-5 + patch
Paul Fisher
pfisher@REDACTED
Tue Feb 24 02:56:22 CET 2009
We have been getting occasional strange core dumps in our
environment, most of which end with a stack trace like the following:
(gdb) where
#0 0x0000000045050b39 in ?? ()
#1 0x00002aaaabe25922 in ?? ()
#2 0x000000000000078f in ?? ()
#3 0x0000000000000000 in ?? ()
This happens on thread 1... the one that ends up running
erts_sys_main_thread(). Pretty weird.
Today I got a core dump on the same thread, while it was running
gensweep_nstack(). What follows is my brief trolling through the dump,
following things until it was clear how we ended up at the
problem.
I would love it if someone could help take the diagnosis further,
because I'm not sure where to look next. Without HiPE
compilation (which we use for most of our modules) these problems do not
occur, so it does seem to point to a HiPE-related issue.
The environment is a quad-core Core 2 Q6600 with 4 GB of ECC memory,
running the 64-bit SMP emulator.
cluster-14:/var/alertlogic# uname -a
Linux cluster-14 2.6.24-etchnhalf.1-amd64 #1 SMP Fri Dec 26 03:26:12 UTC 2008 x86_64 GNU/Linux
Anyway, here is the gdb session:
Core was generated by `/usr/lib/erlang/erts-5.6.5/bin/beam.smp -Ktrue -W w -A 32 -a 128 -d -- -root /u'.
Program terminated with signal 7, Bus error.
#0 gensweep_nstack (p=0x2aaaade37808, ptr_old_htop=0x44048b28,
ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
70 if (likely(sdesc->bucket.hvalue == ra))
(gdb) where
#0 gensweep_nstack (p=0x2aaaade37808, ptr_old_htop=0x44048b28,
ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
#1 0x00000000004bfc35 in minor_collection (p=0x2aaaade37808, need=2,
objv=0x0, nobj=0, recl=0x44048e68) at beam/erl_gc.c:893
#2 0x00000000004c0761 in erts_garbage_collect (p=0x2aaaade37808, need=2,
objv=0x0, nobj=0) at beam/erl_gc.c:374
#3 0x000000000050ae1f in hipe_gc (p=0x1b9b860, need=46912528116752)
at hipe/hipe_native_bif.c:69
#4 0x000000000050be74 in nbif_gc_1 ()
at x86_64-unknown-linux-gnu/opt/smp/hipe_amd64_bifs.S:540
#5 0x00002aaaade37808 in ?? ()
#6 0x00002aaaade37a80 in ?? ()
#7 0x0000000000000007 in ?? ()
#8 0x00002aaaaaaed9c0 in ?? ()
#9 0x00002aaaabe8f0c8 in ?? ()
#10 0x00002aaaabe937d8 in ?? ()
#11 0x00002aaaabe937d8 in ?? ()
#12 0x0000000000509ce4 in hipe_mode_switch (p=0x2aaaade37808,
cmd=2895309840,
reg=0x2aaaaaaed9c0) at hipe/hipe_x86_glue.h:196
#13 0x00000000004dd97b in process_main () at beam/beam_emu.c:4681
#14 0x000000000048100f in sched_thread_func (vesdp=<value optimized out>)
at beam/erl_process.c:752
#15 0x0000000000549f24 in thr_wrapper (vtwd=<value optimized out>)
at common/ethread.c:474
#16 0x00002afcecf9ef1a in start_thread () from /lib/libpthread.so.0
#17 0x00002afced2815d2 in sysctl () from /lib/libc.so.6
#18 0x0000000000000000 in ?? ()
(gdb) p ra
$1 = 1159839139
(gdb) p *sdesc
Cannot access memory at address 0x7d337b25097d327b
(gdb) list
65
66 static __inline__ const struct sdesc *hipe_find_sdesc(unsigned long ra)
67 {
68 unsigned int i = (ra >> HIPE_RA_LSR_COUNT) & hipe_sdesc_table.mask;
69 const struct sdesc *sdesc = hipe_sdesc_table.bucket[i];
70 if (likely(sdesc->bucket.hvalue == ra))
71 return sdesc;
72 do {
73 sdesc = sdesc->bucket.next;
74 } while (sdesc->bucket.hvalue != ra);
(gdb) p i
$2 = 1
(gdb) p hipe_sdesc_table
$3 = {
log2size = 16,
mask = 65535,
used = 26215,
bucket = 0x2aaaadc35010
}
(gdb) p hipe_sdesc_table.bucket[1]
$4 = (struct sdesc *) 0x8f2c50
(gdb) p *hipe_sdesc_table.bucket[1]
$5 = {
bucket = {
hvalue = 1159004161,
next = 0x85ab30
},
summary = 2048,
livebits = {14}
}
(gdb) p *hipe_sdesc_table.bucket[1]->bucket.next
$6 = {
bucket = {
hvalue = 1158348801,
next = 0x819410
},
summary = 1536,
livebits = {1}
}
(gdb) p *hipe_sdesc_table.bucket[1]->bucket.next->bucket.next
$7 = {
bucket = {
hvalue = 1158283265,
next = 0x0
},
summary = 2048,
livebits = {0}
}
(gdb) p *hipe_sdesc_table.bucket[1]->bucket.next->bucket.next->bucket.next
Cannot access memory at address 0x0
(gdb) i threads
40 process 24151 0x00002afced27aa96 in getdomainname () from
/lib/libc.so.6
39 process 24153 0x00002afcecfa41bf in __read_nocancel ()
from /lib/libpthread.so.0
38 process 24156 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
37 process 24158 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
36 process 24159 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
35 process 24160 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
34 process 24161 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
33 process 24162 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
32 process 24163 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
31 process 24164 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
30 process 24165 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
29 process 24166 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
28 process 24167 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
27 process 24168 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
26 process 24169 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
25 process 24170 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
24 process 24171 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
23 process 24172 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
22 process 24173 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
21 process 24174 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
20 process 24175 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
19 process 24176 0x00002afcecfa412f in __write_nocancel ()
from /lib/libpthread.so.0
18 process 24177 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
17 process 24178 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
16 process 24179 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
15 process 24180 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
14 process 24181 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
13 process 24182 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
12 process 24183 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
11 process 24184 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
10 process 24185 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
9 process 24186 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
8 process 24187 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
7 process 24188 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
6 process 24189 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
5 process 24190 0x00002afcecfa500f in waitpid () from
/lib/libpthread.so.0
4 process 24197 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
3 process 24199 0x00002afced2819ac in capset () from /lib/libc.so.6
2 process 24200 0x00002afced22b90d in memmove () from /lib/libc.so.6
* 1 process 24198 gensweep_nstack (p=0x2aaaade37808,
ptr_old_htop=0x44048b28,
ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
(gdb) t 19
[Switching to thread 19 (process 24176)]#0 0x00002afcecfa412f in
__write_nocancel () from /lib/libpthread.so.0
(gdb) where
#0 0x00002afcecfa412f in __write_nocancel () from /lib/libpthread.so.0
#1 0x00000000004f7757 in efile_writev (errInfo=0x7d9e24,
flags=<value optimized out>, fd=34, iov=0x7c2098, iovcnt=1,
size=229146)
at drivers/unix/unix_efile.c:1109
#2 0x000000000051769a in invoke_writev (data=<value optimized out>)
at drivers/common/efile_drv.c:1175
#3 0x00000000004be255 in async_main (arg=<value optimized out>)
at beam/erl_async.c:242
#4 0x0000000000549f24 in thr_wrapper (vtwd=<value optimized out>)
at common/ethread.c:474
#5 0x00002afcecf9ef1a in start_thread () from /lib/libpthread.so.0
#6 0x00002afced2815d2 in sysctl () from /lib/libc.so.6
#7 0x0000000000000000 in ?? ()
--
paul