[erlang-questions] Strange bus error 7 with HiPE R12B-5 + patch
Paul Fisher
pfisher@REDACTED
Tue Feb 24 13:37:20 CET 2009
I sent this to erlang-bugs last night, but it has not shown up yet, so I am
trying erlang-questions in the hope that someone can still take a look and
offer some guidance today.
Thanks.
Paul Fisher wrote:
> We have been having strange core dumps happen occasionally in our
> environment, most of which end up with a stack trace like the following:
>
> (gdb) where
> #0 0x0000000045050b39 in ?? ()
> #1 0x00002aaaabe25922 in ?? ()
> #2 0x000000000000078f in ?? ()
> #3 0x0000000000000000 in ?? ()
>
> This happens on thread 1... the one that ends up running
> erts_sys_main_thread(). Pretty weird.
>
> Today, I got a core dump on the same thread, while it was running
> gensweep_nstack(). What follows is my brief pass through the dump,
> following things until it was clear how we ended up at the
> problem.
>
> I would love it if someone could help take the diagnosis further,
> because I'm not sure where to look next. Without HiPE
> compilation (which we use for most of our modules) these problems do not
> occur, so this does seem to point to a HiPE-related issue.
>
> The environment is a 4-core Core 2 Q6600 with 4 GB of ECC memory, with
> the emulator running 64-bit with SMP enabled.
>
> cluster-14:/var/alertlogic# uname -a
> Linux cluster-14 2.6.24-etchnhalf.1-amd64 #1 SMP Fri Dec 26 03:26:12 UTC 2008 x86_64 GNU/Linux
>
> Anyway, here is the gdb session:
>
> Core was generated by `/usr/lib/erlang/erts-5.6.5/bin/beam.smp -Ktrue -W w -A 32 -a 128 -d -- -root /u'.
> Program terminated with signal 7, Bus error.
> #0 gensweep_nstack (p=0x2aaaade37808, ptr_old_htop=0x44048b28,
> ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
> 70 if (likely(sdesc->bucket.hvalue == ra))
> (gdb) where
> #0 gensweep_nstack (p=0x2aaaade37808, ptr_old_htop=0x44048b28,
> ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
> #1 0x00000000004bfc35 in minor_collection (p=0x2aaaade37808, need=2,
> objv=0x0, nobj=0, recl=0x44048e68) at beam/erl_gc.c:893
> #2 0x00000000004c0761 in erts_garbage_collect (p=0x2aaaade37808, need=2,
> objv=0x0, nobj=0) at beam/erl_gc.c:374
> #3 0x000000000050ae1f in hipe_gc (p=0x1b9b860, need=46912528116752)
> at hipe/hipe_native_bif.c:69
> #4 0x000000000050be74 in nbif_gc_1 ()
> at x86_64-unknown-linux-gnu/opt/smp/hipe_amd64_bifs.S:540
> #5 0x00002aaaade37808 in ?? ()
> #6 0x00002aaaade37a80 in ?? ()
> #7 0x0000000000000007 in ?? ()
> #8 0x00002aaaaaaed9c0 in ?? ()
> #9 0x00002aaaabe8f0c8 in ?? ()
> #10 0x00002aaaabe937d8 in ?? ()
> #11 0x00002aaaabe937d8 in ?? ()
> #12 0x0000000000509ce4 in hipe_mode_switch (p=0x2aaaade37808,
> cmd=2895309840,
> reg=0x2aaaaaaed9c0) at hipe/hipe_x86_glue.h:196
> #13 0x00000000004dd97b in process_main () at beam/beam_emu.c:4681
> #14 0x000000000048100f in sched_thread_func (vesdp=<value optimized out>)
> at beam/erl_process.c:752
> #15 0x0000000000549f24 in thr_wrapper (vtwd=<value optimized out>)
> at common/ethread.c:474
> #16 0x00002afcecf9ef1a in start_thread () from /lib/libpthread.so.0
> #17 0x00002afced2815d2 in sysctl () from /lib/libc.so.6
> #18 0x0000000000000000 in ?? ()
> (gdb) p ra
> $1 = 1159839139
> (gdb) p *sdesc
> Cannot access memory at address 0x7d337b25097d327b
> (gdb) list
> 65
> 66 static __inline__ const struct sdesc *hipe_find_sdesc(unsigned long ra)
> 67 {
> 68 unsigned int i = (ra >> HIPE_RA_LSR_COUNT) & hipe_sdesc_table.mask;
> 69 const struct sdesc *sdesc = hipe_sdesc_table.bucket[i];
> 70 if (likely(sdesc->bucket.hvalue == ra))
> 71 return sdesc;
> 72 do {
> 73 sdesc = sdesc->bucket.next;
> 74 } while (sdesc->bucket.hvalue != ra);
> (gdb) p i
> $2 = 1
> (gdb) p hipe_sdesc_table
> $3 = {
> log2size = 16,
> mask = 65535,
> used = 26215,
> bucket = 0x2aaaadc35010
> }
> (gdb) p hipe_sdesc_table.bucket[1]
> $4 = (struct sdesc *) 0x8f2c50
> (gdb) p *hipe_sdesc_table.bucket[1]
> $5 = {
> bucket = {
> hvalue = 1159004161,
> next = 0x85ab30
> },
> summary = 2048,
> livebits = {14}
> }
> (gdb) p *hipe_sdesc_table.bucket[1]->bucket.next
> $6 = {
> bucket = {
> hvalue = 1158348801,
> next = 0x819410
> },
> summary = 1536,
> livebits = {1}
> }
> (gdb) p *hipe_sdesc_table.bucket[1]->bucket.next->bucket.next
> $7 = {
> bucket = {
> hvalue = 1158283265,
> next = 0x0
> },
> summary = 2048,
> livebits = {0}
> }
> (gdb) p *hipe_sdesc_table.bucket[1]->bucket.next->bucket.next->bucket.next
> Cannot access memory at address 0x0
> (gdb) i threads
> 40 process 24151 0x00002afced27aa96 in getdomainname () from
> /lib/libc.so.6
> 39 process 24153 0x00002afcecfa41bf in __read_nocancel ()
> from /lib/libpthread.so.0
> 38 process 24156 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 37 process 24158 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 36 process 24159 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 35 process 24160 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 34 process 24161 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 33 process 24162 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 32 process 24163 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 31 process 24164 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 30 process 24165 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 29 process 24166 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 28 process 24167 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 27 process 24168 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 26 process 24169 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 25 process 24170 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 24 process 24171 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 23 process 24172 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 22 process 24173 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 21 process 24174 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 20 process 24175 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 19 process 24176 0x00002afcecfa412f in __write_nocancel ()
> from /lib/libpthread.so.0
> 18 process 24177 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 17 process 24178 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 16 process 24179 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 15 process 24180 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 14 process 24181 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 13 process 24182 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 12 process 24183 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 11 process 24184 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 10 process 24185 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 9 process 24186 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 8 process 24187 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 7 process 24188 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 6 process 24189 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 5 process 24190 0x00002afcecfa500f in waitpid () from
> /lib/libpthread.so.0
> 4 process 24197 0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
> from /lib/libpthread.so.0
> 3 process 24199 0x00002afced2819ac in capset () from /lib/libc.so.6
> 2 process 24200 0x00002afced22b90d in memmove () from /lib/libc.so.6
> * 1 process 24198 gensweep_nstack (p=0x2aaaade37808,
> ptr_old_htop=0x44048b28,
> ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
> (gdb) t 19
> [Switching to thread 19 (process 24176)]#0 0x00002afcecfa412f in
> __write_nocancel () from /lib/libpthread.so.0
> (gdb) where
> #0 0x00002afcecfa412f in __write_nocancel () from /lib/libpthread.so.0
> #1 0x00000000004f7757 in efile_writev (errInfo=0x7d9e24,
> flags=<value optimized out>, fd=34, iov=0x7c2098, iovcnt=1,
> size=229146)
> at drivers/unix/unix_efile.c:1109
> #2 0x000000000051769a in invoke_writev (data=<value optimized out>)
> at drivers/common/efile_drv.c:1175
> #3 0x00000000004be255 in async_main (arg=<value optimized out>)
> at beam/erl_async.c:242
> #4 0x0000000000549f24 in thr_wrapper (vtwd=<value optimized out>)
> at common/ethread.c:474
> #5 0x00002afcecf9ef1a in start_thread () from /lib/libpthread.so.0
> #6 0x00002afced2815d2 in sysctl () from /lib/libc.so.6
> #7 0x0000000000000000 in ?? ()
>
>
> --
> paul
>