[erlang-questions] Strange bus error 7 with HiPE R12B-5 + patch

Paul Fisher pfisher@REDACTED
Tue Feb 24 13:37:20 CET 2009


I sent this to erlang-bugs last night and it has not shown up yet. 
Trying erlang-questions in the hope that someone can still look and 
give some guidance/assistance today.

thx

Paul Fisher wrote:
> We have been having strange core dumps happen occasionally in our 
> environment, most of which end up with a stack trace like the following:
> 
> (gdb) where
> #0  0x0000000045050b39 in ?? ()
> #1  0x00002aaaabe25922 in ?? ()
> #2  0x000000000000078f in ?? ()
> #3  0x0000000000000000 in ?? ()
> 
> This happens on thread 1... the one that ends up running 
> erts_sys_main_thread().  Pretty weird.
> 
> Today, I got a core dump on the same thread, while it was running 
> gensweep_nstack().  What follows is my brief trawl through the dump, 
> following things until it was clear how we ended up at the 
> problem.
> 
> I would love it if someone could help take the diagnosis further, 
> because I'm not sure where to look beyond this point.  Without HiPE 
> compilation (which we do on most of our modules) these problems do not 
> occur, so it does seem to point to a HiPE-related issue.
> 
> The environment is a 4-core Core 2 Quad Q6600 with 4 GB of ECC memory, 
> with the emulator running SMP in 64-bit mode.
> 
> cluster-14:/var/alertlogic# uname -a
> Linux cluster-14 2.6.24-etchnhalf.1-amd64 #1 SMP Fri Dec 26 03:26:12 UTC 2008 x86_64 GNU/Linux
> 
> Anyway, here is the gdb session:
> 
> Core was generated by `/usr/lib/erlang/erts-5.6.5/bin/beam.smp -Ktrue -W w -A 32 -a 128 -d -- -root /u'.
> Program terminated with signal 7, Bus error.
> #0  gensweep_nstack (p=0x2aaaade37808, ptr_old_htop=0x44048b28,
>      ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
> 70	    if (likely(sdesc->bucket.hvalue == ra))
> (gdb) where
> #0  gensweep_nstack (p=0x2aaaade37808, ptr_old_htop=0x44048b28,
>      ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
> #1  0x00000000004bfc35 in minor_collection (p=0x2aaaade37808, need=2,
>      objv=0x0, nobj=0, recl=0x44048e68) at beam/erl_gc.c:893
> #2  0x00000000004c0761 in erts_garbage_collect (p=0x2aaaade37808, need=2,
>      objv=0x0, nobj=0) at beam/erl_gc.c:374
> #3  0x000000000050ae1f in hipe_gc (p=0x1b9b860, need=46912528116752)
>      at hipe/hipe_native_bif.c:69
> #4  0x000000000050be74 in nbif_gc_1 ()
>      at x86_64-unknown-linux-gnu/opt/smp/hipe_amd64_bifs.S:540
> #5  0x00002aaaade37808 in ?? ()
> #6  0x00002aaaade37a80 in ?? ()
> #7  0x0000000000000007 in ?? ()
> #8  0x00002aaaaaaed9c0 in ?? ()
> #9  0x00002aaaabe8f0c8 in ?? ()
> #10 0x00002aaaabe937d8 in ?? ()
> #11 0x00002aaaabe937d8 in ?? ()
> #12 0x0000000000509ce4 in hipe_mode_switch (p=0x2aaaade37808, cmd=2895309840,
>      reg=0x2aaaaaaed9c0) at hipe/hipe_x86_glue.h:196
> #13 0x00000000004dd97b in process_main () at beam/beam_emu.c:4681
> #14 0x000000000048100f in sched_thread_func (vesdp=<value optimized out>)
>      at beam/erl_process.c:752
> #15 0x0000000000549f24 in thr_wrapper (vtwd=<value optimized out>)
>      at common/ethread.c:474
> #16 0x00002afcecf9ef1a in start_thread () from /lib/libpthread.so.0
> #17 0x00002afced2815d2 in sysctl () from /lib/libc.so.6
> #18 0x0000000000000000 in ?? ()
> (gdb) p ra
> $1 = 1159839139
> (gdb) p *sdesc
> Cannot access memory at address 0x7d337b25097d327b
> (gdb) list
> 65	
> 66	static __inline__ const struct sdesc *hipe_find_sdesc(unsigned long ra)
> 67	{
> 68	    unsigned int i = (ra >> HIPE_RA_LSR_COUNT) & hipe_sdesc_table.mask;
> 69	    const struct sdesc *sdesc = hipe_sdesc_table.bucket[i];
> 70	    if (likely(sdesc->bucket.hvalue == ra))
> 71		return sdesc;
> 72	    do {
> 73		sdesc = sdesc->bucket.next;
> 74	    } while (sdesc->bucket.hvalue != ra);
> (gdb) p i
> $2 = 1
> (gdb) p hipe_sdesc_table
> $3 = {
>    log2size = 16,
>    mask = 65535,
>    used = 26215,
>    bucket = 0x2aaaadc35010
> }
> (gdb) p hipe_sdesc_table.bucket[1]
> $4 = (struct sdesc *) 0x8f2c50
> (gdb) p *hipe_sdesc_table.bucket[1]
> $5 = {
>    bucket = {
>      hvalue = 1159004161,
>      next = 0x85ab30
>    },
>    summary = 2048,
>    livebits = {14}
> }
> (gdb) p *hipe_sdesc_table.bucket[1]->bucket.next
> $6 = {
>    bucket = {
>      hvalue = 1158348801,
>      next = 0x819410
>    },
>    summary = 1536,
>    livebits = {1}
> }
> (gdb) p *hipe_sdesc_table.bucket[1]->bucket.next->bucket.next
> $7 = {
>    bucket = {
>      hvalue = 1158283265,
>      next = 0x0
>    },
>    summary = 2048,
>    livebits = {0}
> }
> (gdb) p *hipe_sdesc_table.bucket[1]->bucket.next->bucket.next->bucket.next
> Cannot access memory at address 0x0
> (gdb) i threads
>    40 process 24151  0x00002afced27aa96 in getdomainname () from /lib/libc.so.6
>    39 process 24153  0x00002afcecfa41bf in __read_nocancel ()
>     from /lib/libpthread.so.0
>    38 process 24156  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    37 process 24158  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    36 process 24159  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    35 process 24160  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    34 process 24161  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    33 process 24162  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    32 process 24163  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    31 process 24164  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    30 process 24165  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    29 process 24166  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    28 process 24167  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    27 process 24168  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    26 process 24169  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    25 process 24170  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    24 process 24171  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    23 process 24172  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    22 process 24173  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    21 process 24174  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    20 process 24175  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    19 process 24176  0x00002afcecfa412f in __write_nocancel ()
>     from /lib/libpthread.so.0
>    18 process 24177  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    17 process 24178  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    16 process 24179  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    15 process 24180  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    14 process 24181  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    13 process 24182  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    12 process 24183  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    11 process 24184  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    10 process 24185  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    9 process 24186  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    8 process 24187  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    7 process 24188  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    6 process 24189  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    5 process 24190  0x00002afcecfa500f in waitpid () from /lib/libpthread.so.0
>    4 process 24197  0x00002afcecfa1b3a in pthread_cond_wait@@GLIBC_2.3.2 ()
>     from /lib/libpthread.so.0
>    3 process 24199  0x00002afced2819ac in capset () from /lib/libc.so.6
>    2 process 24200  0x00002afced22b90d in memmove () from /lib/libc.so.6
> * 1 process 24198  gensweep_nstack (p=0x2aaaade37808, ptr_old_htop=0x44048b28,
>      ptr_n_htop=0x44048b20) at hipe/hipe_stack.h:70
> (gdb) t 19
> [Switching to thread 19 (process 24176)]#0  0x00002afcecfa412f in __write_nocancel () from /lib/libpthread.so.0
> (gdb) where
> #0  0x00002afcecfa412f in __write_nocancel () from /lib/libpthread.so.0
> #1  0x00000000004f7757 in efile_writev (errInfo=0x7d9e24,
>      flags=<value optimized out>, fd=34, iov=0x7c2098, iovcnt=1, size=229146)
>      at drivers/unix/unix_efile.c:1109
> #2  0x000000000051769a in invoke_writev (data=<value optimized out>)
>      at drivers/common/efile_drv.c:1175
> #3  0x00000000004be255 in async_main (arg=<value optimized out>)
>      at beam/erl_async.c:242
> #4  0x0000000000549f24 in thr_wrapper (vtwd=<value optimized out>)
>      at common/ethread.c:474
> #5  0x00002afcecf9ef1a in start_thread () from /lib/libpthread.so.0
> #6  0x00002afced2815d2 in sysctl () from /lib/libc.so.6
> #7  0x0000000000000000 in ?? ()
> 
> 
> --
> paul
> 
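
For anyone re-checking the hash lookup in the gdb session above, here is a 
minimal sketch in C. It recomputes the bucket index from the printed values 
and walks the bucket[1] chain with a NULL guard. It rests on assumptions: 
the struct is a simplified stand-in for the one gdb printed, not the ERTS 
definition; the names bucket_index, find_in_chain and run_demo are 
hypothetical; and the value of HIPE_RA_LSR_COUNT is not shown in the post, 
so the shift is passed as a parameter and the demo assumes 0. Under that 
assumption ra (1159839139) maps to bucket 48547 rather than 1, which may 
mean the "i" printed by gdb is not the value the code actually used, 
something that can happen with locals in optimized builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for the struct printed in the gdb session
 * (an assumption, not the real ERTS definition). */
struct sdesc {
    struct {
        unsigned long hvalue;
        const struct sdesc *next;
    } bucket;
    unsigned int summary;
    unsigned int livebits[1];
};

/* Same arithmetic as line 68 of the listing. lsr_count stands in for
 * HIPE_RA_LSR_COUNT, whose value is not shown in the post. */
unsigned int bucket_index(unsigned long ra, unsigned int lsr_count,
                          unsigned int mask)
{
    assert((mask & (mask + 1)) == 0); /* mask must be 2^n - 1 */
    return (unsigned int)((ra >> lsr_count) & mask);
}

/* Chain walk with a NULL guard: returns NULL when ra is not in the
 * chain instead of dereferencing past its end. */
const struct sdesc *find_in_chain(const struct sdesc *sdesc, unsigned long ra)
{
    while (sdesc != NULL && sdesc->bucket.hvalue != ra)
        sdesc = sdesc->bucket.next;
    return sdesc;
}

/* The bucket[1] chain, copied from $5, $6 and $7 in the session. */
static const struct sdesc e3 = { { 1158283265UL, NULL }, 2048, { 0 } };
static const struct sdesc e2 = { { 1158348801UL, &e3 }, 1536, { 1 } };
static const struct sdesc e1 = { { 1159004161UL, &e2 }, 2048, { 14 } };
const struct sdesc *const bucket1 = &e1;

/* Prints the recomputed indices and whether ra is in the chain. */
void run_demo(void)
{
    const unsigned long ra = 1159839139UL; /* $1 */
    const unsigned int mask = 65535;       /* from $3 */

    printf("index of ra (shift 0): %u\n", bucket_index(ra, 0, mask));
    printf("index of bucket[1] head (shift 0): %u\n",
           bucket_index(bucket1->bucket.hvalue, 0, mask));
    printf("ra found in bucket[1] chain: %s\n",
           find_in_chain(bucket1, ra) != NULL ? "yes" : "no");
}
```

Calling run_demo() from a main function prints the two indices and the 
chain result. The guarded walk is meant only for inspecting a dump offline; 
it is not a proposed patch to hipe_find_sdesc().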



