R11B-0 SMP segfaults in unlink_free_block

Richard Cameron camster@REDACTED
Sun Jul 30 12:26:26 CEST 2006


Hi,

I'm seeing fairly regular segmentation faults on R11B-0 on a 2-CPU
64-bit Linux box. It's compiled from source with no special options
(other than ./configure --prefix=/opt/erlang). I got beam.smp to dump
core, and I've attached a stack trace from gdb below.

erts would have been compiled with gcc -O3, so there's a vague
possibility that the stack trace is slightly bogus. However, it seems
to go wrong only in SMP mode, and the offending section is called
from somewhere in time.c with the rather frightening-looking comment:

         /* Here comes hairy use of the timer fields!
          * They are reset without having the lock.
          * It is assumed that no code but this will
          * accesses any field until the ->timeout
          * callback is called.
          */
         p->next = NULL;
         p->slot = 0;
         (*p->timeout)(p->arg);

The application probably has several thousand Erlang processes
spawned, most of which are performing this sort of hybrid
poll/receive pattern:

loop(Timeout) ->
	receive
		event ->
			handle_event()
	after Timeout ->
		%% No event arrived within Timeout ms, so poll instead.
		poll_external_system()
	end,
	loop(Timeout).

Is it possible my code is hitting an obscure race condition in the
new SMP code?


---

(smithers)lisa:~% gdb /opt/erlang/lib/erlang/erts-5.5/bin/beam.smp core.12735
GNU gdb Red Hat Linux (6.3.0.0-1.96rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Core was generated by `/opt/erlang/lib/erlang/erts-5.5/bin/beam.smp -- -root /opt/erlang/lib/erlang -p'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib64/libdl.so.2...done.

[...]

Loaded symbols for /usr/lib64/libz.so.1
#0  0x000000000043729e in unlink_free_block (allctr=0x677a00, block=0x6e4a38)
     at beam/erl_goodfit_alloc.c:452
452             blk->prev->next = blk->next;
(gdb) bt
#0  0x000000000043729e in unlink_free_block (allctr=0x677a00, block=0x6e4a38)
     at beam/erl_goodfit_alloc.c:452
#1  0x000000000043305b in mbc_free (allctr=0x677a00, p=Variable "p" is not available.
)
     at beam/erl_alloc_util.c:731
#2  0x0000000000436555 in erts_alcu_free_ts (type=Variable "type" is not available.
)
     at beam/erl_alloc_util.c:2221
#3  0x000000000047cafe in timer_thread_start (ignore=Variable "ignore" is not available.
) at beam/time.c:292
#4  0x000000000050bea8 in thr_wrapper (vtwd=Variable "vtwd" is not available.
) at common/ethread.c:440
#5  0x00000039a100610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x000000399fdc5ee3 in clone () from /lib64/tls/libc.so.6
#7  0x0000000000000000 in ?? ()
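
To make frame #0 easier to read, here is a minimal C sketch of unlinking
a block from a doubly linked free list. It is not the ERTS source: the
names free_blk, unlink_blk and demo are hypothetical, and it only assumes
that the statement gdb shows at erl_goodfit_alloc.c:452 lives in a routine
of roughly this shape. The point is that the statement dereferences
blk->prev, so it is only safe while the list links are intact. If the
same block were freed twice, or two threads touched the list without a
lock, prev could be stale and the store could fault.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-list node; NOT the real ERTS block header. */
typedef struct free_blk {
    struct free_blk *prev;
    struct free_blk *next;
} free_blk;

/* Unlink blk from the doubly linked list whose head pointer is *head.
 * The first branch is the same statement gdb reports at line 452:
 * it writes through blk->prev, so a stale prev pointer faults there. */
void unlink_blk(free_blk **head, free_blk *blk)
{
    assert(head != NULL && blk != NULL);
    if (blk->prev)
        blk->prev->next = blk->next;
    else
        *head = blk->next;          /* blk was the first node */
    if (blk->next)
        blk->next->prev = blk->prev;
    blk->prev = NULL;               /* leave no dangling links behind */
    blk->next = NULL;
}

/* Builds the list a <-> b <-> c, unlinks b and then a, and checks the
 * links after each step. Returns 1 if the list stayed consistent. */
int demo(void)
{
    free_blk a = {NULL, NULL}, b = {NULL, NULL}, c = {NULL, NULL};
    free_blk *head = &a;

    a.next = &b;
    b.prev = &a;
    b.next = &c;
    c.prev = &b;

    unlink_blk(&head, &b);          /* middle node */
    if (head != &a || a.next != &c || c.prev != &a)
        return 0;

    unlink_blk(&head, &a);          /* head node */
    if (head != &c || c.prev != NULL)
        return 0;

    return 1;
}
```

In single-threaded use the sketch keeps the list consistent. It says
nothing about where the corruption in the core dump actually comes from.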



