[erlang-bugs] Segmentation Fault R13B01

Cliff Moon cliff@REDACTED
Fri Sep 4 20:52:03 CEST 2009


What's the status on this issue?  I've seen this behavior before as 
well.  It seems to be a concurrency issue with active TCP packet 
delivery.  You can pretty easily reproduce the issue using the janus app 
found here:

http://github.com/cliffmoon/janus/tree/master

You need to test on a Linux machine with many CPUs to see it happen
with any frequency.  I've found that 8 or more cores are enough to see it
happen after a few tries; an EC2 High-CPU XL instance seems to do the
trick.  Start the server in one VM with `make run1`, then start the
workers with `make sh` and issue the Erlang command
bot:test(flashbot, 10000).

There is a good chance that one of the VMs will segfault.  If not,
restart the VM and start from scratch.  When I run this under gdb I get
a backtrace similar to the one posted earlier in this thread, which
points to a problem in active TCP delivery.


------------------------------------------------------------------------------------------

    Hi, I built Erlang using gcc 4.1.2 (the default for CentOS).
    I started erl with:
     -env ERL_MAX_PORTS 110000 +K true +P 110000 +S4 -smp -detached

    You can download three core dumps:
    http://94.75.214.130/core.12514.gz
    http://94.75.214.130/core.939.gz
    http://94.75.214.130/core.28223.gz

    Unfortunately, I have no idea which part of the code triggers the
    segfault, other than that it happens constantly, and I cannot
    redistribute the whole program. The program does, however, use TCP
    connections heavily; typically I have over 10k established TCP
    connections.

    I will try to build the debug emulator tonight and let you know if
    I find anything.

    Thanks,
    Georgos

    2009/7/1 Raimo Niskanen <raimo+erlang-bugs@REDACTED>

     > On Wed, Jul 01, 2009 at 05:25:14PM +0200, Georgos Siganos wrote:
     > > Hi All, I am having problems with R13B01 and segmentation faults
     > > like the one below.
     > > Unfortunately, I am not sure which part of the code triggers the
     > > segmentation fault.
     > >
     > > I am running CentOS 5.3 (2.6.18-128.1.16.el5 #1 SMP x86_64) on a
     > > quad-core Intel processor.
     > > The program segfaults both when compiled with and without HiPE.
     > >
     > > Please let me know if there is anything else I can report to help
     > > fix this problem. This segmentation fault is quite consistent and
     > > is a show stopper for my code.
     > > Thanks,
     > > Georgos
     >
     > How did you build the Erlang emulator, how did you start it
     > (arguments), how did you provoke the segfault?
     >
     > Can you post the code that provokes this, to see
     > if it is reproducible on other OSes?
     >
     > Can you post the core dump for the Erlang/OTP team to dissect?
     >
     > Can you build and run a debug emulator and see if you get an earlier
     > fault detection? (gmake smp TYPE=debug in the emulator directory)
     >
     > >
     > >
     > > ----------------------- gdb output --------------------------
     > > Program terminated with signal 11, Segmentation fault.
     > > [New process 12531]
     > > [New process 12533]
     > > [New process 12532]
     > > [New process 12530]
     > > [New process 12526]
     > > [New process 12517]
     > > [New process 12516]
     > > [New process 12514]
     > > #0  0x00002add1f7b570b in memcpy () from /lib64/libc.so.6
     > > (gdb) bt
     > > #0  0x00002add1f7b570b in memcpy () from /lib64/libc.so.6
     > > #1  0x0000000000486849 in driver_deliver_term (port=<value optimized out>, to=4816451, data=<value optimized out>,
     > >     len=<value optimized out>) at beam/io.c:2994
     > > #2  0x00000000005513cf in tcp_deliver (desc=0x2aab17c17548, len=3) at drivers/common/inet_drv.c:2980
     > > #3  0x0000000000551891 in tcp_recv (desc=0x2aab17c17548, request_len=0) at drivers/common/inet_drv.c:8043
     > > #4  0x0000000000551afc in tcp_inet_drv_input (data=0x2aaae21a9fc4, event=<value optimized out>) at drivers/common/inet_drv.c:8381
     > > #5  0x00000000004a3d78 in erts_port_task_execute (runq=0x2add1fc19340, curr_port_pp=0x2aaaaaacb1e8) at beam/erl_port_task.c:853
     > > #6  0x000000000049ebc5 in schedule (p=0x349, calls=<value optimized out>) at beam/erl_process.c:6116
     > > #7  0x0000000000505afd in process_main () at beam/beam_emu.c:1126
     > > #8  0x0000000000499126 in sched_thread_func (vesdp=<value optimized out>) at beam/erl_process.c:3015
     > > #9  0x000000000057a0f4 in thr_wrapper (vtwd=<value optimized out>) at common/ethread.c:475
     > > #10 0x00002add1f31b367 in start_thread () from /lib64/libpthread.so.0
     > > #11 0x00002add1f80cf7d in clone () from /lib64/libc.so.6
     > >
     >
    ---------------------------------------------------------------------------
     >
     > --
     >
     > / Raimo Niskanen, Erlang/OTP, Ericsson AB
     >



