[erlang-questions] Segfault in NIF call

Tue Apr 30 15:11:23 CEST 2013

Thx, I'll try Valgrind.

Each function returns only terms created in its given process env.

Each thread has two envs: one in a mutex-protected area for inter-thread communication, and one for sending.

The mutex env is used to create an in-queue; its terms are written once and read once only, and only while holding the mutex. Q: How are the terms cleaned up in this env? I cannot use enif_clear_env, as it would clear too much.

The send env is cleared after each enif_send, and only terms belonging to that env are used (made or copied there).
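
A minimal sketch of the queue discipline described above, in plain C with pthreads standing in for the enif_mutex_* calls so that it compiles without erl_nif.h. The names queue_t, item_t, queue_init, queue_push, queue_pop and item_free are hypothetical and not taken from the actual NIF. Each queued item owns its own payload, which is one possible answer to the cleanup question: in a NIF, the analogous move would be to give each queued message its own env from enif_alloc_env() and release it with enif_free_env() once the worker has consumed it, rather than sharing one long-lived env that can only be cleared as a whole.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One queued request. Each item owns its payload, so the consumer can
 * release exactly one item without touching the others. The payload is a
 * stand-in (assumption) for a term copied into a per-item env. */
typedef struct item {
    struct item *next;
    char *payload;
} item_t;

typedef struct {
    pthread_mutex_t lock;     /* protects head and tail */
    pthread_cond_t nonempty;  /* signalled when an item is linked in */
    item_t *head;
    item_t *tail;
} queue_t;

void queue_init(queue_t *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = NULL;
    q->tail = NULL;
}

/* Producer side (the NIF call): copy the payload outside the lock, then
 * link the item in under the lock. Returns 0 on success, -1 if an
 * allocation fails. */
int queue_push(queue_t *q, const char *payload)
{
    size_t n = strlen(payload) + 1;
    item_t *it = malloc(sizeof *it);
    if (it == NULL)
        return -1;
    it->payload = malloc(n);
    if (it->payload == NULL) {
        free(it);
        return -1;
    }
    memcpy(it->payload, payload, n);
    it->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = it;
    else
        q->head = it;
    q->tail = it;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Consumer side (the worker thread): block until an item is available and
 * unlink it under the lock. The caller then owns the item exclusively. */
item_t *queue_pop(queue_t *q)
{
    item_t *it;

    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->lock);
    it = q->head;
    q->head = it->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    it->next = NULL;
    return it;
}

/* Release one consumed item; the per-item analogue of freeing one env. */
void item_free(item_t *it)
{
    assert(it != NULL);
    free(it->payload);
    free(it);
}
```

The worker thread would loop on queue_pop(), handle the item, and then call item_free(). Only the link, unlink and wait steps happen under the lock, and no item is touched by two threads at the same time.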

Side question: enif_self() was unusable for me. The embedded term had value -13 when printed with %d, and the pid returned could not be sent to. So now I have the .erl side provide self(), which I call enif_get_local_pid() on, and this works fine. This is on R16B.

My latest version segfaults on R16B but works on R15B03-1.

/Fredrik

On 30 apr 2013, at 05:38, Sverker Eriksson <sverker.eriksson@REDACTED> wrote:

> You wrote:
> 
> "The NIF:ed function is called from a few different processes, and it forwards
> the call to a worker thread."
> 
> Have you done proper thread synchronization for this inter-thread communication (with enif_mutex_lock() for example)? NIFs are only thread safe as long as they access their own environment. Shared memory structures such as a queue for a worker thread must be protected with locks (or implemented with a lock-free algorithm).
> 
> Another approach is to see if valgrind can detect any memory corruption:
> 
> # cd $ERL_TOP/erts/emulator
> # make TYPE=valgrind smp plain
> # export VALGRIND_LOG_DIR=/write/valgrind/logs/here
> # export VALGRIND_MISC_FLAGS="--suppressions=$ERL_TOP/erts/emulator/valgrind/suppress.standard --show-possibly-lost=no"
> # $ERL_TOP/bin/cerl -valgrind
> 
> /Sverker
> 
> 
> Fredrik Linder wrote:
>> Found one error (yay!) -- enif_open_resource_type should have NULL as the
>> module argument (missed this one in the docs)
>> 
>> Running with cerl does not generate a segfault, nor any (extra) printouts.
>> Running with erl still generates a segfault.
>> 
>> Now I get either of the following (using erl), does it reveal what my error
>> is?
>> 
>>  
>>> USE_GDB=1 rebar eunit
>> [cut]
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x2aaaad6c0700 (LWP 6384)]
>> 0x000000000052b28a in process_main () at
>> x86_64-unknown-linux-gnu/opt/smp/beam_hot.h:979
>> 979    MoveDeallocateReturn(xb(tmp_packed1&BEAM_LOOSE_MASK), r(0),
>> Qb((tmp_packed1>>BEAM_LOOSE_SHIFT)));
>> (gdb) backtrace
>> #0  0x000000000052b28a in process_main () at
>> x86_64-unknown-linux-gnu/opt/smp/beam_hot.h:979
>> #1  0x0000000000491463 in sched_thread_func (vesdp=0x2aaaac343ac0) at
>> beam/erl_process.c:5632
>> #2  0x0000000000590440 in thr_wrapper (vtwd=0x7fffffffd850) at
>> pthread/ethread.c:106
>> #3  0x00002aaaab3ffe9a in start_thread (arg=0x2aaaad6c0700) at
>> pthread_create.c:308
>> #4  0x00002aaaab911cbd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #5  0x0000000000000000 in ?? ()
>> 
>> or
>> 
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x2aaaad6c0700 (LWP 6914)]
>> unlink_free_block (allctr=0x85e5c0, block=0x0, flags=0) at
>> beam/erl_goodfit_alloc.c:458
>> 458    Uint sz = MBC_FBLK_SZ(&blk->block_head);
>> (gdb) backtrace
>> #0  unlink_free_block (allctr=0x85e5c0, block=0x0, flags=0) at
>> beam/erl_goodfit_alloc.c:458
>> #1  0x0000000000444ee1 in get_free_block (allctr=0x85e5c0, size=<optimized
>> out>, cand_blk=0x0, cand_size=0, flags=0) at beam/erl_goodfit_alloc.c:426
>> #2  0x000000000043b6da in mbc_alloc_block (alcu_flgsp=<synthetic pointer>,
>> blk_szp=<synthetic pointer>, size=<optimized out>, allctr=0x85e5c0) at
>> beam/erl_alloc_util.c:1309
>> #3  mbc_alloc (allctr=0x85e5c0, size=<optimized out>) at
>> beam/erl_alloc_util.c:1451
>> #4  0x0000000000440d7b in do_erts_alcu_alloc (size=32, extra=0x85e5c0,
>> type=148) at beam/erl_alloc_util.c:3530
>> #5  erts_alcu_alloc_thr_pref (type=148, extra=<optimized out>,
>> size=<optimized out>) at beam/erl_alloc_util.c:3607
>> #6  0x0000000000490595 in erts_alloc (size=32, type=18967) at
>> beam/erl_alloc.h:208
>> #7  new_message_buffer (size=0) at beam/erl_message.c:72
>> #8  erts_alloc_message_heap_state (statep=0x2aaaad6bfc4c,
>> receiver_locks=0x2aaaad6bfcb0, receiver=0x2aaaac9c1db8, ohpp=<synthetic
>> pointer>, bpp=<synthetic pointer>, size=0) at beam/global.h:1017
>> #9  erts_send_message (sender=0x2aaaac9c2470, receiver=0x2aaaac9c1db8,
>> receiver_locks=0x2aaaad6bfcb0, message=564171, flags=<optimized out>) at
>> beam/erl_message.c:1039
>> #10 0x0000000000476350 in do_send (p=0x2aaaac9c2470, to=793099, msg=564171,
>> suspend=1, refp=<optimized out>) at beam/bif.c:2025
>> #11 0x0000000000476bf0 in erl_send (p=0x2aaaac9c2470, to=793099,
>> msg=564171) at beam/bif.c:2138
>> #12 0x000000000052d1d8 in process_main () at beam/beam_emu.c:2558
>> #13 0x0000000000491463 in sched_thread_func (vesdp=0x2aaaac343ac0) at
>> beam/erl_process.c:5632
>> #14 0x0000000000590440 in thr_wrapper (vtwd=0x7fffffffd850) at
>> pthread/ethread.c:106
>> #15 0x00002aaaab3ffe9a in start_thread (arg=0x2aaaad6c0700) at
>> pthread_create.c:308
>> #16 0x00002aaaab911cbd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #17 0x0000000000000000 in ?? ()
>> 
>> 
>> /Fredrik
>> 
>> On Mon, Apr 29, 2013 at 2:58 AM, Sverker Eriksson <
>> sverker.eriksson@REDACTED> wrote:
>> 
>>  
>>> Looks like some sort of memory corruption.
>>> 
>>> Run on the debug emulator and hope for a better (earlier) crash.
>>> 
>>> # cd $ERL_TOP/erts/emulator
>>> # make TYPE=debug smp plain
>>> # $ERL_TOP/bin/cerl -debug
>>> 
>>> /Sverker, Erlang/OTP Ericsson
>>> 
>>> fredrik@REDACTED wrote:
>>> 
>>>    
>>>> Hello folks,
>>>> 
>>>> I'm having difficulties locating the cause of a segfault I'm getting when
>>>> running tests with a NIF implementation I have.
>>>> 
>>>> Anything that would shed light on what's wrong is appreciated.
>>>> 
>>>> The backtrace does not seem to have anything to do with my NIF. The
>>>> NIF:ed function is called from a few different processes, and it forwards
>>>> the call to a worker thread which sends a message back to the caller
>>>> process.
>>>> 
>>>> 
>>>> 
>>>>      
>>>>> strings -a ../../otp/R16B/pre-5.10.1-mz-0.2/lib/erlang/erts-5.10.1/bin/beam.smp | fgrep GCC | sort -u
>>>> GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
>>>> 
>>>> 
>>>> 
>>>>      
>>>>> USE_GDB=1 rebar eunit
>>>> [cut]
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> [Switching to Thread 0x2aaaef381700 (LWP 19162)]
>>>> unlink_free_block (allctr=0x85f240, block=0x0, flags=0) at
>>>> beam/erl_goodfit_alloc.c:458
>>>> 458    Uint sz = MBC_FBLK_SZ(&blk->block_head);
>>>> (gdb) backtrace
>>>> #0  unlink_free_block (allctr=0x85f240, block=0x0, flags=0) at
>>>> beam/erl_goodfit_alloc.c:458
>>>> #1  0x0000000000444ee1 in get_free_block (allctr=0x85f240,
>>>> size=<optimized out>, cand_blk=0x0, cand_size=0, flags=0) at
>>>> beam/erl_goodfit_alloc.c:426
>>>> #2  0x000000000043b6da in mbc_alloc_block (alcu_flgsp=<synthetic
>>>> pointer>, blk_szp=<synthetic pointer>, size=<optimized out>,
>>>> allctr=0x85f240) at beam/erl_alloc_util.c:1309
>>>> #3  mbc_alloc (allctr=0x85f240, size=<optimized out>) at
>>>> beam/erl_alloc_util.c:1451
>>>> #4  0x0000000000440d7b in do_erts_alcu_alloc (size=32, extra=0x85f240,
>>>> type=148) at beam/erl_alloc_util.c:3530
>>>> #5  erts_alcu_alloc_thr_pref (type=148, extra=<optimized out>,
>>>> size=<optimized out>) at beam/erl_alloc_util.c:3607
>>>> #6  0x0000000000490595 in erts_alloc (size=32, type=18967) at
>>>> beam/erl_alloc.h:208
>>>> #7  new_message_buffer (size=0) at beam/erl_message.c:72
>>>> #8  erts_alloc_message_heap_state (statep=0x2aaaef380c4c,
>>>> receiver_locks=0x2aaaef380cb0, receiver=0x2aaaac9ccac8, ohpp=<synthetic
>>>> pointer>, bpp=<synthetic pointer>, size=0) at beam/global.h:1017
>>>> #9  erts_send_message (sender=0x2aaaac9cd118, receiver=0x2aaaac9ccac8,
>>>> receiver_locks=0x2aaaef380cb0, message=513931, flags=<optimized out>) at
>>>> beam/erl_message.c:1039
>>>> #10 0x0000000000476350 in do_send (p=0x2aaaac9cd118, to=550027,
>>>> msg=513931, suspend=1, refp=<optimized out>) at beam/bif.c:2025
>>>> #11 0x0000000000476bf0 in erl_send (p=0x2aaaac9cd118, to=550027,
>>>> msg=513931) at beam/bif.c:2138
>>>> #12 0x000000000052d1d8 in process_main () at beam/beam_emu.c:2558
>>>> #13 0x0000000000491463 in sched_thread_func (vesdp=0x2aaaac34de40) at
>>>> beam/erl_process.c:5632
>>>> #14 0x0000000000590440 in thr_wrapper (vtwd=0x7fffffffd7e0) at
>>>> pthread/ethread.c:106
>>>> #15 0x00002aaaab3ffe9a in start_thread (arg=0x2aaaef381700) at
>>>> pthread_create.c:308
>>>> #16 0x00002aaaab911cbd in clone () at
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>>> #17 0x0000000000000000 in ?? ()
>>>> (gdb)
>>>> 