[erlang-questions] dirty scheduler segfault

Michael Truog mjtruog@REDACTED
Sat Nov 1 03:31:30 CET 2014


Sorry, judging by the Erlang shell output it seems to be running with SMP enabled, so that shouldn't be the cause.

On 10/31/2014 07:27 PM, Michael Truog wrote:
> If "erl -smp disable" is being used, I assume this is the problem described at https://github.com/erlang/otp/pull/518
>
> On 10/31/2014 06:57 PM, Daniel Goertzen wrote:
>> Thanks for trying it out.  That gist was a bit of a hash; apologies.
>>
>> I made all the functions static and also set load and unload to NULL in ERL_NIF_INIT, but I get the same results.
>>
>> I ran it under valgrind and got...
>>
>> # ERL_LIBS=.. valgrind --trace-children=yes erl
>> ...
>> Eshell V6.2  (abort with ^G)
>> 1>
>> 1> dlibusb:mytest_io().
>> ==9029== Thread 18:
>> ==9029== Invalid read of size 4
>> ==9029==    at 0x8190B56: process_main (beam_hot.h:935)
>> ==9029==    by 0x80E565E: sched_thread_func (erl_process.c:7719)
>> ==9029==    by 0x820982B: thr_wrapper (ethread.c:106)
>> ==9029==    by 0x40FFF46: start_thread (in /lib/libpthread-2.20.so)
>> ==9029==    by 0x41FE97D: clone (in /lib/libc-2.20.so)
>> ==9029==  Address 0xfffffffe is not stack'd, malloc'd or (recently) free'd
>> ==9029==
>> ==9029==
>> ==9029== Process terminating with default action of signal 11 (SIGSEGV)
>> ==9029==  Access not within mapped region at address 0xFFFFFFFE
>> ==9029==    at 0x8190B56: process_main (beam_hot.h:935)
>> ==9029==    by 0x80E565E: sched_thread_func (erl_process.c:7719)
>> ==9029==    by 0x820982B: thr_wrapper (ethread.c:106)
>> ==9029==    by 0x40FFF46: start_thread (in /lib/libpthread-2.20.so)
>> ==9029==    by 0x41FE97D: clone (in /lib/libc-2.20.so)
>> ==9029==  If you believe this happened as a result of a stack
>> ==9029==  overflow in your program's main thread (unlikely but
>> ==9029==  possible), you can try to increase the size of the
>> ==9029==  main thread stack using the --main-stacksize= flag.
>> ==9029==  The main thread stack size used in this run was 8388608.
>> ==9029==
>> ==9029== HEAP SUMMARY:
>> ==9029==     in use at exit: 9,020,474 bytes in 157 blocks
>> ==9029==   total heap usage: 211 allocs, 54 frees, 9,490,700 bytes allocated
>> ==9029==
>> ==9029== LEAK SUMMARY:
>> ==9029==    definitely lost: 0 bytes in 0 blocks
>> ==9029==    indirectly lost: 0 bytes in 0 blocks
>> ==9029==      possibly lost: 14,143 bytes in 41 blocks
>> ==9029==    still reachable: 9,006,331 bytes in 116 blocks
>> ==9029==         suppressed: 0 bytes in 0 blocks
>> ==9029== Rerun with --leak-check=full to see details of leaked memory
>> ==9029==
>> ==9029== For counts of detected and suppressed errors, rerun with: -v
>> ==9029== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
>> Killed
>>
>> Line 935 of my beam_hot.h is...
>>
>>     OpCase(is_integer_fx):
>>     { BeamInstr* next;
>>     PreFetch(2, next);
>>     IsInteger(xb(Arg(1)), ClauseFail()); // line 935
>>     NextPF(2, next);
>>     }
>>
>>
>> I know little about BEAM internals, so I don't know if this is useful.
>>
>>
>> On Fri, Oct 31, 2014 at 4:05 PM, Steve Vinoski <vinoski@REDACTED> wrote:
>>
>>
>>
>>     On Fri, Oct 31, 2014 at 4:33 PM, Daniel Goertzen <daniel.goertzen@REDACTED> wrote:
>>
>>         I am seeing a segfault that seems to be related to dirty schedulers. I've reduced the fault to the Erlang module and C NIF module below, which execute the same NIF on either the I/O dirty scheduler, the CPU dirty scheduler, or the normal Erlang scheduler.
>>
>>
>>         When I start the emulator and run either dirty NIF, I get a segfault (see https://gist.github.com/goertzenator/6237e0200a5f7bf22976).
>>
>>
>>     I found it hard to make sense of what's in that gist due to the formatting, so I took your code and built it myself. When I ran it, it failed in your NIF load function, but in a way that didn't make sense, because all your function does is return 0. Then I realized that none of your C functions were declared static, which means they are global, and I suspected your load() function was clashing with some other function of the same name. I made all your C functions static, rebuilt, and reran everything, and it seems to have worked:
>>
>>     1> c(dlibusb).
>>     Reading symbols for shared libraries . done
>>     {ok,dlibusb}
>>     2> dlibusb:mytest_cpu().
>>     [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
>>     3> dlibusb:mytest_io().
>>     [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
>>     4> dlibusb:mytest_none().
>>     [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
>>
>>     --steve
>>
>
