[erlang-questions] dirty scheduler segfault

Michael Truog mjtruog@REDACTED
Sat Nov 1 03:27:11 CET 2014


If you are running with "erl -smp disable", I assume this is the problem described at https://github.com/erlang/otp/pull/518
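
To check which case applies, here is a minimal sketch that reports whether the running emulator has SMP support and dirty CPU schedulers. The module name dirty_check is hypothetical, and I am assuming that erlang:system_info(dirty_cpu_schedulers) raises an error on an emulator without dirty scheduler support, which is why the call is wrapped in a try.

```erlang
%% dirty_check.erl -- hypothetical helper, not part of the original gist.
-module(dirty_check).
-export([report/0]).

%% Returns {SmpSupport, DirtyCpuSchedulers} and prints both values.
report() ->
    %% true if the emulator was built with SMP support.
    Smp = erlang:system_info(smp_support),
    %% Assumption: this call fails when dirty schedulers are not
    %% available in the running emulator, so treat any failure as 0.
    Dirty = try erlang:system_info(dirty_cpu_schedulers)
            catch _:_ -> 0
            end,
    io:format("smp_support=~p dirty_cpu_schedulers=~p~n", [Smp, Dirty]),
    {Smp, Dirty}.
```

Compile it with c(dirty_check). and call dirty_check:report(). once in a plain "erl" shell and once under "erl -smp disable" to compare the two.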

On 10/31/2014 06:57 PM, Daniel Goertzen wrote:
> Thanks for trying it out.  That gist was a bit of a hash; apologies.
>
> I made all the functions static and also put load and unload as NULL in ERL_NIF_INIT, but I get the same results.
>
> I ran it under valgrind and got...
>
>
> # ERL_LIBS=.. valgrind --trace-children=yes erl
> ...
> Eshell V6.2  (abort with ^G)
> 1>
> 1> dlibusb:mytest_io().
> ==9029== Thread 18:
> ==9029== Invalid read of size 4
> ==9029==    at 0x8190B56: process_main (beam_hot.h:935)
> ==9029==    by 0x80E565E: sched_thread_func (erl_process.c:7719)
> ==9029==    by 0x820982B: thr_wrapper (ethread.c:106)
> ==9029==    by 0x40FFF46: start_thread (in /lib/libpthread-2.20.so)
> ==9029==    by 0x41FE97D: clone (in /lib/libc-2.20.so)
> ==9029==  Address 0xfffffffe is not stack'd, malloc'd or (recently) free'd
> ==9029==
> ==9029==
> ==9029== Process terminating with default action of signal 11 (SIGSEGV)
> ==9029==  Access not within mapped region at address 0xFFFFFFFE
> ==9029==    at 0x8190B56: process_main (beam_hot.h:935)
> ==9029==    by 0x80E565E: sched_thread_func (erl_process.c:7719)
> ==9029==    by 0x820982B: thr_wrapper (ethread.c:106)
> ==9029==    by 0x40FFF46: start_thread (in /lib/libpthread-2.20.so)
> ==9029==    by 0x41FE97D: clone (in /lib/libc-2.20.so)
> ==9029==  If you believe this happened as a result of a stack
> ==9029==  overflow in your program's main thread (unlikely but
> ==9029==  possible), you can try to increase the size of the
> ==9029==  main thread stack using the --main-stacksize= flag.
> ==9029==  The main thread stack size used in this run was 8388608.
> ==9029==
> ==9029== HEAP SUMMARY:
> ==9029==     in use at exit: 9,020,474 bytes in 157 blocks
> ==9029==   total heap usage: 211 allocs, 54 frees, 9,490,700 bytes allocated
> ==9029==
> ==9029== LEAK SUMMARY:
> ==9029==    definitely lost: 0 bytes in 0 blocks
> ==9029==    indirectly lost: 0 bytes in 0 blocks
> ==9029==      possibly lost: 14,143 bytes in 41 blocks
> ==9029==    still reachable: 9,006,331 bytes in 116 blocks
> ==9029==         suppressed: 0 bytes in 0 blocks
> ==9029== Rerun with --leak-check=full to see details of leaked memory
> ==9029==
> ==9029== For counts of detected and suppressed errors, rerun with: -v
> ==9029== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
> Killed
>
>
>
> Line 935 of my beam_hot.h is...
>
> OpCase(is_integer_fx):
>     { BeamInstr* next;
>     PreFetch(2, next);
>     IsInteger(xb(Arg(1)), ClauseFail());  // line 935
>     NextPF(2, next);
>     }
>
>
> I know little about beam internals. I don't know if this is useful.
>
>
> On Fri, Oct 31, 2014 at 4:05 PM, Steve Vinoski <vinoski@REDACTED> wrote:
>
>
>
>     On Fri, Oct 31, 2014 at 4:33 PM, Daniel Goertzen <daniel.goertzen@REDACTED> wrote:
>
>         I am seeing a segfault that seems to be related to dirty schedulers.  I've reduced the fault to the Erlang and C NIF module below, which executes the same NIF on either the dirty I/O scheduler, the dirty CPU scheduler, or a normal Erlang scheduler.
>
>
>         When I start the emulator and run either dirty NIF, I get a segfault (see https://gist.github.com/goertzenator/6237e0200a5f7bf22976).
>
>
>     I found it hard to make sense of what's in that gist due to the formatting, so I took your code and built it myself. When I ran it, it failed in your NIF load function, but it failed in a way that didn't make sense because all your function does is return 0. Then I realized none of your C functions were declared static, which means they are global, and I suspected your load() function was clashing with some other function of the same name. I made all your C functions static, rebuilt, and then ran everything and it seems like it worked:
>
>     1> c(dlibusb).
>     Reading symbols for shared libraries . done
>     {ok,dlibusb}
>     2> dlibusb:mytest_cpu().
>     [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
>     3> dlibusb:mytest_io().
>     [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
>     4> dlibusb:mytest_none().
>     [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
>
>     --steve
>
>
>
>
