[erlang-questions] dirty scheduler segfault

Daniel Goertzen daniel.goertzen@REDACTED
Mon Nov 3 18:56:29 CET 2014


This is how my erlang was built:

git clone -b OTP-17.3.3 https://github.com/erlang/otp.git otp
cd otp && ./otp_build autoconf && ./configure --prefix=/usr/local/ \
    --without-javac --disable-hipe --enable-dynamic-ssl-lib \
    --enable-dirty-schedulers && make && make install

The OS is a Gentoo chroot and gcc is 4.7.3.  The NIF build commands (via
erlang.mk) are:


cc -O3 -std=c99 -finline-functions -Wall -Wmissing-prototypes \
    -I/usr/local/lib/erlang/erts-6.2/include/ -fPIC -c -o \
    /dlibusb/c_src/dlibusb.o /dlibusb/c_src/dlibusb.c
cc -shared -o /dlibusb/priv/dlibusb.so /dlibusb/c_src/dlibusb.o


My minimal test app is here: https://github.com/goertzenator/dlibusb


I use a Gentoo Linux chroot to build an OS for a network appliance.  The OS
is built for an i486, although the variant that will have Erlang has a
beefier i586 CPU.  I tried changing things external to Erlang but could not
get the segfault to go away.  This morning I built a minimal vanilla Gentoo
chroot and was able to reproduce the segfault on my Kubuntu laptop and my
Kubuntu desktop VM.  The instructions to build the chroot with Erlang and to
run the segfault test are here:

https://gist.github.com/goertzenator/c0b19ee84d16f0e82681

I'm also providing my Gentoo chroot with prebuilt git and Erlang to save a
bit of time and hassle, and to provide binaries of my failing system. (791
MB)

https://drive.google.com/open?id=0B6luM5L22h1oT2QzMWR0Zk9yOWc&authuser=0




On Fri, Oct 31, 2014 at 10:38 PM, Steve Vinoski <vinoski@REDACTED> wrote:

> Given the new calls you're using in your NIF, it looks like you're running
> at least 17.3. I've tried this on Mavericks and Ubuntu 14.04, with and
> without debug, with and without valgrind, and it always works fine for me.
> Valgrind reports no problems. I've tried it with 17.3, and I've tried it
> with a brand new build of maint, and they both work fine.
>
> More comments below.
>
> On Fri, Oct 31, 2014 at 9:57 PM, Daniel Goertzen <
> daniel.goertzen@REDACTED> wrote:
>
>> Thanks for trying it out.  That gist was a bit of a hash; apologies.
>>
>> I made all the functions static and also put load and unload as NULL in
>> ERL_NIF_INIT, but I get the same results.
>>
>> I ran it under valgrind and got...
>>
>>
>> # ERL_LIBS=.. valgrind --trace-children=yes erl
>> ...
>> Eshell V6.2  (abort with ^G)
>> 1>
>> 1> dlibusb:mytest_io().
>> ==9029== Thread 18:
>> ==9029== Invalid read of size 4
>> ==9029==    at 0x8190B56: process_main (beam_hot.h:935)
>> ==9029==    by 0x80E565E: sched_thread_func (erl_process.c:7719)
>> ==9029==    by 0x820982B: thr_wrapper (ethread.c:106)
>> ==9029==    by 0x40FFF46: start_thread (in /lib/libpthread-2.20.so)
>> ==9029==    by 0x41FE97D: clone (in /lib/libc-2.20.so)
>> ==9029==  Address 0xfffffffe is not stack'd, malloc'd or (recently) free'd
>>
>
> There's no dirty scheduler involved in this traceback. The fact that
> you're hitting an invalid read like this so early, and that it looks like
> something is dereferencing something close to a null pointer, smells to me
> like some sort of build problem somewhere. But it's hard to say where
> exactly.
>
>> I know little about beam internals. I don't know if this is useful.
>>
>
> Unfortunately not. Perhaps you can provide some more details regarding
> exact Erlang version, OS type and version, how your Erlang was built, and
> what compiler & version and also the command line you're using to build
> your nif?
>
> --steve
>
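
A side note on the address in the valgrind report quoted above.  On a 32-bit
build, 0xfffffffe is what you get by stepping two bytes below address 0 and
wrapping around, which is presumably why it reads as "close to a null
pointer".  A minimal sketch of that arithmetic follows; the helper name
offset_from_null is made up for illustration and is not part of ERTS or
valgrind.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a 32-bit address as a signed distance
 * from address 0 (NULL).  Addresses in the top half of the 32-bit
 * address space come out negative. */
static int64_t offset_from_null(uint32_t addr)
{
    if (addr >= UINT32_C(0x80000000)) {
        /* Same as addr - 2^32, computed in 64 bits so nothing overflows. */
        return (int64_t)addr - (INT64_C(1) << 32);
    }
    return (int64_t)addr;
}
```

Calling offset_from_null(0xfffffffe) gives -2, which would fit a small
negative offset applied to a zero base.  That is only one reading of the
address and says nothing about where the bad value comes from.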