[erlang-questions] dirty scheduler segfault
Tue Nov 4 17:55:35 CET 2014
On Tue, Nov 4, 2014 at 11:01 AM, Steve Vinoski wrote:
> On Tue, Nov 4, 2014 at 9:46 AM, Sverker Eriksson wrote:
>> On 10/31/2014 10:05 PM, Steve Vinoski wrote:
>>> On Fri, Oct 31, 2014 at 4:33 PM, Daniel Goertzen wrote:
>>>> I am seeing a segfault that seems to be related to dirty schedulers. I've
>>>> reduced the fault to the Erlang and C NIF module below, which executes the
>>>> same NIF with either the io dirty scheduler, the cpu dirty scheduler, or
>>>> the normal Erlang scheduler.
>>>> When I start the emulator and run either dirty NIF, I get a segfault (see
>>>> https://gist.github.com/goertzenator/6237e0200a5f7bf22976).
>>> I found it hard to make sense of what's in that gist due to the
>>> so I took your code and built it myself. When I ran it, it failed in your
>>> NIF load function, but it failed in a way that didn't make sense because
>>> all your function does is return 0. Then I realized none of your C
>>> functions were declared static, which means they are global, and I
>>> suspected your load() function was clashing with some other function of
>>> the same name. I made all your C functions static, rebuilt, and then ran
>>> everything and it seems like it worked:
>>> Reading symbols for shared libraries . done
>>> 2> dlibusb:mytest_cpu().
>>> 3> dlibusb:mytest_io().
>>> 4> dlibusb:mytest_none().
>> Run on debug VM and increase 'cnt' in the NIF mytest to something bigger
>> (like 1000) and this will segfault every time.
>> The problem arises when a 0-arity dirty NIF like mytest triggers a GC. The
>> return value from the NIF is not included in the rootset of the GC (as it
>> should be), and the calling Erlang code crashes when it later tries to
>> read deallocated garbage.
>> I did the following fix in init_nif_sched_data() which seems to work.
>>       ep->fp = indirect_fp;
>>       proc->freason = TRAP;
>> +     proc->arity = argc;
>>       return THE_NON_VALUE;
>> Not sure if that is always the right thing to do.
>> What do you think, Steve?
> Thanks Sverker, glad you were able to reproduce the problem -- I've tried
> and tried but have never gotten it to fail. Increasing the array size also
> makes it reliably crash for me. I'll investigate your proposed fix and will
> probably add a new test for this.
Thanks again Sverker, this is definitely the right fix. I've submitted a PR
And Daniel, thanks for finding and reporting this. Sorry I couldn't
reproduce it sooner.
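
For anyone following along, the failure mode Sverker describes (a live value that is missing from the collector's rootset) can be pictured with a toy model in plain C. This is only a sketch under loud assumptions: ToyProc, toy_gc, the `live` counter and everything else here are made up for illustration, none of it is ERTS code, and it does not claim to show how proc->arity is actually used inside the emulator. It only shows why a value the collector does not know about turns into garbage.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: none of these names exist in ERTS. */
enum { HEAP_WORDS = 8, MAX_REGS = 4, POISON = -1 };

typedef struct {
    int heap[HEAP_WORDS]; /* heap cells; POISON marks reclaimed memory */
    int heap_top;         /* index of the next free cell */
    int reg[MAX_REGS];    /* registers, each holding a heap index */
    int live;             /* hypothetical: how many registers the GC treats as roots */
} ToyProc;

static void toy_init(ToyProc *p)
{
    for (int i = 0; i < HEAP_WORDS; i++) p->heap[i] = POISON;
    for (int i = 0; i < MAX_REGS; i++) p->reg[i] = 0;
    p->heap_top = 0;
    p->live = 0;
}

/* Store a value in the next free cell and return its index. */
static int toy_alloc(ToyProc *p, int value)
{
    assert(p->heap_top < HEAP_WORDS);
    p->heap[p->heap_top] = value;
    return p->heap_top++;
}

/* Copying collector: only cells reachable from the first `live`
 * registers survive; every other cell ends up as POISON. */
static void toy_gc(ToyProc *p)
{
    int fresh[HEAP_WORDS];
    int top = 0;
    for (int i = 0; i < HEAP_WORDS; i++) fresh[i] = POISON;
    for (int i = 0; i < p->live; i++) {
        fresh[top] = p->heap[p->reg[i]];
        p->reg[i] = top++; /* root now points at the moved cell */
    }
    memcpy(p->heap, fresh, sizeof fresh);
    p->heap_top = top;
}

/* Park a "return value" in reg[0], run a GC with the given root
 * count, then read the value back the way a caller would. */
static int result_after_gc(int live)
{
    ToyProc p;
    toy_init(&p);
    toy_alloc(&p, 7);             /* unrelated data, unreachable */
    p.reg[0] = toy_alloc(&p, 42); /* the value we want to keep */
    p.live = live;
    toy_gc(&p);
    return p.heap[p.reg[0]];
}
```

In this model, result_after_gc(0) reads back POISON because the collector was never told that reg[0] is live, while result_after_gc(1) reads back 42.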