[erlang-questions] dirty scheduler segfault

Sverker Eriksson sverker.eriksson@REDACTED
Tue Nov 4 15:46:34 CET 2014


On 10/31/2014 10:05 PM, Steve Vinoski wrote:
> On Fri, Oct 31, 2014 at 4:33 PM, Daniel Goertzen <daniel.goertzen@REDACTED>
> wrote:
>
>> I am seeing a segfault that seems to be related to dirty schedulers.  I've
>> reduced the fault to the erlang and C nif module below which executes the
>> same nif with either the io dirty scheduler, the cpu dirty scheduler, or
>> the normal erlang scheduler.
>>
>>
>> When I start the emulator and run either dirty nif, I get a segfault
>> (see https://gist.github.com/goertzenator/6237e0200a5f7bf22976).
>>
> I found it hard to make sense of what's in that gist due to the formatting,
> so I took your code and built it myself. When I ran it, it failed in your
> NIF load function, but it failed in a way that didn't make sense because
> all your function does is return 0. Then I realized none of your C
> functions were declared static, which means they are global, and I
> suspected your load() function was clashing with some other function of the
> same name. I made all your C functions static, rebuilt, and then ran
> everything and it seems like it worked:
>
> 1> c(dlibusb).
> Reading symbols for shared libraries . done
> {ok,dlibusb}
> 2> dlibusb:mytest_cpu().
> [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
> 3> dlibusb:mytest_io().
> [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
> 4> dlibusb:mytest_none().
> [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
>
> --steve
>
>

Run this on a debug VM and increase 'cnt' in the NIF mytest to something
bigger (like 1000), and it will segfault every time.

The problem arises when a 0-arity dirty NIF like mytest triggers a GC.
The return value from the NIF should be part of the GC root set but is
not, so the calling Erlang code crashes when it later tries to read
deallocated garbage.

I made the following fix in init_nif_sched_data(), which seems to work:

     ep->fp = indirect_fp;
     proc->freason = TRAP;
+    proc->arity = argc;
     return THE_NON_VALUE;
 }


Not sure if that is always the right thing to do.
What do you think, Steve?


/Sverker, Erlang/OTP