[erlang-questions] NIF appropriateness, was: Re: Messing with heart. Port and NIF, which one is better?
Thu Feb 14 20:39:02 CET 2013
On 02/14/2013 03:52 AM, Rickard Green wrote:
> On Feb 14, 2013, at 11:52 AM, Michael Truog <> wrote:
>> On 02/14/2013 02:30 AM, Scott Lystig Fritchie wrote:
>>> I'm starting a new'ish thread to mention a bit of experience that Basho
>>> has had with NIFs in Riak.
>>> Garrett Smith <> wrote:
>>>>> And the second question, Is there any good argument to use NIF
>>>>> instead of creating a connected process for a port.
>>> gs> The NIF interface is appropriate for defining simple functions in C.
>>> gs> There are lots of 3rd party libraries where NIFs are used to plug
>>> gs> in long running, multi-threaded facilities, but this seems misguided
>>> gs> to me.
>>> "Simple functions in C" is a tricky matter ... and it has gotten trickier
>>> with the Erlang/OTP releases R15 and R16.
>>> In R14 and earlier, it wasn't necessarily a horrible thing if you had C
>>> code (or C++ or Fortran or ...) that executed in NIF context for half a
>>> second or more. If your NIF was executing for that long, you knew that
>>> you were interfering with the Erlang scheduler Pthread that was
>>> executing your NIF's C/C++/Fortran/whatever code. That can cause some
>>> weird delays in executing other Erlang processes, but for some apps,
>>> that's OK.
>>> However, with R15, the internal guts of the Erlang process scheduler
>>> Pthreads have changed. Now, if you have a NIF that executes for even a
>>> few milliseconds, the scheduler algorithm can get confused. Instead of
>>> blocking an Erlang scheduler Pthread, you both block that Pthread *and*
>>> you might cause some other scheduler Pthreads to decide incorrectly to
>>> go to sleep (because there aren't enough runnable Erlang processes to
>>> bother trying to schedule). Your 8/16/24 CPU core box can find itself
>>> down to only 3 or 2 active Erlang scheduler Pthreads when there really
>>> is more than 2-3 cores of work waiting.
>>> So, suddenly your "simple functions in C" are now "simple functions in C
>>> that must finish execution in about 1 millisecond or less". If your C
>>> code might take longer than that, then you must use some kind of thread
>>> pool to transfer the long-running work away from the Erlang scheduler
>>> Pthread. Not simple at all, alas.
> Native code (drivers and NIFs) has always been expected to execute for very short periods of time. The major difference is that it is more clearly documented today.
> The number of problems you might run into if you run native code that does not behave well has increased, though. This is, however, not new to R15. This has been the case since R11, due to optimizations of the SMP runtime system. One such optimization was multiple run-queues, which were introduced in R13. Regarding scheduling, R12 to R13 is where the major difference is.
> In some cases we could try to fix problems caused by native code that does not behave well. This would, however, very often cause a performance penalty that always has to be paid, and it would also prevent us from implementing a lot of optimizations. In my opinion this would just be plain wrong. The VM has never been intended for scheduling of arbitrary native code. A NIF or a driver is supposed to be aware of the VM and help it, not break it.
>> These problems are what NIF native processes will solve, right?
> No, but "dirty schedulers" were supposed to ease implementation of things like this. Note that already today you got all the primitives you need, as for example threads for NIFs and drivers.
>> The only other alternative would be to use the async thread pool within a port driver, which may not help the schedulers and is obsoleted by native processes (not to mention the job queue per thread situation which can block on long jobs).
> Rickard Green, Erlang/OTP, Ericsson AB
I understand we already have the thread primitives for NIFs and drivers. It just bothers me that when you create your own thread pool, you put the burden on the operating system kernel scheduler, which causes CPU contention. It seems like the Erlang VM would have more insight into how to schedule the NIFs, even if they are misbehaving, as long as reductions are bumped properly or execution time is used to extrapolate a reduction count that makes sense to the VM. One way of simplifying this might be to have a yield function that blocking NIF functions call frequently, so that the yield function handles reporting a reduction count that impacts scheduling. You know more about the problems than I do; I am just voicing concern about having to depend on the kernel scheduler.
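
To make the yield idea concrete, here is a minimal sketch in plain C. It does not use the erl_nif API. The names job_t, yield_check, job_init, job_run_slice, SLICE_BUDGET and CHUNK are hypothetical, and the cost of one reduction per byte is an assumption picked only for illustration. The point is that the long job keeps its own state, charges reductions after each small chunk, and returns to the caller once its slice is used up, so that a scheduler could run something else and resume the job later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical budget: reductions a job may consume per time slice. */
#define SLICE_BUDGET 2000
/* Hypothetical chunk size: bytes processed between two yield checks. */
#define CHUNK 500

/* Hypothetical per-job state, kept between slices so the work can resume. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;          /* next byte to process */
    unsigned long sum;   /* running result */
    long budget;         /* reductions left in the current slice */
} job_t;

/* Hypothetical yield function: charges `reds` reductions to the job and
 * returns 1 when the slice is used up, 0 otherwise. */
static int yield_check(job_t *job, long reds)
{
    job->budget -= reds;
    return job->budget <= 0;
}

void job_init(job_t *job, const unsigned char *data, size_t len)
{
    assert(job != NULL);
    job->data = data;
    job->len = len;
    job->pos = 0;
    job->sum = 0;
    job->budget = 0;
}

/* Runs one slice of the job.
 * Returns 1 when the job is finished, 0 when it yielded and must be
 * called again to continue from job->pos. */
int job_run_slice(job_t *job)
{
    job->budget = SLICE_BUDGET;
    while (job->pos < job->len) {
        size_t end = job->pos + CHUNK;
        if (end > job->len)
            end = job->len;
        size_t n = end - job->pos;
        for (; job->pos < end; job->pos++)
            job->sum += job->data[job->pos];
        /* Assumption: one reduction per byte processed. */
        if (yield_check(job, (long)n) && job->pos < job->len)
            return 0;
    }
    return 1;
}
```

A caller would invoke job_run_slice() repeatedly until it returns 1. In a real VM the "call again" step would be the scheduler rescheduling the work, which is exactly the part this sketch leaves out.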