[erlang-questions] How to debug dirty NIF?

Steve Vinoski vinoski@REDACTED
Fri Aug 29 17:55:01 CEST 2014

On Thu, Aug 28, 2014 at 11:19 AM, Max Lapshin <max.lapshin@REDACTED> wrote:

> I'm running CPU bound task (make thumbnails from video) in a NIF under
>  erlang 17 (erts 6.0)
> I'm using dirty nif scheduling:
> static ERL_NIF_TERM
> yuv2jpeg0(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
> ....
> }
> static ERL_NIF_TERM
> async_jpeg(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
>   ERL_NIF_TERM result = yuv2jpeg0(env, argc, argv);
>   return enif_schedule_dirty_nif_finalizer(env, result,
> enif_dirty_nif_finalizer);
> }
> static ERL_NIF_TERM
> yuv2jpeg(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
>   return enif_schedule_dirty_nif(env, ERL_NIF_DIRTY_JOB_CPU_BOUND,
> async_jpeg, argc, argv);
> }
> I see strange situation: none of CPU core is not 100% loaded, but
> processes that are calling yuv2jpeg function are hanging in this function:
> (flussonic@REDACTED)3> process_info(pid(0,961,0)).
> [{current_function,{avcodec,yuv2jpeg0,4}},

There's always the "printf debugger" :) -- have your entry NIF and your
dirty NIF print their respective thread IDs to make sure they're on
different threads, and have each print something just before returning. You
might also consider writing your own finalizer so you can print from there
too. With these in place, you'll at least know where something might be
getting hung up.
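
As a rough sketch of that kind of tracing, something like the helper below could be dropped into the NIF source. The name trace_point is hypothetical and not part of erl_nif.h. It also assumes a pthreads platform where pthread_t converts to an integer, which holds on Linux and OS X but is not guaranteed by POSIX.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/*
 * Hypothetical tracing helper (not part of erl_nif.h): print a label and
 * the calling thread's ID to stderr.
 * Returns the number of characters written, or a negative value on error.
 */
static int trace_point(const char *where)
{
    assert(where != NULL);

    /* Assumption: pthread_t can be cast to an integer type. True on
     * Linux and OS X, but POSIX treats pthread_t as opaque. */
    unsigned long tid = (unsigned long)pthread_self();

    int n = fprintf(stderr, "[tid %lu] %s\n", tid, where);
    fflush(stderr); /* make sure the line is out even if we hang later */
    return n;
}
```

Calling trace_point("yuv2jpeg: enter") at the top of the entry NIF, and similar calls at the start and just before the return of async_jpeg and of your own finalizer, should show whether the thread IDs differ and which call never reaches its "return" line.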

A more involved alternative, though still not all that difficult, would be
to build a debuggable runtime and run everything under gdb.

> I suppose that I could meet the old problem with wrong scheduler behaviour
> when NIF is using thread for too long.

Scheduler collapse is possible with regular schedulers but not with dirty
schedulers, as the whole point of having the latter is that they're not
bound to the regular scheduler constraints. But if your regular schedulers
are already collapsed, they'll never run and so will never switch your
processes over to a dirty scheduler.

> yuv2jpeg takes usually about 2-4 milliseconds to run.
> Is it possible to debug this situation? Can I somehow ask erlang if it
> decided that some scheduler is considered idle?

You might get some useful info from erlang:system_info using arguments such
as schedulers_state, scheduling_statistics, and thread_progress, and maybe
also from erlang:statistics(run_queues). See
http://www.erlang.org/doc/man/erlang.html for details on these.

If there's a way for you to easily package something that I could run for
myself to try to duplicate the problem, just let me know and I'll take a
look.
