<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Aug 28, 2014 at 11:19 AM, Max Lapshin <span dir="ltr"><<a href="mailto:max.lapshin@gmail.com" target="_blank">max.lapshin@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">I'm running a CPU-bound task (making thumbnails from video) in a NIF under Erlang 17 (erts 6.0).<div>
<br></div><div>I'm using dirty NIF scheduling:</div><div><br></div><div><div>static ERL_NIF_TERM</div>
<div>yuv2jpeg0(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {</div></div><div>....</div><div>}</div><div><br></div><div><br></div><div><div>static ERL_NIF_TERM</div><div>async_jpeg(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {</div>
<div> ERL_NIF_TERM result = yuv2jpeg0(env, argc, argv);</div><div> return enif_schedule_dirty_nif_finalizer(env, result, enif_dirty_nif_finalizer);</div><div>}</div><div><br></div><div><br></div><div>static ERL_NIF_TERM</div>
<div>yuv2jpeg(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {</div><div> return enif_schedule_dirty_nif(env, ERL_NIF_DIRTY_JOB_CPU_BOUND, async_jpeg, argc, argv);</div><div>}</div></div><div><br></div><div>I see a strange situation: none of the CPU cores is 100% loaded, but the processes calling the yuv2jpeg function hang in it:</div>
<div><br></div><div><br></div><div>
<p>(<a href="mailto:flussonic@127.0.0.1" target="_blank">flussonic@127.0.0.1</a>)3> process_info(pid(0,961,0)). </p>
<p>[{current_function,{avcodec,yuv2jpeg0,4}},</p></div></div></blockquote><div><br></div><div>There's always the "printf debugger" :) -- have your entry NIF and your dirty NIF print their respective thread IDs to make sure they're on different threads, and have each print something just before returning. You might also consider writing your own finalizer so you can print from there too. With these in place, you'll at least know where something might be getting hung up.</div>
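<div><br></div><div>As a rough sketch of that kind of tracing (not taken from your code): the helper below tags each message with the calling thread's ID and writes it to stderr. The names trace and traced_work are made up for illustration, plain pthread_self() stands in for whatever thread API you prefer (erl_nif also offers enif_thread_self()), and the cast of pthread_t to unsigned long is an assumption that holds on Linux but isn't portable.</div>

```c
#include <pthread.h>
#include <stdio.h>

/* Write "[tid N] where: what" to stderr and return the number of
 * characters printed (negative on error). stderr is typically
 * unbuffered, so the line should appear even if the thread hangs
 * right afterwards.
 * ASSUMPTION: pthread_t can be cast to unsigned long (true on
 * Linux/glibc, not guaranteed elsewhere). */
static int trace(const char *where, const char *what) {
    return fprintf(stderr, "[tid %lu] %s: %s\n",
                   (unsigned long) pthread_self(), where, what);
}

/* Hypothetical stand-in for a NIF body: trace on entry and again
 * just before returning, so a hang can be narrowed down. */
static int traced_work(const char *name) {
    trace(name, "enter");
    int result = 42;              /* placeholder for the real work */
    trace(name, "about to return");
    return result;
}
```

<div>Calling something like trace("yuv2jpeg", "enter") in the entry NIF and trace("async_jpeg", "enter") in the dirty NIF should show two different thread IDs if the work really moves over to a dirty scheduler thread.</div>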
<div><br></div><div>A more involved alternative, though still not all that difficult, would be to build a debuggable runtime and run everything under gdb.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr"><div>
<p>I suppose I may be hitting the old problem of wrong scheduler behaviour when a NIF occupies a thread for too long.<br></p></div></div></blockquote><div><br></div><div>Scheduler collapse is possible with regular schedulers but not with dirty schedulers, as the whole point of having the latter is that they're not bound to the regular scheduler constraints. But if your regular schedulers are already collapsed, they'll never run and so will never switch your processes over to a dirty scheduler.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div><p></p></div>
<div>yuv2jpeg usually takes about 2-4 milliseconds to run.<br></div><div><br></div><div>Is it possible to debug this situation? Can I somehow ask Erlang whether it considers some scheduler idle?</div></div></blockquote>
<div><br></div><div><br></div><div>You might get some useful info from erlang:system_info using arguments such as schedulers_state, scheduling_statistics, and thread_progress, and maybe also from erlang:statistics(run_queues). See <a href="http://www.erlang.org/doc/man/erlang.html">http://www.erlang.org/doc/man/erlang.html</a> for details on these.</div>
<div><br></div><div>If there's a way for you to easily package something that I could run for myself to try to duplicate the problem, just let me know and I'll take a look.</div><div><br></div><div>--steve</div></div>
</div></div>