[erlang-bugs] Schedulers getting "stuck", part II

Tue Apr 30 22:34:09 CEST 2013

Patrik Nyblom <pan@REDACTED> wrote:

pn> Hmmm, dropping schedulers...? There seems to be a perfectly new and
pn> fresh bug in R16B when dropping schedulers. One that we've fixed in
pn> the maint branch. Could you please please please try the tip of the
pn> maint branch (i.e. what's to be R16B01)?

Today's maint branch works well: 20 out of 20 runs show all 5
schedulers in use when I run the loop below (it uses the "nifwait"
source repo; see earlier in this thread for where to find it).

    foreach i (`seq 1 20`)
      ./bin/erl -noshell -noinput +scl false -pz ~/b/src/nifwait/ebin -sname foo -eval 'N = 5, io:format("OS pid ~s\n\n", [os:getpid()]), timer:sleep(8*1000), io:format("go\n"),  erlang:system_flag(schedulers_online, N), timer:sleep(12*1000), timer:tc(erlang, apply, [fun () -> XX = lists:sort(element(1,wait:run(4*100, 1024*1024, 1100, 5))), {hd(XX), lists:last(XX)} end, []]).' & sleep 45 ; kill %.
    end

... and then look at the %user CPU time with "iostat" or "vmstat".
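
A rough way to put a number on "look at the %user CPU time" is to
average the "us" column of vmstat output. The sketch below is an
illustration only, not part of the original test: the helper name
avg_user_cpu is hypothetical, it is written for POSIX sh rather than
csh, and it assumes a vmstat whose header labels the user-time column
"us" and whose first sample is the average since boot (so that sample
is skipped).

```shell
# Hypothetical helper: average the "us" (%user) column of vmstat
# output read from stdin. The column is located by header name,
# because vmstat layouts differ between platforms.
avg_user_cpu() {
  awk '
    # Until a header containing "us" is seen, keep looking for it.
    col == 0 { for (i = 1; i <= NF; i++) if ($i == "us") col = i; next }
    # Numeric sample lines: skip the first one (assumed to be the
    # since-boot average), then accumulate the rest.
    $col ~ /^[0-9]+$/ { if (seen++) { sum += $col; n++ } }
    END { if (n) printf "%.1f\n", sum / n; else print "no samples" }
  '
}
```

Something like "vmstat 1 40 | avg_user_cpu", started alongside each
iteration, would then give one figure per run. As a rule of thumb, on
a machine with C cores, 5 fully busy schedulers should show up as
roughly 5/C of the total as %user, though other load on the box will
shift that.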

That 20-out-of-20 result at 5 cores never happens with R16B.  In
fact, running the same loop with R16B, I saw 0 out of 6 iterations
use all 5 cores before I gave up waiting for a good 5-core balance.

-Scott