[erlang-questions] Schedulers getting "stuck", part II
Rick Reed
rr@REDACTED
Mon Apr 29 17:42:49 CEST 2013
We are still seeing stuck schedulers on R16B, although now that I've looked at it,
we've been running with +swt medium (vs. +swt low on R15B). I had forgotten I'd
made that change when we first started deploying R16B; I wanted to gauge
whether there had been an improvement.
Here is a sample %cpu distribution in this condition, sorted from highest to lowest:
97.0  96.3  95.6  95.3  93.6  93.2  92.6  92.1
91.5  90.5  90.3  87.9  87.6  87.5  84.2  83.9
83.6  83.0  82.7  82.2  80.8  79.4  77.9   2.7
 0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
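To make the shape of that sample explicit, a small Python sketch can bucket the readings. The values are copied from the sample; the 75% and 5% cut-offs are arbitrary thresholds chosen for illustration, not anything the VM itself uses.

```python
# %cpu readings copied from the sample above, highest first.
samples = [
    97.0, 96.3, 95.6, 95.3, 93.6, 93.2, 92.6, 92.1,
    91.5, 90.5, 90.3, 87.9, 87.6, 87.5, 84.2, 83.9,
    83.6, 83.0, 82.7, 82.2, 80.8, 79.4, 77.9, 2.7,
    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
]

# Illustrative cut-offs only; not values used by ERTS.
BUSY = 75.0
IDLE = 5.0


def split_by_load(values, busy=BUSY, idle=IDLE):
    """Partition %cpu readings into busy, in-between, and idle buckets."""
    busy_vals = [v for v in values if v >= busy]
    middle = [v for v in values if idle <= v < busy]
    idle_vals = [v for v in values if v < idle]
    return busy_vals, middle, idle_vals


busy_vals, middle, idle_vals = split_by_load(samples)
print(f"{len(samples)} readings: {len(busy_vals)} busy, "
      f"{len(middle)} in between, {len(idle_vals)} idle")
print(f"mean of busy readings: {sum(busy_vals) / len(busy_vals):.1f}%")
```

With these cut-offs, 23 readings land in the busy bucket, none in between, and 9 (the eight 0.0 values plus the 2.7) in the idle bucket: a bimodal split rather than a spread of partially loaded threads.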
When I dug into this a while back on R15B, I think I concluded that, at least with
our workload, a scheduler's run queue was being drained to fewer than 2 entries often
enough (even under very significant overall load) that wakeup_other_check kept
wakeup_other below the wakeup limit. Even with overall scheduler utilization above
75%, our run queues are quite short.
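The effect described above can be illustrated with a toy model. This is a simplified Python sketch loosely based on the description in this message, not a transcription of the ERTS C code: the decay and increment rules, every constant, and the mapping of +swt low/medium onto two numeric limits are assumptions made for illustration only.

```python
from dataclasses import dataclass

# All constants are hypothetical, chosen only to make the effect
# visible; they are not the values used by ERTS.
DEC = 10         # how fast pressure decays when the run queue is short
FIXED_INC = 1    # fixed extra pressure added per long-queue check
REDS = 1         # reductions per check, held constant for simplicity
LIMITS = {"low": 16, "medium": 64}  # stand-ins for +swt low / +swt medium


@dataclass
class RunQueue:
    wakeup_other: int = 0  # accumulated "wake another scheduler" pressure
    wakeups: int = 0       # how many times another scheduler was woken


def wakeup_other_check(rq: RunQueue, length: int, limit: int) -> None:
    """One check of a simplified wakeup_other heuristic."""
    if length < 2:
        # Queue drained to 0 or 1 entries: decay pressure toward zero.
        rq.wakeup_other = max(0, rq.wakeup_other - DEC * REDS)
    elif rq.wakeup_other < limit:
        # Work is waiting: build pressure in proportion to queue length.
        rq.wakeup_other += length * REDS + FIXED_INC
    else:
        # Pressure reached the limit: wake another scheduler and reset.
        rq.wakeups += 1
        rq.wakeup_other = 0


def simulate(lengths, limit):
    """Feed a sequence of observed run-queue lengths through the check."""
    rq = RunQueue()
    for length in lengths:
        wakeup_other_check(rq, length, limit)
    return rq.wakeups


# Busy but short queues: five checks at length 3, then two at length 1.
bursty = ([3] * 5 + [1] * 2) * 100
for name, limit in LIMITS.items():
    print(name, simulate(bursty, limit))
```

Under these made-up numbers, the bursty pattern wakes another scheduler 100 times with the lower limit and never with the higher one, because each dip below 2 entries bleeds off the accumulated pressure before it can reach 64. A queue that stays at length 3 (`simulate([3] * 100, 64)`) still triggers a wakeup every 17 checks, so the higher limit only delays wakeups when the queue never drains.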
Rr
On 4/26/13 7:20 PM, Scott Lystig Fritchie wrote:
> Howdy. This is a followup to the discussion that took place on this
> list in October 2012, see:
>
> http://erlang.org/pipermail/erlang-questions/2012-October/069503.html
> (first message only, I dunno why)
> http://erlang.org/pipermail/erlang-questions/2012-October/069585.html
> (the rest of the thread)
>
> I've been trying to figure out how to introduce the stuff that I've
> written at:
>
> https://github.com/slfritchie/nifwait/tree/md5#readme
>
> ... but I still can't decide. So I'll try for something short and
> un-Scott-like. For the long story, please read the README in the URL
> above.
>
> As for the short story, I believe a couple of things:
>
> * R15B0x's schedulers are broken: Basho has seen "stuck" schedulers in one of
> our apps with no custom NIF code. And it's possible to get them stuck
> using only the 'crypto' module's MD5 functions.
>
> * R16B's schedulers appear to be even more broken: I have a
> mostly-deterministic case that demonstrates schedulers that go to
> sleep and do not wake for minutes (or hours) when there is plenty of
> work to do. This also is using only the 'crypto' module and does not
> require custom NIF code.
>
> Discuss. :-)
>
> -Scott