[erlang-questions] Schedulers getting "stuck", part II

Rick Reed
Mon Apr 29 17:42:49 CEST 2013


We are still seeing stuck schedulers on R16B. Now that I've looked at it, though,
we've been running with +swt medium (vs. +swt low on R15B). I had forgotten that I'd
made that change when we first started deploying R16B, because I wanted to gauge
whether there had been an improvement.

This is what a sample %cpu distribution looks like in this condition:

97.0
96.3
95.6
95.3
93.6
93.2
92.6
92.1
91.5
90.5
90.3
87.9
87.6
87.5
84.2
83.9
83.6
83.0
82.7
82.2
80.8
79.4
77.9
  2.7
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0
  0.0

When I dug into this a while back on R15B, I think I concluded that, at least with
our workload, the run queue on a scheduler was being drained to fewer than 2 entries
often enough (even under very significant overall load) that wakeup_other_check kept
wakeup_other below the limit needed to trigger a wakeup.  Even with overall scheduler
utilization over 75%, our run queues are quite short.
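
To make that reasoning concrete, here is a toy model in Python. It is only a
sketch and not the ERTS code: the function name simulate_wakeup_other, the
constants, and the exact increment and decay rules are all assumptions. The only
behavior taken from the description above is that the counter grows while the run
queue holds 2 or more entries and shrinks whenever it is drained below 2.

```python
def simulate_wakeup_other(queue_lengths, reds=2000, limit=100_000, decay_shift=2):
    """Count how often a toy wakeup_other counter crosses its limit.

    queue_lengths: run queue length observed at each scheduling step.
    reds, limit, decay_shift: made-up constants, not the real ERTS values.
    """
    wakeup_other = 0
    wakeups = 0
    for qlen in queue_lengths:
        left = qlen - 1  # entries left besides the one being run
        if left < 1:
            # Queue drained below 2: decay the counter (assumed rule).
            wakeup_other = max(0, wakeup_other - (reds << decay_shift))
        else:
            # Work is waiting: grow the counter in proportion to the backlog.
            wakeup_other += left * reds
            if wakeup_other > limit:
                wakeups += 1  # this is where another scheduler would be woken
                wakeup_other = 0
    return wakeups


if __name__ == "__main__":
    # Busy scheduler whose queue keeps dipping below 2 entries.
    print(simulate_wakeup_other([3, 2, 1, 1] * 1000))
    # Scheduler with a steady backlog of 5 entries.
    print(simulate_wakeup_other([5] * 1000))
```

In this model the first pattern never triggers a wakeup, because each dip below 2
erases what the counter accumulated, while the steady backlog crosses the limit
repeatedly. That is the shape of the problem: short queues can keep sleeping
schedulers asleep even when the awake ones are nearly saturated.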

Rr

On 4/26/13 7:20 PM, Scott Lystig Fritchie wrote:
> Howdy.  This is a followup to the discussion that took place on this
> list in October 2012, see:
>
>      http://erlang.org/pipermail/erlang-questions/2012-October/069503.html
>          (first message only, I dunno why)
>      http://erlang.org/pipermail/erlang-questions/2012-October/069585.html
>          (the rest of the thread)
>
> I've been trying to figure out how to introduce the stuff that I've
> written at:
>
>      https://github.com/slfritchie/nifwait/tree/md5#readme
>
> ... but I still can't decide.  So I'll try for something short and
> un-Scott-like.  For the long story, please read the README in the URL
> above.
>
> As for the short story, I believe a couple of things:
>
> * R15B0x's schedulers are broken: Basho has seen "stuck" schedulers in one of
>    our apps with no custom NIF code.  And it's possible to get them stuck
>    using only the 'crypto' module's MD5 functions.
>
> * R16B's schedulers appear to be even more broken: I have a
>    mostly-deterministic case that demonstrates schedulers that go to
>    sleep and do not wake for minutes (or hours) when there is plenty of
>    work to do.  This also is using only the 'crypto' module and does not
>    require custom NIF code.
>
> Discuss.  :-)
>
> -Scott

