[erlang-questions] +swt very_low doesn't seem to avoid schedulers getting

Tue Oct 16 06:20:51 CEST 2012

Rickard Green <rickard@REDACTED> wrote:

rg> This is very much expected. Since you have work that can load 2
rg> schedulers full time and shuts down all but one, the run-queue will
rg> grow. When you later release all schedulers, there will be lots of
rg> work to pick from.

Hi, Rickard.  Sorry I didn't reply earlier ... Basho kept me busy with
an all-hands meeting and conference out in San Francisco.

Perhaps I wasn't all that clear about the problem that I saw and that
several other customers have witnessed.

1. One node in a Riak cluster is consuming significantly less CPU than
   the other nodes in the cluster.  The imbalance is not due to
   application layer workload imbalance, as far as we can tell.  (In the
   case that I personally witnessed, it was a lab environment with an
   artificial & deterministic load generator talking to all Riak nodes
   equally (or trying very hard to)).

2. As soon as we do one of two things, the CPU imbalance disappears:
    a. Restart the Riak app on the slow node.
    b. Use the erlang:system_flag(schedulers_online, 1) hack and then
       set it back to 8 using the same BIF.
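
For reference, the toggle in 2b can be sketched roughly as below.  This
is only an illustration: the module name sched_kick and the pause
length are made up for this example, and it restores whatever value was
in effect before (8 on our nodes) rather than hard-coding it.

```erlang
%% Hypothetical helper module, not part of Riak or OTP.
-module(sched_kick).
-export([kick/0, kick/1]).

%% Default pause of one second between the two calls (an arbitrary choice).
kick() ->
    kick(1000).

%% Drop to a single online scheduler, wait PauseMs milliseconds, then
%% restore the previous setting.  Returns the previous setting.
kick(PauseMs) ->
    %% system_flag/2 returns the old value of the flag.
    Old = erlang:system_flag(schedulers_online, 1),
    timer:sleep(PauseMs),
    _ = erlang:system_flag(schedulers_online, Old),
    Old.
```

Calling sched_kick:kick() from a shell attached to the slow node is the
equivalent of what we have been doing by hand.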

In situations described by customers, this seems to happen after a day
or more of load, where the peak workload is substantially higher than
off-peak workload.  In the lab environment that I witnessed, the load
generators were cycling through 100%-off and 100%-on states.

rg> This compaction of load onto fewer schedulers is there in order to
rg> reduce communication overhead when there isn't enough work to fully
rg> utilize all schedulers. The performance gain of this compaction
rg> depends on the hardware.

What you describe seems to be exactly what's happening ... except that
when input workload rises again, the idled schedulers never wake up
unless we force them to with the system_flag() BIF.

rg> We have gotten reports about problems with this functionality, but
rg> we have not found any bugs in this functionality. We have only found
rg> that it behaves as expected. That is, if more schedulers aren't
rg> woken this is due to not enough accumulated overload. The +swt
rg> switch was introduced in order to give the user the possibility to
rg> define what is enough overload for his or her taste.

Hrm, well, we've seen it both with "+swt very_low" and without any +swt
flag at all.  And it's extremely irritating.  :-)

What info would you need gathered from the field when this bugaboo
strikes next time?
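
As a starting point, and only as a guess at what might be useful, this
is the kind of snapshot we could grab from the slow node.  The module
and function names are hypothetical; the BIFs are standard, though
scheduler_wall_time needs a reasonably recent emulator.

```erlang
%% Hypothetical helper module, not part of Riak or OTP.
-module(sched_snapshot).
-export([snapshot/0, utilization/1]).

%% Scheduler counts plus the current total run-queue length.
snapshot() ->
    [{schedulers, erlang:system_info(schedulers)},
     {schedulers_online, erlang:system_info(schedulers_online)},
     {run_queue, erlang:statistics(run_queue)}].

%% Per-scheduler utilization (0.0 to 1.0) measured over IntervalMs
%% milliseconds, computed from two scheduler_wall_time samples.
utilization(IntervalMs) ->
    erlang:system_flag(scheduler_wall_time, true),
    T0 = lists:sort(erlang:statistics(scheduler_wall_time)),
    timer:sleep(IntervalMs),
    T1 = lists:sort(erlang:statistics(scheduler_wall_time)),
    erlang:system_flag(scheduler_wall_time, false),
    [{Id, ratio(A1 - A0, W1 - W0)}
     || {{Id, A0, W0}, {Id, A1, W1}} <- lists:zip(T0, T1)].

%% Guard against a zero-length sampling window.
ratio(_Active, 0) -> 0.0;
ratio(Active, Wall) -> Active / Wall.
```

Running sched_snapshot:utilization(5000) before and after the
schedulers_online toggle should at least show which schedulers are
sitting idle while the run queue is non-empty.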

-Scott