[erlang-questions] +swt very_low doesn't seem to avoid schedulers getting
Rickard Green
rickard@REDACTED
Wed Oct 10 21:12:23 CEST 2012
> Hi, all. According to my private mailing list archive, there hasn't
> been much mention of erl's "+swt" flag since about April 2012.
>
> I just witnessed a case where a node using "+swt very_low" with Riak
> 1.2.1rc2, on Erlang/OTP R15B01, got "stuck": CPU consumption was
> only 200% on an 8 core AWS instance. The other nodes in that Riak
> cluster were running at over 600% CPU utilization (on average).
>
> When I ran this:
>
> {io:format("before..."), erlang:system_flag(schedulers_online, 1),
> timer:sleep(1000), erlang:system_flag(schedulers_online, 8),
> io:format("after\n")}.
>
> ... then average CPU utilization on that node immediately shot up from
> 200% to about 760%.
This is very much expected. Since you have work that can load 2
schedulers full time and you shut down all but one, the run-queue will
grow. When you later release all schedulers, there will be lots of work
to pick from.
>
> I'd heard a rumor that "+swt very_low" was supposed to avoid whatever
> weird scheduler problem/bug that caused some schedulers to appear as if
> they weren't active. But I was on site today and witnessed this
> first-hand and verified that the "+swt very_low" flag was indeed being
> used.
>
The runtime system tries to compact the load onto as few schedulers as
possible without letting run-queues build up. The runtime system
won't wake up new schedulers unless some overload has accumulated. This
overload either shows up as a quickly growing run-queue or as a small
run-queue over a longer time. The +swt flag sets the threshold that is
used for determining when enough overload has accumulated to wake up
another scheduler.
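
To make the threshold concrete, the following is a minimal sketch of how the flag is passed when starting the emulator. The list of accepted levels in the comment comes from the erl documentation rather than from this thread (medium being the documented default), so treat it as an assumption and verify it against the documentation for your release.

```shell
# Start the emulator with a lower scheduler wakeup threshold than the default.
# Levels as listed in the erl documentation (verify for your release):
#   very_low | low | medium | high | very_high
erl +swt very_low
```

A lower level means less accumulated overload is needed before another scheduler is woken.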

This compaction of load onto fewer schedulers is there in order to
reduce communication overhead when there isn't enough work to fully
utilize all schedulers. The performance gain of this compaction depends
on the hardware.

We have gotten reports about problems with this functionality, but we
have not found any bugs in it. We have only found that it behaves as
expected. That is, if more schedulers aren't woken, this is due to not
enough accumulated overload. The +swt switch was introduced in order to
give the user the possibility to define what is enough overload for his
or her taste.

The currently used wakeup strategy is very quick to forget about
previously accumulated overload that has disappeared. Maybe even too
quick for my taste when "+swt very_low" is used. I've therefore
implemented an alternative strategy that most likely will be the default
in R16. As of R15B02 you can try this strategy out by passing "+sws
proposal" as a command line argument. In combination with "+swt
very_low", the runtime system should be even more eager to wake up
schedulers than when only using "+swt very_low".
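
A minimal sketch of the combination described above, assuming R15B02 or later. The vm.args lines in the comments assume a release that reads emulator flags from such a file, as Riak packages typically do; the file's location is not given in this thread and will depend on the installation.

```shell
# Command line: lower wakeup threshold plus the alternative wakeup strategy.
erl +swt very_low +sws proposal

# Equivalent entries in a vm.args-style file (one flag per line):
#   +swt very_low
#   +sws proposal
```
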
> I'm not certain what exact Linux distribution and kernel was used. I'll
> ask the customer to send me that info so I can forward it to the list.
>
> Has anyone else seen this behavior? Unlike Knut Nesheim's report on
> this list back in February 2012, Riak does not use the halfword
> emulator. We are using some NIFs, but this customer isn't using the
> most evil one, the Riak eleveldb NIF library. Instead, they're using
> the Bitcask backend (which has a NIF component but isn't as evil as
> eleveldb's NIF) and the merge_index backend (which is pure Erlang).
>
> -Scott
Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB.