[erlang-questions] Cost of doing +sbwt?

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Wed Sep 2 14:23:57 CEST 2015

If memory serves, R14 can't experience scheduler collapse since it doesn't
do rebalancing of work the same way as R15 and onwards. So I think this is
a red herring.

Have you established a baseline for the locking in R14? You are contending
on the runqueue lock quite a lot, which could account for all the spinning
you are seeing, but it is hard to say if this is a high or low number
without some baseline to compare against. Also, many of the futex()
calls are probably due to this contention as well. There is a chance your
scheduler utilization isn't that high, but the schedulers keep dropping
into the spin loop all the time. If utilization is fairly low, then the
50% CPU isn't of concern: just load the system more :)

Chances are you are hunting the wrong mark as well: You have 2 or more
pathologies, and they overlap in what you are seeing. Hence you get
distracted by the noise generated by the other problems. It may be you have
a CPU problem and on top of that, you have a latency/blocking problem in an
I/O layer as well. One could account for the latency spikes, whereas the
other would explain the high CPU. But if you don't know you are sitting
with two problems in the first place, then their cooperation in the system
confuses you.

If you have elevated CPU, then snapshotting the current thread stacks at
97 Hz[1] should tell you where things are taking time. This in
itself could give you hints as to where you are spending all of your time
in the system, and also what you are spinning on, if anything.

[1] Old trick: Never snapshot at 100 Hz or any other round rate that can
get into phase with other periodic jobs. Pick some prime near your target.
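
To make the footnote concrete, here is a small sketch of the phase-lock effect. It is purely illustrative: the periodic job (a 10 ms period, busy for the first 2 ms of each period) and the function name are made up for the example and have nothing to do with the actual system in this thread.

```python
def observed_busy_fraction(sample_hz, n_samples, period_us=10_000, busy_us=2_000):
    """Fraction of samples that land while a hypothetical periodic job is busy.

    The job repeats every period_us microseconds and is busy for the first
    busy_us microseconds of each period, so its true duty cycle is
    busy_us / period_us (20% with the defaults).
    """
    hits = 0
    for k in range(n_samples):
        # Time of the k-th sample in microseconds (integer arithmetic).
        t_us = k * 1_000_000 // sample_hz
        if t_us % period_us < busy_us:
            hits += 1
    return hits / n_samples


# One second of samples at each rate.
at_100 = observed_busy_fraction(100, 100)  # locked in phase with the 10 ms job
at_97 = observed_busy_fraction(97, 97)     # drifts through every phase

print("100 Hz sees the job busy %.0f%% of the time" % (at_100 * 100))
print(" 97 Hz sees the job busy %.0f%% of the time" % (at_97 * 100))
```

At 100 Hz every sample hits the same point in the job's cycle, so the profile claims the job is always running (or never, with a different starting offset). At 97 Hz the samples walk across the whole cycle and the estimate lands close to the real 20%.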

On Tue, Sep 1, 2015 at 9:42 PM, Lukas Larsson <lukas@REDACTED> wrote:

> On Tue, Sep 1, 2015 at 9:14 PM, Paul Davis <paul.joseph.davis@REDACTED>
> wrote:
>> Also, does anyone have a quick pointer to where the busy wait loop is?
>> After I look at the scheduler time I was going to go find that code
>> and see if I couldn't come up with a better idea of what exactly might
>> be changing with that setting.
> This should be the code that does the waiting:
> https://github.com/erlang/otp/blob/master/erts/lib_src/pthread/ethr_event.c#L65-L161
> The mutex implementation that calls it is in here:
> https://github.com/erlang/otp/blob/master/erts/lib_src/common/ethr_mutex.c
> The different spin options are set here:
> https://github.com/erlang/otp/blob/master/erts/emulator/beam/erl_process.c#L5325-L5364
> There are also a couple of other places where it spins in erl_process.c,
> just search for spin and you'll find them :)
