[erlang-questions] +swt very_low doesn't seem to avoid schedulers getting
Rickard Green
rickard@REDACTED
Sun Nov 4 02:05:32 CET 2012
This does not, however, show that anything is wrong. The statistics only show that a couple of hundred processes were selected for execution on the same scheduler during some time frame. If there is no buildup in run-queue length, everything is as it should be.
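
A run-queue buildup can be checked by sampling statistics(run_queues) at a fixed interval, as suggested earlier in this thread. The following is only a minimal sketch: the module name rq_sample and its functions are hypothetical, and it assumes each sample is a tuple with one run-queue length per element (a list is also accepted, in case the return shape differs between releases).

```erlang
-module(rq_sample).
-export([sample/2, totals/1]).

%% Collect N samples of statistics(run_queues), pausing
%% IntervalMs milliseconds after each sample.
sample(N, IntervalMs) when is_integer(N), N > 0 ->
    [begin
         S = erlang:statistics(run_queues),
         timer:sleep(IntervalMs),
         S
     end || _ <- lists:seq(1, N)].

%% Sum the run-queue lengths of each sample, giving one total
%% per sample. Accepts samples that are tuples or lists.
totals(Samples) ->
    [lists:sum(to_list(S)) || S <- Samples].

to_list(S) when is_tuple(S) -> tuple_to_list(S);
to_list(S) when is_list(S) -> S.
```

For example, rq_sample:totals(rq_sample:sample(100, 100)) takes one sample every 100 ms for roughly 10 seconds. Totals that keep growing over the sampling period would point to a buildup; totals that stay small would not.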
Regards,
Rickard
On Nov 3, 2012, at 11:36 PM, Scott Lystig Fritchie wrote:
> A few weeks ago, Rickard Green <rickard@REDACTED> wrote:
>
>> However, you can call "statistics(run_queues)" (note that the argument
>> should be 'run_queues', and not 'run_queue') repeatedly, say once every
>> 100 ms (perhaps even more frequently than this) for say 10 seconds when
>> the system is in this state. That information will at least give us a
>> good hunch of what is going on. statistics(run_queues) returns a tuple
>> containing the run queue length of each run queue as elements.
>
> Hiya. We have some new data from three customer machines running
> Riak 1.2.1 with R15B01 that all hit what appears to be this same
> "schedulers getting stuck" problem.
>
> The machines were fixed before I was aware of them, so I didn't
> get a chance to rummage around. We do not have the output that
> you suggested, statistics(run_queues). However, we do have
> samples of the traces that are generated by:
>
> erlang:trace(all, true, [running,scheduler_id])
>
> When stuck, the output looks like this, with tuples of
> {scheduler #, # of samples}
>
> (riak@REDACTED)1> schedstat:run().
> <0.16760.459>
> === in scheduler count===
> [{1,264},
> {2,257},
> {3,0},
> {4,0},
> ... and repeating zero samples all the way
> to scheduler 64.
>
> When unstuck, the output looks like this:
>
> (riak@REDACTED)1> schedstat:run().
> <0.3422.460>
> === in scheduler count===
> [{1,65},
> {2,5},
> {3,0},
> {4,0},
> {5,14},
> {6,73},
> {7,0},
> {8,0},
> {9,0},
> {10,0},
> {11,0},
> {12,0},
> {13,159},
> {14,182},
> {15,6},
> {16,0},
> ... and repeating zero samples all the way
> to scheduler 64.
>
> I do not know if the +swt flag was used on these machines,
> sorry.
>
> Raw output, courtesy of Kelly McLaughlin, is available at
> https://gist.github.com/4009035. The generating script is
> by Jon Meredith at https://gist.github.com/a460a9dbb11698cf01a6.
>
> The make-it-unstuck method is this:
>
> %% Get current number of online schedulers
> Schedulers = erlang:system_info(schedulers_online).
>
> %% Reduce number online to 1
> erlang:system_flag(schedulers_online, 1).
>
> %% Restore to original number of online schedulers
> erlang:system_flag(schedulers_online, Schedulers).
>
> It isn't clear yet if the next release of Riak will use R15B02
> or remain with R15B01. We were bitten by performance regressions
> (not caught during our pre-release testing) when releasing the
> packages that moved from R14B04 to R15B01. There's the devil
> we're getting to know versus the devil that takes a heck of a lot
> more time to get to know....
>
> -Scott