[erlang-questions] Dirty CPU schedulers stuck at zero utilization

Frank Muller frank.muller.erl@REDACTED
Thu Jan 17 07:03:22 CET 2019


Hey Jesse

Glad to hear your Dirty schedulers collapse issue was solved by Rickard’s
PR.

The 21.2.3 release fixed a similar problem we had with Dirty Schedulers.

Jesse: can’t you rewrite your Dirty NIF as a Yielding one? There’s a nice
slide deck from “Andrew Bennett” describing this technique (page 58):
https://cdn.rawgit.com/potatosalad/elixirconf2017/master/presentation.pdf
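
For illustration only (this is not taken from the slides), here is a minimal
sketch of the yielding pattern: do the work in bounded chunks, keep the
resumable state explicit, and hand control back between chunks. The names
job_t, job_step, run_to_completion, sum_nif and the WITH_ERL_NIF build flag
are all hypothetical, and the summation is only a stand-in for the real audio
work. The guarded part shows roughly how the same step function might be
driven from a NIF using enif_consume_timeslice and enif_schedule_nif; the
chunk size and the 10% figure are guesses that would need measuring.

```c
#include <assert.h>
#include <stddef.h>

/* Resumable state: everything needed to continue after a yield. */
typedef struct {
    size_t next;        /* index of the next item to process */
    size_t total;       /* total number of items */
    unsigned long acc;  /* running result */
} job_t;

/* Process at most `budget` items. Returns 1 when the job is finished,
 * 0 when it should be resumed later. */
static int job_step(job_t *job, size_t budget)
{
    size_t stop;
    if (budget == 0)
        budget = 1;                      /* always make some progress */
    stop = job->next + budget;
    if (stop > job->total || stop < job->next)
        stop = job->total;               /* clamp, also on overflow */
    for (; job->next < stop; job->next++)
        job->acc += job->next;           /* stand-in for real per-item work */
    return job->next == job->total;
}

/* Plain-C driver standing in for the scheduler: keeps resuming the job
 * and returns how many slices it took. */
static unsigned run_to_completion(job_t *job, size_t budget)
{
    unsigned slices = 1;
    while (!job_step(job, budget))
        slices++;
    return slices;
}

#ifdef WITH_ERL_NIF  /* hypothetical flag; needs erl_nif.h from an OTP install */
#include <erl_nif.h>

/* Assumed to be called as sum_nif(Next, Total, Acc), starting with Next = 0
 * and Acc = 0. ErlNifFunc table and ERL_NIF_INIT are omitted here. */
static ERL_NIF_TERM sum_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    unsigned long next, total, acc;
    if (argc != 3 || !enif_get_ulong(env, argv[0], &next)
                  || !enif_get_ulong(env, argv[1], &total)
                  || !enif_get_ulong(env, argv[2], &acc))
        return enif_make_badarg(env);

    job_t job = { next, total, acc };
    while (!job_step(&job, 1000)) {
        /* Tell the VM roughly how much of the timeslice each chunk used. */
        if (enif_consume_timeslice(env, 10)) {
            ERL_NIF_TERM args[3] = {
                enif_make_ulong(env, job.next),
                enif_make_ulong(env, job.total),
                enif_make_ulong(env, job.acc)
            };
            /* Timeslice exhausted: reschedule ourselves on a normal
             * scheduler (flags = 0) with the saved state as arguments. */
            return enif_schedule_nif(env, "sum_nif", 0, sum_nif, 3, args);
        }
    }
    return enif_make_ulong(env, job.acc);
}
#endif
```

The point of the design is that the NIF never holds a scheduler for longer
than one chunk, so it can run on the normal schedulers and does not depend on
the dirty CPU schedulers at all.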

Best
/Frank

> It's possible that during our tests the utilization spike was masked by the
> collapse issue fixed in the recent PRs. Is there any other analysis I can
> provide on the utilization spike/sleep behavior we're seeing, or any other
> debugging or code reading you recommend? As far as I can tell, there's
> nothing about our workload that would cause periodic behavior like this.
> The application is slinging RTP audio via UDP to remote endpoints at a 20
> msec ptime. Each function call for the NIF in question adds 10 msec of
> audio to the WebRTC buffer.
>
> As a point of corroboration, this user on Stack Overflow appears to be having
> the same or a similar issue:
> https://stackoverflow.com/questions/49563067/erlang-schedulers-just-sleep-why
>
> As always, the level of support from the Erlang community is second to
> none. Thanks to all for your time!
>
> Jesse
>
>
>
> On Wed, Jan 16, 2019 at 6:35 AM Rickard Green <rickard@REDACTED> wrote:
>
>> On 2019-01-15 23:11, Jesse Stimpson wrote:
>> > Behavior of the schedulers appears to have the same issue with 2093
>> > patch.
>> >
>> > But I did notice something new in the msacc output. There is a very
>> > brief period, at approx the same time as the normal schedulers usage
>> > spikes, where all the dirty cpu schedulers have a significant sleep
>> > time. I've included timestamped excerpts below, starting with the
>> > increase in dirty cpu sleep, and ending with a "steady state"
>> > utilization.
>> >
>>
>> We just released OTP-21.2.3 containing PR-2093.
>>
>> I don't think PR-2093 causes the spikes. This change does not affect how
>> work is moved between normal and dirty schedulers, only prevents the
>> "loss" of dirty schedulers.
>>
>> If a process is scheduled on a dirty scheduler it won't make progress
>> until it has executed on a dirty scheduler and vice versa (for normal
>> schedulers). This is the same both before and after PR-2093. Since dirty
>> schedulers aren't "lost" after PR-2093, progress of such processes will
>> happen earlier, which of course changes the behavior, but that is due to
>> the workload.
>>
>> Regards,
>> Rickard
>>
>