[erlang-questions] Dirty CPU schedulers stuck at zero utilization

Jesse Stimpson jstimpson@REDACTED
Wed Jan 16 16:00:17 CET 2019


It's possible that during our tests the utilization spike was masked by the
collapse issue fixed in the recent PRs. Is there any other analysis I can
provide on the utilization spike/sleep behavior we're seeing, or any other
debugging or code reading you recommend? As far as I can tell, there's
nothing about our workload that would cause periodic behavior like this.
The application is slinging RTP audio via UDP to remote endpoints at a 20
msec ptime. Each call to the NIF in question adds 10 msec of audio to the
WebRTC buffer.
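
For reference, the dirty CPU scheduler sleep time mentioned in this thread
can be sampled with msacc. The following is only a minimal sketch, not
necessarily the procedure used to produce the excerpts discussed here. The
module name dirty_sleep and the functions sample/1 and sleep_fraction/1 are
hypothetical names chosen for illustration, and the sketch assumes the
emulator supports microstate accounting (msacc:available() returns true).

```erlang
-module(dirty_sleep).
-export([sample/1, sleep_fraction/1]).

%% Hypothetical helper: run microstate accounting for Ms milliseconds and
%% return [{Id, SleepFraction}] for every dirty CPU scheduler thread.
sample(Ms) ->
    %% msacc:start/1 resets the counters, collects for Ms milliseconds
    %% (blocking the caller), and then stops accounting.
    true = msacc:start(Ms),
    Stats = msacc:stats(),
    %% Keep only the dirty CPU scheduler threads; other thread types do
    %% not match the generator pattern and are skipped.
    [{Id, sleep_fraction(Counters)}
     || #{type := dirty_cpu_scheduler, id := Id, counters := Counters} <- Stats].

%% Fraction of the accounted time that a thread spent in the sleep state.
sleep_fraction(Counters) ->
    Total = lists:sum(maps:values(Counters)),
    Sleep = maps:get(sleep, Counters, 0),
    case Total of
        0 -> 0.0;
        _ -> Sleep / Total
    end.
```

Calling dirty_sleep:sample(1000) once per second and logging the result
with a timestamp would give one sleep fraction per dirty CPU scheduler per
interval, which may make it easier to line up the sleep periods with the
spikes on the normal schedulers.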

As a point of corroboration, this user on Stack Overflow appears to be having
the same or a similar issue:
https://stackoverflow.com/questions/49563067/erlang-schedulers-just-sleep-why

As always, the level of support from the Erlang community is second to
none. Thanks to all for your time!

Jesse



On Wed, Jan 16, 2019 at 6:35 AM Rickard Green <rickard@REDACTED> wrote:

> On 2019-01-15 23:11, Jesse Stimpson wrote:
> > Behavior of the schedulers appears to have the same issue with 2093
> > patch.
> >
> > But I did notice something new in the msacc output. There is a very
> > brief period, at approx the same time as the normal schedulers usage
> > spikes, where all the dirty cpu schedulers have a significant sleep
> > time. I've included timestamped excerpts below, starting with the
> > increase in dirty cpu sleep, and ending with a "steady state"
> > utilization.
> >
>
> We just released OTP-21.2.3 containing PR-2093.
>
> I don't think PR-2093 causes the spikes. This change does not affect how
> work is moved between normal and dirty schedulers; it only prevents the
> "loss" of dirty schedulers.
>
> If a process is scheduled on a dirty scheduler it won't make progress
> until it has executed on a dirty scheduler and vice versa (for normal
> schedulers). This is the same both before and after PR-2093. Since dirty
> schedulers aren't "lost" after PR-2093, progress of such processes will
> happen earlier, which of course changes the behavior, but that is due to
> the workload.
>
> Regards,
> Rickard
>


-- 

<http://www.republicwireless.com/>

Jesse Stimpson

Site Reliability Engineering

m: (919) 995-0424
RepublicWireless.com <https://republicwireless.com/>