[erlang-questions] Dirty CPU schedulers stuck at zero utilization

Jesse Stimpson jstimpson@REDACTED
Tue Jan 8 17:13:22 CET 2019


Hello,

We run a large scale WebRTC-based VOIP application, and currently we are
evaluating whether or not dirty CPU schedulers can help us get more
consistent performance with some critical NIFs. We've created a test that
mimics our production load and have run it on AWS EC2 instances of various
sizes. When dirty CPU schedulers are working as expected, we do see
measurable improvement in performance, but unfortunately we have seen that
they can drop to 0 utilization. They do not recover from this state.

The timings from 2 of our 12 NIFs are in the vicinity of the 1 msec rule of
thumb, so we have experimented with them both on and off dirty CPU
schedulers. We've left the other NIFs on the traditional schedulers. More
detailed NIF timings can be found below.

At the end of this message is the concerning data (from
recon:scheduler_usage/1).We have schedulers 1-36 as the traditional
schedulers bound to the first hardware thread of each core, and schedulers
37-72 are dirty CPU schedulers. Based on mpstat it appears the dirty CPU
schedulers are not bound to threads. After about 10 minutes of running the
load test, schedulers 42, 43, 44, 61, 63, 64, 66, 68, 72 went to at 0
utilization and stayed there indefinitely. As time goes on, more dirty CPU
schedulers enter this 0 utilization state. Schedulers 1-36 do not exhibit
this behaviour. We have reproduced this consistently on other EC2 c5
instances of various sizes.

We have tried various beam adjustments, but we have seen the same behaviour
each time. Some examples of things we have tried:
+sbwt very_long +sbwtdcpu very_long +swt very_low +swtdcpu very_low
+sfwi 500

Is there anything else that we can try to avoid this 0-utilization issue?
We are happy to provide more data if needed.

NIF: webrtc_acm_add_data_10ms

median: 606 usec

90 percentile: 1230 usec

99 percentile: 2285 usec

99.9 percentile: 3223 usec


NIF: webrtc_acm_playout_data_10ms

median: 98 usec

90 percentile: 371 usec

99 percentile: 410 usec

99.9 percentile: 449 usec


EC2 instance: c5.18xlarge


OS: Linux 4.4.0-1073-aws #83-Ubuntu SMP Sat Nov 17 00:26:27 UTC 2018 x86_64
x86_64 x86_64 GNU/Linux


beam flags: +P 4194304 +sbt db +SP 50:50 +sub true +A 128 +K true


Erlang banner: Erlang/OTP 21 [erts-10.2] [source] [64-bit] [smp:36:36]
[ds:36:36:10] [async-threads:128] [hipe]


cpu_topology:
[{node,[{processor,[{core,[{thread,{logical,0}},
                           {thread,{logical,36}}]},
                    {core,[{thread,{logical,1}},{thread,{logical,37}}]},
                    {core,[{thread,{logical,2}},{thread,{logical,38}}]},
                    {core,[{thread,{logical,3}},{thread,{logical,39}}]},
                    {core,[{thread,{logical,4}},{thread,{logical,40}}]},
                    {core,[{thread,{logical,5}},{thread,{logical,41}}]},
                    {core,[{thread,{logical,6}},{thread,{logical,42}}]},
                    {core,[{thread,{logical,7}},{thread,{logical,43}}]},
                    {core,[{thread,{logical,8}},{thread,{logical,44}}]},
                    {core,[{thread,{logical,9}},{thread,{logical,45}}]},
                    {core,[{thread,{logical,10}},{thread,{logical,46}}]},
                    {core,[{thread,{logical,11}},{thread,{logical,47}}]},
                    {core,[{thread,{logical,12}},{thread,{logical,48}}]},
                    {core,[{thread,{logical,13}},{thread,{logical,49}}]},
                    {core,[{thread,{logical,14}},{thread,{logical,50}}]},
                    {core,[{thread,{logical,15}},{thread,{logical,51}}]},
                    {core,[{thread,{logical,16}},{thread,{logical,52}}]},

{core,[{thread,{logical,17}},{thread,{logical,53}}]}]}]},
 {node,[{processor,[{core,[{thread,{logical,18}},
                           {thread,{logical,54}}]},
                    {core,[{thread,{logical,19}},{thread,{logical,55}}]},
                    {core,[{thread,{logical,20}},{thread,{logical,56}}]},
                    {core,[{thread,{logical,21}},{thread,{logical,57}}]},
                    {core,[{thread,{logical,22}},{thread,{logical,58}}]},
                    {core,[{thread,{logical,23}},{thread,{logical,59}}]},
                    {core,[{thread,{logical,24}},{thread,{logical,60}}]},
                    {core,[{thread,{logical,25}},{thread,{logical,61}}]},
                    {core,[{thread,{logical,26}},{thread,{logical,62}}]},
                    {core,[{thread,{logical,27}},{thread,{logical,63}}]},
                    {core,[{thread,{logical,28}},{thread,{logical,64}}]},
                    {core,[{thread,{logical,29}},{thread,{logical,65}}]},
                    {core,[{thread,{logical,30}},{thread,{logical,66}}]},
                    {core,[{thread,{logical,31}},{thread,{logical,67}}]},
                    {core,[{thread,{logical,32}},{thread,{logical,68}}]},
                    {core,[{thread,{logical,33}},{thread,{logical,69}}]},
                    {core,[{thread,{logical,34}},{thread,{logical,70}}]},

{core,[{thread,{logical,35}},{thread,{logical,71}}]}]}]}]

scheduler_bindings:
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29,30,31,32,33,34,35}

CPU utilization (from mpstat) during our load test:

Linux 4.4.0-1073-aws (isw-core-x-qa-awsoh02)    01/04/2019 _x86_64_ (72 CPU)

08:55:15 PM  CPU %usr %nice    %sys %iowait %irq %soft  %steal %guest
%gnice %idle

08:55:15 PM  all 51.58  0.00 1.64 0.01    0.00 0.42 0.00 0.00    0.00 46.36

08:55:15 PM    0 62.34 0.00    2.69 0.06 0.00   0.00 0.00 0.00 0.00   34.90

08:55:15 PM    1 62.84 0.00    2.58 0.02 0.00   0.00 0.00 0.00 0.00   34.55

08:55:15 PM    2 62.04 0.00    2.52 0.00 0.00   0.00 0.00 0.00 0.00   35.44

08:55:15 PM    3 62.17 0.00    2.62 0.00 0.00   0.00 0.00 0.00 0.00   35.21

08:55:15 PM    4 62.14 0.00    2.54 0.02 0.00   0.00 0.00 0.00 0.00   35.30

08:55:15 PM    5 62.06 0.00    2.51 0.00 0.00   0.00 0.00 0.00 0.00   35.43

08:55:15 PM    6 61.93 0.00    2.58 0.01 0.00   0.00 0.02 0.00 0.00   35.46

08:55:15 PM    7 61.99 0.00    2.54 0.00 0.00   0.00 0.00 0.00 0.00   35.48

08:55:15 PM    8 62.01 0.00    2.52 0.00 0.00   0.00 0.00 0.00 0.00   35.47

08:55:15 PM    9 61.88 0.00    2.49 0.00 0.00   0.00 0.00 0.00 0.00   35.63

08:55:15 PM   10 62.18 0.00    2.48 0.00 0.00  0.00 0.00 0.00 0.00   35.34

08:55:15 PM   11 61.89 0.00    2.47 0.00 0.00  0.00 0.00 0.00 0.00   35.64

08:55:15 PM   12 61.98 0.00    2.50 0.00 0.00  0.00 0.00 0.00 0.00   35.52

08:55:15 PM   13 61.80 0.00    2.54 0.00 0.00  0.00 0.00 0.00 0.00   35.66

08:55:15 PM   14 61.96 0.00    2.56 0.00 0.00  0.00 0.00 0.00 0.00   35.48

08:55:15 PM   15 62.11 0.00    2.57 0.00 0.00  0.00 0.00 0.00 0.00   35.31

08:55:15 PM   16 62.03 0.00    2.55 0.00 0.00  0.00 0.00 0.00 0.00   35.42

08:55:15 PM   17 61.90 0.00    2.53 0.00 0.00  0.00 0.00 0.00 0.00   35.57

08:55:15 PM   18 58.36 0.00    2.41 0.09 0.00  3.72 0.00 0.00 0.00   35.43

08:55:15 PM   19 58.29 0.00    2.45 0.02 0.00  4.00 0.00 0.00 0.00   35.24

08:55:15 PM   20 58.45 0.00    2.36 0.02 0.00  3.73 0.00 0.00 0.00   35.44

08:55:15 PM   21 58.56 0.00    2.34 0.04 0.00  3.54 0.00 0.00 0.00   35.51

08:55:15 PM   22 58.12 0.00    2.46 0.00 0.00  3.84 0.00 0.00 0.00   35.58

08:55:15 PM   23 58.24 0.00    2.36 0.01 0.00  3.82 0.00 0.00 0.00   35.57

08:55:15 PM   24 58.84 0.00    2.42 0.00 0.00  3.60 0.00 0.00 0.00   35.14

08:55:15 PM   25 58.35 0.00    2.43 0.00 0.00  3.72 0.00 0.00 0.00   35.51

08:55:15 PM   26 62.09 0.00    2.47 0.00 0.00  0.00 0.00 0.00 0.00   35.45

08:55:15 PM   27 62.09 0.00    2.46 0.00 0.00  0.00 0.00 0.00 0.00   35.45

08:55:15 PM   28 62.15 0.00    2.48 0.00 0.00  0.00 0.00 0.00 0.00   35.37

08:55:15 PM   29 62.09 0.00    2.46 0.00 0.00  0.00 0.00 0.00 0.00   35.45

08:55:15 PM   30 62.74 0.00    2.45 0.00 0.00  0.00 0.00 0.00 0.00   34.81

08:55:15 PM   31 62.14 0.00    2.47 0.00 0.00  0.00 0.00 0.00 0.00   35.39

08:55:15 PM   32 62.35 0.00    2.52 0.00 0.00  0.00 0.00 0.00 0.00   35.13

08:55:15 PM   33 62.13 0.00    2.42 0.00 0.00  0.00 0.00 0.00 0.00   35.46

08:55:15 PM   34 62.08 0.00    2.47 0.00 0.00  0.00 0.00 0.00 0.00   35.45

08:55:15 PM   35 62.15 0.00    2.49 0.00 0.00  0.00 0.00 0.00 0.00   35.35

08:55:15 PM   36 43.19 0.00    0.80 0.00 0.00  0.00 0.00 0.00 0.00   56.01

08:55:15 PM   37 43.09 0.00    0.83 0.01 0.00  0.00 0.00 0.00 0.00   56.06

08:55:15 PM   38 42.59 0.00    0.78 0.01 0.00  0.00 0.00 0.00 0.00   56.63

08:55:15 PM   39 42.60 0.00    0.82 0.00 0.00  0.00 0.00 0.00 0.00   56.58

08:55:15 PM   40 42.94 0.00    0.76 0.02 0.00  0.00 0.00 0.00 0.00   56.28

08:55:15 PM   41 42.11 0.00    0.76 0.01 0.00  0.00 0.00 0.00 0.00   57.13

08:55:15 PM   42 43.26 0.00    0.85 0.01 0.00  0.00 0.00 0.00 0.00   55.88

08:55:15 PM   43 42.23 0.00    0.77 0.01 0.00  0.00 0.00 0.00 0.00   56.99

08:55:15 PM   44 42.24 0.00    0.80 0.00 0.00  0.00 0.00 0.00 0.00   56.95

08:55:15 PM   45 41.65 0.00    0.79 0.01 0.00  0.00 0.00 0.00 0.00   57.55

08:55:15 PM   46 41.76 0.00    0.77 0.01 0.00  0.00 0.00 0.00 0.00   57.46

08:55:15 PM   47 42.66 0.00    0.82 0.00 0.00  0.00 0.00 0.00 0.00   56.52

08:55:15 PM   48 41.83 0.00    0.76 0.01 0.00  0.00 0.00 0.00 0.00   57.40

08:55:15 PM   49 42.04 0.00    0.82 0.00 0.00  0.00 0.00 0.00 0.00   57.13

08:55:15 PM   50 41.55 0.00    0.78 0.01 0.00  0.00 0.00 0.00 0.00   57.67

08:55:15 PM   51 42.37 0.00    0.78 0.00 0.00  0.00 0.00 0.00 0.00   56.84

08:55:15 PM   52 41.66 0.00    0.77 0.00 0.00  0.00 0.00 0.00 0.00   57.57

08:55:15 PM   53 41.77 0.00    0.79 0.00 0.00  0.00 0.00 0.00 0.00   57.45

08:55:15 PM   54 40.33 0.00    0.88 0.01 0.00  0.00 0.00 0.00 0.00   58.79

08:55:15 PM   55 41.62 0.00    0.78 0.01 0.00  0.00 0.00 0.00 0.00   57.59

08:55:15 PM   56 41.51 0.00    0.82 0.00 0.00  0.00 0.00 0.00 0.00   57.67

08:55:15 PM   57 40.35 0.00    0.76 0.00 0.00  0.00 0.00 0.00 0.00   58.89

08:55:15 PM   58 41.92 0.00    0.79 0.01 0.00  0.00 0.00 0.00 0.00   57.28

08:55:15 PM   59 41.48 0.00    0.80 0.00 0.00  0.00 0.00 0.00 0.00   57.72

08:55:15 PM   60 40.83 0.00    0.75 0.01 0.00  0.00 0.00 0.00 0.00   58.41

08:55:15 PM   61 40.95 0.00    0.78 0.01 0.00  0.00 0.00 0.00 0.00   58.26

08:55:15 PM   62 40.85 0.00    0.80 0.01 0.00  0.00 0.00 0.00 0.00   58.35

08:55:15 PM   63 42.11 0.00    0.77 0.01 0.00  0.00 0.00 0.00 0.00   57.12

08:55:15 PM   64 42.22 0.00    0.78 0.01 0.00  0.00 0.00 0.00 0.00   56.99

08:55:15 PM   65 41.55 0.00    0.78 0.00 0.00  0.00 0.00 0.00 0.00   57.67

08:55:15 PM   66 41.68 0.00    0.81 0.00 0.00  0.00 0.00 0.00 0.00   57.51

08:55:15 PM   67 41.06 0.00    0.87 0.01 0.00  0.00 0.00 0.00 0.00   58.07

08:55:15 PM   68 41.80 0.00    0.79 0.00 0.00  0.00 0.00 0.00 0.00   57.41

08:55:15 PM   69 41.38 0.00    0.75 0.00 0.00  0.00 0.00 0.00 0.00   57.87

08:55:15 PM   70 42.17 0.00    0.75 0.00 0.00  0.00 0.00 0.00 0.00   57.08

08:55:15 PM   71 41.26 0.00    0.81 0.00 0.00  0.00 0.01 0.00 0.00   57.92


1> rp(recon:scheduler_usage(1000)).

[{1,0.8581586716045433},

{2,0.8607597847736659},

{3,0.8501543449289042},

{4,0.8619718426658023},

{5,0.8660226578705742},

{6,0.8555763933364806},

{7,0.8557195972097537},

{8,0.8668284625654983},

{9,0.858588675506532},

{10,0.8650501369640686},

{11,0.8615557214275474},

{12,0.8523392789683996},

{13,0.8469329348407398},

{14,0.8615637534991641},

{15,0.8591930739986557},

{16,0.8578596164201188},

{17,0.8549696086049403},

{18,0.8677098230685564},

{19,0.8556328340054443},

{20,0.8642542459479063},

{21,0.8670655179064826},

{22,0.8562638308551557},

{23,0.8511202058055239},

{24,0.8697339264798696},

{25,0.8594823462601853},

{26,0.8634550008001265},

{27,0.8568590130172198},

{28,0.864025757952233},

{29,0.8575614770622326},

{30,0.8611760239143281},

{31,0.8589789125669829},

{32,0.8537754468447621},

{33,0.8675538912040871},

{34,0.8607522803519515},

{35,0.8639389099445753},

{36,0.8743759133171144},

{37,0.9526507413773687},

{38,0.948925611917577},

{39,0.9511904612810083},

{40,0.9581015421004746},

{41,0.9534977201467516},

{42,0.0},

{43,0.0},

{44,0.0},

{45,0.9522518499948341},

{46,0.9453306755503674},

{47,0.9586374601140047},

{48,0.9574611845671873},

{49,0.950914304057791},

{50,0.9582124294398642},

{51,0.9448074463900029},

{52,0.9474771999511407},

{53,0.9450778098401229},

{54,0.0},

{55,0.9478167911711937},

{56,0.9612587860764551},

{57,0.943468696358506},

{58,0.9509498642900455},

{59,0.9603595134495414},

{60,0.9517397912369205},

{61,0.0},

{62,0.9587029756970658},

{63,0.0},

{64,0.0},

{65,0.9526167902065421},

{66,0.0},

{67,0.9475668680715805},

{68,0.0},

{69,0.9630301804839606},

{70,0.9475359592879946},

{71,0.9541718747681374},

{72,0.0}]


Thanks for your time,

Jesse Stimpson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20190108/58963a26/attachment.htm>


More information about the erlang-questions mailing list