[erlang-questions] Dirty CPU schedulers stuck at zero utilization
Jesse Stimpson
jstimpson@REDACTED
Tue Jan 8 17:13:22 CET 2019
Hello,
We run a large scale WebRTC-based VOIP application, and currently we are
evaluating whether or not dirty CPU schedulers can help us get more
consistent performance with some critical NIFs. We've created a test that
mimics our production load and have run it on AWS EC2 instances of various
sizes. When dirty CPU schedulers are working as expected, we do see
measurable improvement in performance, but unfortunately we have seen that
they can drop to 0 utilization. They do not recover from this state.
The timings from 2 of our 12 NIFs are in the vicinity of the 1 msec rule of
thumb, so we have experimented with them both on and off dirty CPU
schedulers. We've left the other NIFs on the traditional schedulers. More
detailed NIF timings can be found below.
At the end of this message is the concerning data (from
recon:scheduler_usage/1).We have schedulers 1-36 as the traditional
schedulers bound to the first hardware thread of each core, and schedulers
37-72 are dirty CPU schedulers. Based on mpstat it appears the dirty CPU
schedulers are not bound to threads. After about 10 minutes of running the
load test, schedulers 42, 43, 44, 61, 63, 64, 66, 68, 72 went to at 0
utilization and stayed there indefinitely. As time goes on, more dirty CPU
schedulers enter this 0 utilization state. Schedulers 1-36 do not exhibit
this behaviour. We have reproduced this consistently on other EC2 c5
instances of various sizes.
We have tried various beam adjustments, but we have seen the same behaviour
each time. Some examples of things we have tried:
+sbwt very_long +sbwtdcpu very_long +swt very_low +swtdcpu very_low
+sfwi 500
Is there anything else that we can try to avoid this 0-utilization issue?
We are happy to provide more data if needed.
NIF: webrtc_acm_add_data_10ms
median: 606 usec
90 percentile: 1230 usec
99 percentile: 2285 usec
99.9 percentile: 3223 usec
NIF: webrtc_acm_playout_data_10ms
median: 98 usec
90 percentile: 371 usec
99 percentile: 410 usec
99.9 percentile: 449 usec
EC2 instance: c5.18xlarge
OS: Linux 4.4.0-1073-aws #83-Ubuntu SMP Sat Nov 17 00:26:27 UTC 2018 x86_64
x86_64 x86_64 GNU/Linux
beam flags: +P 4194304 +sbt db +SP 50:50 +sub true +A 128 +K true
Erlang banner: Erlang/OTP 21 [erts-10.2] [source] [64-bit] [smp:36:36]
[ds:36:36:10] [async-threads:128] [hipe]
cpu_topology:
[{node,[{processor,[{core,[{thread,{logical,0}},
{thread,{logical,36}}]},
{core,[{thread,{logical,1}},{thread,{logical,37}}]},
{core,[{thread,{logical,2}},{thread,{logical,38}}]},
{core,[{thread,{logical,3}},{thread,{logical,39}}]},
{core,[{thread,{logical,4}},{thread,{logical,40}}]},
{core,[{thread,{logical,5}},{thread,{logical,41}}]},
{core,[{thread,{logical,6}},{thread,{logical,42}}]},
{core,[{thread,{logical,7}},{thread,{logical,43}}]},
{core,[{thread,{logical,8}},{thread,{logical,44}}]},
{core,[{thread,{logical,9}},{thread,{logical,45}}]},
{core,[{thread,{logical,10}},{thread,{logical,46}}]},
{core,[{thread,{logical,11}},{thread,{logical,47}}]},
{core,[{thread,{logical,12}},{thread,{logical,48}}]},
{core,[{thread,{logical,13}},{thread,{logical,49}}]},
{core,[{thread,{logical,14}},{thread,{logical,50}}]},
{core,[{thread,{logical,15}},{thread,{logical,51}}]},
{core,[{thread,{logical,16}},{thread,{logical,52}}]},
{core,[{thread,{logical,17}},{thread,{logical,53}}]}]}]},
{node,[{processor,[{core,[{thread,{logical,18}},
{thread,{logical,54}}]},
{core,[{thread,{logical,19}},{thread,{logical,55}}]},
{core,[{thread,{logical,20}},{thread,{logical,56}}]},
{core,[{thread,{logical,21}},{thread,{logical,57}}]},
{core,[{thread,{logical,22}},{thread,{logical,58}}]},
{core,[{thread,{logical,23}},{thread,{logical,59}}]},
{core,[{thread,{logical,24}},{thread,{logical,60}}]},
{core,[{thread,{logical,25}},{thread,{logical,61}}]},
{core,[{thread,{logical,26}},{thread,{logical,62}}]},
{core,[{thread,{logical,27}},{thread,{logical,63}}]},
{core,[{thread,{logical,28}},{thread,{logical,64}}]},
{core,[{thread,{logical,29}},{thread,{logical,65}}]},
{core,[{thread,{logical,30}},{thread,{logical,66}}]},
{core,[{thread,{logical,31}},{thread,{logical,67}}]},
{core,[{thread,{logical,32}},{thread,{logical,68}}]},
{core,[{thread,{logical,33}},{thread,{logical,69}}]},
{core,[{thread,{logical,34}},{thread,{logical,70}}]},
{core,[{thread,{logical,35}},{thread,{logical,71}}]}]}]}]
scheduler_bindings:
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29,30,31,32,33,34,35}
CPU utilization (from mpstat) during our load test:
Linux 4.4.0-1073-aws (isw-core-x-qa-awsoh02) 01/04/2019 _x86_64_ (72 CPU)
08:55:15 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest
%gnice %idle
08:55:15 PM all 51.58 0.00 1.64 0.01 0.00 0.42 0.00 0.00 0.00 46.36
08:55:15 PM 0 62.34 0.00 2.69 0.06 0.00 0.00 0.00 0.00 0.00 34.90
08:55:15 PM 1 62.84 0.00 2.58 0.02 0.00 0.00 0.00 0.00 0.00 34.55
08:55:15 PM 2 62.04 0.00 2.52 0.00 0.00 0.00 0.00 0.00 0.00 35.44
08:55:15 PM 3 62.17 0.00 2.62 0.00 0.00 0.00 0.00 0.00 0.00 35.21
08:55:15 PM 4 62.14 0.00 2.54 0.02 0.00 0.00 0.00 0.00 0.00 35.30
08:55:15 PM 5 62.06 0.00 2.51 0.00 0.00 0.00 0.00 0.00 0.00 35.43
08:55:15 PM 6 61.93 0.00 2.58 0.01 0.00 0.00 0.02 0.00 0.00 35.46
08:55:15 PM 7 61.99 0.00 2.54 0.00 0.00 0.00 0.00 0.00 0.00 35.48
08:55:15 PM 8 62.01 0.00 2.52 0.00 0.00 0.00 0.00 0.00 0.00 35.47
08:55:15 PM 9 61.88 0.00 2.49 0.00 0.00 0.00 0.00 0.00 0.00 35.63
08:55:15 PM 10 62.18 0.00 2.48 0.00 0.00 0.00 0.00 0.00 0.00 35.34
08:55:15 PM 11 61.89 0.00 2.47 0.00 0.00 0.00 0.00 0.00 0.00 35.64
08:55:15 PM 12 61.98 0.00 2.50 0.00 0.00 0.00 0.00 0.00 0.00 35.52
08:55:15 PM 13 61.80 0.00 2.54 0.00 0.00 0.00 0.00 0.00 0.00 35.66
08:55:15 PM 14 61.96 0.00 2.56 0.00 0.00 0.00 0.00 0.00 0.00 35.48
08:55:15 PM 15 62.11 0.00 2.57 0.00 0.00 0.00 0.00 0.00 0.00 35.31
08:55:15 PM 16 62.03 0.00 2.55 0.00 0.00 0.00 0.00 0.00 0.00 35.42
08:55:15 PM 17 61.90 0.00 2.53 0.00 0.00 0.00 0.00 0.00 0.00 35.57
08:55:15 PM 18 58.36 0.00 2.41 0.09 0.00 3.72 0.00 0.00 0.00 35.43
08:55:15 PM 19 58.29 0.00 2.45 0.02 0.00 4.00 0.00 0.00 0.00 35.24
08:55:15 PM 20 58.45 0.00 2.36 0.02 0.00 3.73 0.00 0.00 0.00 35.44
08:55:15 PM 21 58.56 0.00 2.34 0.04 0.00 3.54 0.00 0.00 0.00 35.51
08:55:15 PM 22 58.12 0.00 2.46 0.00 0.00 3.84 0.00 0.00 0.00 35.58
08:55:15 PM 23 58.24 0.00 2.36 0.01 0.00 3.82 0.00 0.00 0.00 35.57
08:55:15 PM 24 58.84 0.00 2.42 0.00 0.00 3.60 0.00 0.00 0.00 35.14
08:55:15 PM 25 58.35 0.00 2.43 0.00 0.00 3.72 0.00 0.00 0.00 35.51
08:55:15 PM 26 62.09 0.00 2.47 0.00 0.00 0.00 0.00 0.00 0.00 35.45
08:55:15 PM 27 62.09 0.00 2.46 0.00 0.00 0.00 0.00 0.00 0.00 35.45
08:55:15 PM 28 62.15 0.00 2.48 0.00 0.00 0.00 0.00 0.00 0.00 35.37
08:55:15 PM 29 62.09 0.00 2.46 0.00 0.00 0.00 0.00 0.00 0.00 35.45
08:55:15 PM 30 62.74 0.00 2.45 0.00 0.00 0.00 0.00 0.00 0.00 34.81
08:55:15 PM 31 62.14 0.00 2.47 0.00 0.00 0.00 0.00 0.00 0.00 35.39
08:55:15 PM 32 62.35 0.00 2.52 0.00 0.00 0.00 0.00 0.00 0.00 35.13
08:55:15 PM 33 62.13 0.00 2.42 0.00 0.00 0.00 0.00 0.00 0.00 35.46
08:55:15 PM 34 62.08 0.00 2.47 0.00 0.00 0.00 0.00 0.00 0.00 35.45
08:55:15 PM 35 62.15 0.00 2.49 0.00 0.00 0.00 0.00 0.00 0.00 35.35
08:55:15 PM 36 43.19 0.00 0.80 0.00 0.00 0.00 0.00 0.00 0.00 56.01
08:55:15 PM 37 43.09 0.00 0.83 0.01 0.00 0.00 0.00 0.00 0.00 56.06
08:55:15 PM 38 42.59 0.00 0.78 0.01 0.00 0.00 0.00 0.00 0.00 56.63
08:55:15 PM 39 42.60 0.00 0.82 0.00 0.00 0.00 0.00 0.00 0.00 56.58
08:55:15 PM 40 42.94 0.00 0.76 0.02 0.00 0.00 0.00 0.00 0.00 56.28
08:55:15 PM 41 42.11 0.00 0.76 0.01 0.00 0.00 0.00 0.00 0.00 57.13
08:55:15 PM 42 43.26 0.00 0.85 0.01 0.00 0.00 0.00 0.00 0.00 55.88
08:55:15 PM 43 42.23 0.00 0.77 0.01 0.00 0.00 0.00 0.00 0.00 56.99
08:55:15 PM 44 42.24 0.00 0.80 0.00 0.00 0.00 0.00 0.00 0.00 56.95
08:55:15 PM 45 41.65 0.00 0.79 0.01 0.00 0.00 0.00 0.00 0.00 57.55
08:55:15 PM 46 41.76 0.00 0.77 0.01 0.00 0.00 0.00 0.00 0.00 57.46
08:55:15 PM 47 42.66 0.00 0.82 0.00 0.00 0.00 0.00 0.00 0.00 56.52
08:55:15 PM 48 41.83 0.00 0.76 0.01 0.00 0.00 0.00 0.00 0.00 57.40
08:55:15 PM 49 42.04 0.00 0.82 0.00 0.00 0.00 0.00 0.00 0.00 57.13
08:55:15 PM 50 41.55 0.00 0.78 0.01 0.00 0.00 0.00 0.00 0.00 57.67
08:55:15 PM 51 42.37 0.00 0.78 0.00 0.00 0.00 0.00 0.00 0.00 56.84
08:55:15 PM 52 41.66 0.00 0.77 0.00 0.00 0.00 0.00 0.00 0.00 57.57
08:55:15 PM 53 41.77 0.00 0.79 0.00 0.00 0.00 0.00 0.00 0.00 57.45
08:55:15 PM 54 40.33 0.00 0.88 0.01 0.00 0.00 0.00 0.00 0.00 58.79
08:55:15 PM 55 41.62 0.00 0.78 0.01 0.00 0.00 0.00 0.00 0.00 57.59
08:55:15 PM 56 41.51 0.00 0.82 0.00 0.00 0.00 0.00 0.00 0.00 57.67
08:55:15 PM 57 40.35 0.00 0.76 0.00 0.00 0.00 0.00 0.00 0.00 58.89
08:55:15 PM 58 41.92 0.00 0.79 0.01 0.00 0.00 0.00 0.00 0.00 57.28
08:55:15 PM 59 41.48 0.00 0.80 0.00 0.00 0.00 0.00 0.00 0.00 57.72
08:55:15 PM 60 40.83 0.00 0.75 0.01 0.00 0.00 0.00 0.00 0.00 58.41
08:55:15 PM 61 40.95 0.00 0.78 0.01 0.00 0.00 0.00 0.00 0.00 58.26
08:55:15 PM 62 40.85 0.00 0.80 0.01 0.00 0.00 0.00 0.00 0.00 58.35
08:55:15 PM 63 42.11 0.00 0.77 0.01 0.00 0.00 0.00 0.00 0.00 57.12
08:55:15 PM 64 42.22 0.00 0.78 0.01 0.00 0.00 0.00 0.00 0.00 56.99
08:55:15 PM 65 41.55 0.00 0.78 0.00 0.00 0.00 0.00 0.00 0.00 57.67
08:55:15 PM 66 41.68 0.00 0.81 0.00 0.00 0.00 0.00 0.00 0.00 57.51
08:55:15 PM 67 41.06 0.00 0.87 0.01 0.00 0.00 0.00 0.00 0.00 58.07
08:55:15 PM 68 41.80 0.00 0.79 0.00 0.00 0.00 0.00 0.00 0.00 57.41
08:55:15 PM 69 41.38 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.00 57.87
08:55:15 PM 70 42.17 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.00 57.08
08:55:15 PM 71 41.26 0.00 0.81 0.00 0.00 0.00 0.01 0.00 0.00 57.92
1> rp(recon:scheduler_usage(1000)).
[{1,0.8581586716045433},
{2,0.8607597847736659},
{3,0.8501543449289042},
{4,0.8619718426658023},
{5,0.8660226578705742},
{6,0.8555763933364806},
{7,0.8557195972097537},
{8,0.8668284625654983},
{9,0.858588675506532},
{10,0.8650501369640686},
{11,0.8615557214275474},
{12,0.8523392789683996},
{13,0.8469329348407398},
{14,0.8615637534991641},
{15,0.8591930739986557},
{16,0.8578596164201188},
{17,0.8549696086049403},
{18,0.8677098230685564},
{19,0.8556328340054443},
{20,0.8642542459479063},
{21,0.8670655179064826},
{22,0.8562638308551557},
{23,0.8511202058055239},
{24,0.8697339264798696},
{25,0.8594823462601853},
{26,0.8634550008001265},
{27,0.8568590130172198},
{28,0.864025757952233},
{29,0.8575614770622326},
{30,0.8611760239143281},
{31,0.8589789125669829},
{32,0.8537754468447621},
{33,0.8675538912040871},
{34,0.8607522803519515},
{35,0.8639389099445753},
{36,0.8743759133171144},
{37,0.9526507413773687},
{38,0.948925611917577},
{39,0.9511904612810083},
{40,0.9581015421004746},
{41,0.9534977201467516},
{42,0.0},
{43,0.0},
{44,0.0},
{45,0.9522518499948341},
{46,0.9453306755503674},
{47,0.9586374601140047},
{48,0.9574611845671873},
{49,0.950914304057791},
{50,0.9582124294398642},
{51,0.9448074463900029},
{52,0.9474771999511407},
{53,0.9450778098401229},
{54,0.0},
{55,0.9478167911711937},
{56,0.9612587860764551},
{57,0.943468696358506},
{58,0.9509498642900455},
{59,0.9603595134495414},
{60,0.9517397912369205},
{61,0.0},
{62,0.9587029756970658},
{63,0.0},
{64,0.0},
{65,0.9526167902065421},
{66,0.0},
{67,0.9475668680715805},
{68,0.0},
{69,0.9630301804839606},
{70,0.9475359592879946},
{71,0.9541718747681374},
{72,0.0}]
Thanks for your time,
Jesse Stimpson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20190108/58963a26/attachment.htm>
More information about the erlang-questions
mailing list