<div dir="ltr"><div><span style="color:rgb(36,41,46);font-family:-apple-system,system-ui,'Segoe UI',Helvetica,Arial,sans-serif,'Apple Color Emoji','Segoe UI Emoji','Segoe UI Symbol';font-size:16px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">We are experiencing very high CPU utilization in a cluster of 3 Erlang VMs running RabbitMQ. We have deployed another cluster in an attempt to reproduce the same behaviour, without much success.</span><br></div><div><span style="color:rgb(36,41,46);font-family:-apple-system,system-ui,'Segoe UI',Helvetica,Arial,sans-serif,'Apple Color Emoji','Segoe UI Emoji','Segoe UI Symbol';font-size:16px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></div><div><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,system-ui,'Segoe UI',Helvetica,Arial,sans-serif,'Apple Color Emoji','Segoe UI Emoji','Segoe UI Symbol';font-size:16px;text-decoration-style:initial;text-decoration-color:initial">Our goals are:</p><ul style="box-sizing:border-box;padding-left:2em;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,system-ui,'Segoe UI',Helvetica,Arial,sans-serif,'Apple Color Emoji','Segoe UI Emoji','Segoe UI Symbol';font-size:16px;text-decoration-style:initial;text-decoration-color:initial"><li style="box-sizing:border-box">Find out where the CPU time is being spent</li><li style="box-sizing:border-box;margin-top:0.25em">Choose the right tools to analyze CPU utilization</li></ul><br class="gmail-Apple-interchange-newline"><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,system-ui,'Segoe UI',Helvetica,Arial,sans-serif,'Apple Color Emoji','Segoe UI Emoji','Segoe UI Symbol';font-size:16px;text-decoration-style:initial;text-decoration-color:initial">Our 
observations so far:</p><ul style="box-sizing:border-box;padding-left:2em;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,system-ui,'Segoe UI',Helvetica,Arial,sans-serif,'Apple Color Emoji','Segoe UI Emoji','Segoe UI Symbol';font-size:16px;text-decoration-style:initial;text-decoration-color:initial"><li style="box-sizing:border-box">The<span> </span><strong style="box-sizing:border-box;font-weight:600">BAD</strong><span> </span>cluster shows excessive CPU utilization, both user and system, as well as higher network traffic.</li><li style="box-sizing:border-box;margin-top:0.25em">The<span> </span><strong style="box-sizing:border-box;font-weight:600">BAD</strong><span> </span>cluster also shows higher Erlang scheduler utilization, especially in the<span> </span><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:13.6px;padding:0.2em 0.4em;margin:0px;background-color:rgba(27,31,35,0.05);border-radius:3px">emulator</code><span> </span>and<span> </span><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:13.6px;padding:0.2em 0.4em;margin:0px;background-color:rgba(27,31,35,0.05);border-radius:3px">other</code><span> </span>microstates. We do not yet understand what<span> </span><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:13.6px;padding:0.2em 0.4em;margin:0px;background-color:rgba(27,31,35,0.05);border-radius:3px">other</code><span> </span>could be. 
According to the Erlang documentation, it covers time spent doing<span> </span><em style="box-sizing:border-box">unaccounted things</em>.</li><li style="box-sizing:border-box;margin-top:0.25em">The<span> </span><strong style="box-sizing:border-box;font-weight:600">BAD</strong><span> </span>cluster makes a considerably higher number of system calls; we have not yet identified why, and are unsure how to find out.</li><li style="box-sizing:border-box;margin-top:0.25em">The<span> </span><strong style="box-sizing:border-box;font-weight:600">BAD</strong><span> </span>cluster does not necessarily execute more reductions. In fact, the<span> </span><strong style="box-sizing:border-box;font-weight:600">GOOD</strong><span> </span>cluster executes more reductions and yet has lower scheduler utilization.</li></ul><div><font color="#24292e" face="-apple-system, system-ui, Segoe UI, Helvetica, Arial, sans-serif, Apple Color Emoji, Segoe UI Emoji, Segoe UI Symbol"><span style="font-size:16px"><table style="box-sizing:border-box;border-collapse:collapse;margin-top:0px;margin-bottom:16px;display:block;width:888px;overflow:auto;text-decoration-style:initial;text-decoration-color:initial"><thead style="box-sizing:border-box"><tr style="box-sizing:border-box;background-color:rgb(255,255,255);border-top:1px solid rgb(198,203,209)"><th style="box-sizing:border-box;padding:6px 13px;font-weight:600;border:1px solid rgb(223,226,229)">METRIC</th><th style="box-sizing:border-box;padding:6px 13px;font-weight:600;border:1px solid rgb(223,226,229)">BAD</th><th style="box-sizing:border-box;padding:6px 13px;font-weight:600;border:1px solid rgb(223,226,229)">GOOD</th></tr></thead><tbody style="box-sizing:border-box"><tr style="box-sizing:border-box;background-color:rgb(255,255,255);border-top:1px solid rgb(198,203,209)"><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)"><a href="https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#dstat" 
style="box-sizing:border-box;background-color:transparent;color:rgb(3,102,214);text-decoration:none">user CPU</a></td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">46% - 57%</td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">19% - 40%</td></tr><tr style="box-sizing:border-box;background-color:rgb(246,248,250);border-top:1px solid rgb(198,203,209)"><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)"><a href="https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#dstat" style="box-sizing:border-box;background-color:transparent;color:rgb(3,102,214);text-decoration:none">system CPU</a></td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">20% - 37%</td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">1% - 10%</td></tr><tr style="box-sizing:border-box;background-color:rgb(255,255,255);border-top:1px solid rgb(198,203,209)"><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)"><a href="https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#dstat" style="box-sizing:border-box;background-color:transparent;color:rgb(3,102,214);text-decoration:none">network traffic</a></td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">6M - 19M</td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">up to 8M</td></tr><tr style="box-sizing:border-box;background-color:rgb(246,248,250);border-top:1px solid rgb(198,203,209)"><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)"><a href="https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#dstat" style="box-sizing:border-box;background-color:transparent;color:rgb(3,102,214);text-decoration:none">system interrupts</a></td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid 
rgb(223,226,229)">120k - 196k</td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">10k - 20k</td></tr><tr style="box-sizing:border-box;background-color:rgb(255,255,255);border-top:1px solid rgb(198,203,209)"><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)"><a href="https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#syscalls" style="box-sizing:border-box;background-color:transparent;color:rgb(3,102,214);text-decoration:none">syscalls</a></td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">1.6M - 2.1M</td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">49k - 110k</td></tr><tr style="box-sizing:border-box;background-color:rgb(246,248,250);border-top:1px solid rgb(198,203,209)"><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)"><a href="https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#perf-stat" style="box-sizing:border-box;background-color:transparent;color:rgb(3,102,214);text-decoration:none">task-clock over 10 s (msec)</a></td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">68255</td><td style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)">12324</td></tr><tr style="box-sizing:border-box;background-color:rgb(255,255,255);border-top:1px solid rgb(198,203,209)"><td colspan="3" style="box-sizing:border-box;padding:6px 13px;border:1px solid rgb(223,226,229)"><a href="https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#perf_record_cpu_cycles" style="box-sizing:border-box;background-color:transparent;color:rgb(3,102,214);text-decoration:none">CPU profiling info</a></td></tr></tbody></table></span></font></div><div><br></div></div><div><p 
style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,system-ui,'Segoe UI',Helvetica,Arial,sans-serif,'Apple Color Emoji','Segoe UI Emoji','Segoe UI Symbol';font-size:16px">We have gathered many metrics in an attempt to identify why the BAD cluster uses so much CPU. All of them, along with details of the environment, can be found here: <a href="https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841">https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841</a></p><p style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,system-ui,'Segoe UI',Helvetica,Arial,sans-serif,'Apple Color Emoji','Segoe UI Emoji','Segoe UI Symbol';font-size:16px"><br></p><p style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,system-ui,'Segoe UI',Helvetica,Arial,sans-serif,'Apple Color Emoji','Segoe UI Emoji','Segoe UI Symbol';font-size:16px">We would greatly appreciate any insights into what could be causing the issue, or suggestions for additional tools we could use.</p></div><div><span style="color:rgb(36,41,46);font-family:-apple-system,system-ui,'Segoe UI',Helvetica,Arial,sans-serif,'Apple Color Emoji','Segoe UI Emoji','Segoe UI Symbol';font-size:16px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Many thanks</span></div><div> </div><div>-- <br><div dir="ltr" class="gmail-m_6433092018425678905gmail_signature"><div dir="ltr"><div dir="ltr"><span style="color:rgb(80,0,80);font-size:12.8px">Marcial Rosales</span><div 
style="color:rgb(80,0,80);font-size:12.8px"><span style="font-size:12.8px">Pivotal, Inc.  EMEA</span><br></div></div><div dir="ltr"><div><br></div></div></div></div></div></div>