[erlang-questions] How to pin-point High CPU utilization in Erlang VM

Marcial Rosales mrosales@REDACTED
Tue Jul 17 12:50:13 CEST 2018


We are experiencing very high CPU utilization on 3 clustered Erlang VMs
running RabbitMQ. We deployed a second cluster in an attempt to reproduce
the same behaviour, without much success.

Our goals are:

   - Find out where the CPU is being utilized
   - Choose the right tools to analyze CPU utilization


Our observations so far:

   - The *BAD* cluster shows excessive CPU utilization, both user and
   system, as well as higher network traffic.
   - The *BAD* cluster also shows higher Erlang scheduler utilization,
   especially in the emulator and other microstate accounting states. We
   have yet to understand what other covers. According to the Erlang
   documentation it is *unaccounted things*.
   - The *BAD* cluster makes a considerably higher number of system calls.
   We have yet to identify why, and we do not yet know how to find out.
   - The *BAD* cluster does not necessarily run a higher number of
   reductions. In fact, the *GOOD* cluster runs more reductions and yet has
   lower scheduler utilization.
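
For reference on the last point, here is a minimal sketch (in Python, purely for illustration) of how scheduler utilization is commonly derived from two samples of erlang:statistics(scheduler_wall_time): the change in active time divided by the change in total time, per scheduler. The function name and the sample values are hypothetical and are not measurements from either cluster. Reductions do not appear anywhere in the calculation, which is one reason reduction counts and scheduler utilization can diverge.

```python
from typing import Dict, List, Tuple

# (scheduler_id, active_time, total_time): the shape of each entry returned by
# erlang:statistics(scheduler_wall_time), copied out of the Erlang shell.
Sample = Tuple[int, int, int]


def scheduler_utilization(before: List[Sample], after: List[Sample]) -> Dict[int, float]:
    """Per-scheduler utilization between two samples: delta active / delta total."""
    first = {sched_id: (active, total) for sched_id, active, total in before}
    result = {}
    for sched_id, active1, total1 in after:
        active0, total0 = first[sched_id]
        elapsed = total1 - total0
        # Guard against two samples taken at the same instant.
        result[sched_id] = (active1 - active0) / elapsed if elapsed > 0 else 0.0
    return result


# Hypothetical sample values, for illustration only.
t0 = [(1, 1_000, 10_000), (2, 500, 10_000)]
t1 = [(1, 8_500, 20_000), (2, 3_000, 20_000)]
print(scheduler_utilization(t0, t1))
```

Comparing these per-scheduler ratios between the two clusters over the same interval may help show whether the extra utilization is spread evenly or concentrated on a few schedulers.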

METRIC                  BAD            GOOD
user cpu [1]            46% - 57%      19% - 40%
system cpu [1]          20% - 37%      1% - 10%
network traffic [1]     6M - 19M       up to 8M
system interrupts [1]   120k - 196k    10k - 20k
syscalls [2]            1.6M - 2.1M    49k - 110k
task-clock 10sec [3]    68255          12324
cpu profiling info [4]

[1] <https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#dstat>
[2] <https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#syscalls>
[3] <https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#perf-stat>
[4] <https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841#perf_record_cpu_cycles>
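
To put the task-clock figures above in perspective, here is a rough sketch (in Python, for illustration only) that converts them into an average number of CPUs kept busy. It assumes the values are in milliseconds, as perf stat normally reports task-clock, and that they were collected over a 10 second window. Both are assumptions, and the helper name is hypothetical.

```python
def cpus_utilized(task_clock_ms: float, window_s: float) -> float:
    """Average number of CPUs kept busy: CPU time consumed / wall-clock time."""
    return (task_clock_ms / 1000.0) / window_s


# task-clock values from the metrics above.
# Assumed: unit is msec, measurement window is 10 seconds.
bad = cpus_utilized(68255, 10)
good = cpus_utilized(12324, 10)
print(f"BAD: {bad:.2f} CPUs, GOOD: {good:.2f} CPUs, ratio: {bad / good:.1f}x")
```

If those assumptions hold, the ratio between the two results gives a quick sense of how much more CPU time the BAD cluster consumes over the same interval.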

We have gathered a lot of metrics in an attempt to identify why the BAD
cluster uses so much CPU. All the information, along with the environment
details, can be found here:
https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841


We would greatly appreciate any insights into what could be causing the
issue, or suggestions for additional tools we could use.
Many thanks

-- 
Marcial Rosales
Pivotal, Inc.  EMEA