[erlang-questions] Identifying causes for high CPU usage
Jesper Louis Andersen
Thu Jun 23 14:33:02 CEST 2016
On Thu, Jun 23, 2016 at 12:35 PM, Eli Iser <eli.iser@REDACTED> wrote:
> * run_queue - affected nodes had a run queue of several dozens (less than
> 100), while un-affected nodes had 0 (always). Since run_queues is
> undocumented (at least I didn't see it in the documentation), I didn't run
> it at the time of the problem.
A queue's load can be seen as a (real) number K: the arrival rate divided
by the dequeue (processing) rate.
If K < 1, your system can dequeue messages faster than they arrive.
This leads to a queue size of 0 over time.
If K = 1, your system dequeues at the same rate as messages arrive. This
leads to a standing queue.
If K > 1, the queue slowly fills up because the arrival rate is larger
than the processing (dequeue) rate.
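A minimal sketch of the three cases. It assumes a deterministic model in which a fixed number of messages arrive and a fixed number are dequeued per tick. Real arrivals are bursty, so a real queue with K < 1 hovers near 0 rather than sitting at exactly 0. The function `simulate_queue` and its parameters are hypothetical and exist only for illustration:

```python
from typing import List


def simulate_queue(arrivals_per_tick: int, dequeues_per_tick: int,
                   ticks: int, initial: int = 0) -> List[int]:
    """Hypothetical deterministic queue model.

    Each tick, `arrivals_per_tick` messages are enqueued, then up to
    `dequeues_per_tick` messages are removed. Returns the queue length
    at the end of every tick. K = arrivals_per_tick / dequeues_per_tick.
    """
    queue = initial
    history = []
    for _ in range(ticks):
        queue += arrivals_per_tick                # messages arrive
        queue -= min(queue, dequeues_per_tick)    # consumer drains what it can
        history.append(queue)
    return history


if __name__ == "__main__":
    # Start every case with a backlog of 10 messages.
    for label, arrive, serve in [("K < 1", 3, 5), ("K = 1", 5, 5), ("K > 1", 6, 5)]:
        print(label, simulate_queue(arrive, serve, ticks=8, initial=10))
```

With a backlog of 10, the K < 1 case drains to 0 and stays there, the K = 1 case keeps the backlog of 10 forever (the standing queue), and the K > 1 case grows by one message per tick.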
In your case, you have been in either the K = 1 or the K > 1 situation for
a while. This usually leads to more load on the system because there is
more work to do. Note, however, that a 100% CPU load isn't necessarily a
problem unless response latencies are also affected. If you start a
periodic, CPU-bound background job, it will take up all the free
resources, but it will hopefully be scheduled off the core whenever other
work arrives, so that the new work can be processed quickly.
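To illustrate why CPU load alone says little, here is a hypothetical sketch. It uses the same kind of deterministic model as before, not how the BEAM actually schedules processes. It tracks a FIFO mailbox in which a consumer removes at most `dequeues_per_tick` messages per tick. With 5 arrivals and 5 dequeues per tick (K = 1), and with 6 arrivals and 5 dequeues (K > 1), the consumer is busy on every tick, which stands in for 100% CPU. Only in the K > 1 case does the per-message wait keep growing. The name `fifo_latencies` is made up for this example:

```python
from collections import deque
from typing import Deque, List


def fifo_latencies(arrivals_per_tick: int, dequeues_per_tick: int,
                   ticks: int) -> List[int]:
    """Hypothetical model: per-message wait (in ticks) in a FIFO mailbox.

    Each tick, new messages are appended tagged with their arrival tick,
    then the consumer takes up to `dequeues_per_tick` messages from the
    front. The wait of a message is (tick it was taken - tick it arrived).
    """
    mailbox: Deque[int] = deque()
    waits: List[int] = []
    for tick in range(ticks):
        mailbox.extend([tick] * arrivals_per_tick)      # enqueue arrivals
        for _ in range(min(dequeues_per_tick, len(mailbox))):
            waits.append(tick - mailbox.popleft())       # oldest first
    return waits


if __name__ == "__main__":
    print("K = 1, worst wait:", max(fifo_latencies(5, 5, 100)))
    print("K > 1, worst wait:", max(fifo_latencies(6, 5, 100)))
```

Both runs do the same amount of work per tick, so a CPU graph would look identical. The difference shows up only in how long messages sit in the mailbox.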
In other words, you may want to figure out what happens inside the
processes with the larger message queues, and which events could lead to
the longer message queues. A common case is that a specific user or
subsystem triggers the situation through normal use. But the use hits