[erlang-questions] High latency when exchanging small messages between different Erlang nodes
Fri Apr 12 15:34:28 CEST 2019
Yeah, instrumentation from the beginning is a good bet. Shameless plug https://opencensus.io/quickstart/erlang/ :) -- and prometheus.erl for VM metrics, like Jesper suggests.
On Fri, Apr 12, 2019, at 03:53, Jesper Louis Andersen wrote:
> My first recommendation is to add instrumentation to the system, so you can see what is going on:
> * Tristan already suggested looking at mailbox sizes
> * Network blocking is worth investigating as well. Many small messages can lead to network overload situations
> * Docker/Kubernetes environments tend to be noisy if a lot of work is running in them. In particular, if you have high-throughput systems bundled with low-latency systems, you are going to run into trouble.
> * Enable the Erlang system monitor. Get it to report on blocked ports and processes.
> * Add VM metrics: prometheus for instance.
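A minimal sketch of two of the checks in the list above (mailbox sizes and the system monitor), using only BIFs from the `erlang` module. The module name `diag_sketch`, the function names, and the 50 ms thresholds are made up for illustration and are not from the thread; tune the thresholds for your own system.

```erlang
-module(diag_sketch).
-export([start_monitor/0, queue_len/1, longest_queues/1]).

%% Start a process that receives system monitor events and prints them.
%% Only one process per node can be the system monitor at a time.
start_monitor() ->
    Pid = spawn(fun loop/0),
    %% The 50 ms thresholds are arbitrary values chosen for this sketch.
    erlang:system_monitor(Pid, [busy_port, busy_dist_port,
                                {long_schedule, 50}, {long_gc, 50}]),
    Pid.

loop() ->
    receive
        {monitor, Pid, busy_port, Port} ->
            io:format("busy_port: ~p suspended on ~p~n", [Pid, Port]);
        {monitor, Pid, busy_dist_port, Port} ->
            io:format("busy_dist_port: ~p suspended on ~p~n", [Pid, Port]);
        {monitor, PidOrPort, Tag, Info} ->
            io:format("~p: ~p ~p~n", [Tag, PidOrPort, Info]);
        _Other ->
            ok
    end,
    loop().

%% Mailbox length of a local process; 0 if the process is no longer alive.
queue_len(Pid) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} -> Len;
        undefined -> 0
    end.

%% The N local processes with the longest mailboxes, longest first.
longest_queues(N) ->
    Lens = [{queue_len(P), P} || P <- erlang:processes()],
    lists:sublist(lists:reverse(lists:sort(Lens)), N).
```

Calling `diag_sketch:start_monitor()` on each node and polling `diag_sketch:longest_queues(5)` every second or so should show whether messages are piling up in a mailbox or whether senders are being suspended on a busy distribution port.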
> The problem could be anywhere: inside your code, the VM, Docker, the kernel, the hardware, ... Your first goal is to narrow that down. Verify that things look correct in each layer before moving to the next.
> The fact that latency starts out at 1 second, whereas it is at the millisecond level locally, suggests the problem has something to do with the distribution, either in your own code or in the underlying setup.
> On Thu, Apr 11, 2019 at 9:07 PM Konstantinos Kallas <konstantinos.kallas@REDACTED> wrote:
>> I have an Erlang application where latency is crucial and a lot of small messages (tuples with an atom and an integer) are exchanged between processes on different nodes.
>> The main procedure is that a main process sends a small message to 4 worker processes in other Erlang nodes, the worker processes do some negligible processing, and then they reply back to the main node with a small message.
>> Each separate Erlang node is on a different docker container (generated from the erlang:21 docker image), and all the containers are connected using a standard docker bridge network.
>> I have noticed that latency (the time from when the first message is sent until its replies arrive) increases linearly with time. It starts at 1 second, and after 30 seconds of execution it has grown to 10 seconds.
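For reference, the round trip described above could be timed roughly as follows. This is only a sketch under assumptions: the module name `rtt_sketch`, the `ping`/`pong` message shapes, and the 5 second timeout are invented for illustration and are not the original poster's code.

```erlang
-module(rtt_sketch).
-export([worker/0, round_trip/1]).

%% Worker: reply to every ping straight away (negligible processing).
worker() ->
    receive
        {ping, From, Ref} ->
            From ! {pong, Ref},
            worker();
        stop ->
            ok
    end.

%% Send one small message to every worker, wait for all replies, and
%% return the elapsed time in microseconds.
round_trip(Workers) ->
    T0 = erlang:monotonic_time(),
    Refs = [begin
                Ref = make_ref(),
                W ! {ping, self(), Ref},
                Ref
            end || W <- Workers],
    %% Ref is bound by the generator, so each receive matches one reply.
    [receive
         {pong, Ref} -> ok
     after 5000 ->
         exit(timeout)   % hypothetical timeout picked for this sketch
     end || Ref <- Refs],
    T1 = erlang:monotonic_time(),
    erlang:convert_time_unit(T1 - T0, native, microsecond).
```

With the nodes connected, something like `Ws = [spawn(N, rtt_sketch, worker, []) || N <- nodes()]` followed by repeated calls to `rtt_sketch:round_trip(Ws)` would show whether the measured time grows from one call to the next or stays flat. The module has to be loaded on every node for the remote spawn to work.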
>> I have tried running all processes on the same erlang node, and then latency is (as expected) a couple milliseconds, so my assumption is that the problem could be caused by one (or more) of the following:
>> - Some misconfiguration of the Erlang nodes
>> - Some misconfiguration of the docker network/containers
>> - Some penalty imposed by the operating system/docker because a lot of small messages are exchanged
>> Has anyone encountered this issue, or does anyone know how to configure Erlang nodes (and the operating system) to reduce message latency?
>> Thanks in advance.