[erlang-questions] High latency when exchanging small messages between different Erlang nodes

Tristan Sloughter t@REDACTED
Fri Apr 12 15:34:28 CEST 2019


Yeah, instrumenting from the beginning is a good bet. Shameless plug: https://opencensus.io/quickstart/erlang/ :) And prometheus.erl for VM metrics, as Jesper suggests.

Tristan

On Fri, Apr 12, 2019, at 03:53, Jesper Louis Andersen wrote:
> My first recommendation is to add instrumentation to the system, so you can see what is going on:
> 
> * Tristan already suggested looking at mailbox sizes
> * Network blocking is worth investigating as well. Many small messages can lead to network overload situations
> * Docker/Kubernetes environments tend to be noisy if a lot of work is running in them. In particular, if you have high-throughput systems bundled together with low-latency systems, you are going to run into trouble.
> * Enable the Erlang system monitor. Get it to report on blocked ports and processes.
> * Add VM metrics: prometheus for instance.
> 
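A minimal sketch of the system monitor and mailbox suggestions above, assuming a plain OTP 21 node. The module name `sysmon_sketch` and the thresholds (50 ms, 1,000,000 words) are made up for illustration, not recommended values. `busy_dist_port` is the event to watch if the distribution channel between nodes is the bottleneck.

```erlang
-module(sysmon_sketch).
-export([start/0, top_mailboxes/1]).

%% Spawn a process that receives system monitor events and logs them.
%% Note: the monitor settings are cleared if this process dies.
start() ->
    Pid = spawn(fun loop/0),
    %% busy_port / busy_dist_port fire when a process is suspended because
    %% a port (or a distribution port to another node) is busy.
    %% The thresholds below are arbitrary example values.
    erlang:system_monitor(Pid, [busy_port,
                                busy_dist_port,
                                {long_schedule, 50},
                                {long_gc, 50},
                                {large_heap, 1000000}]),
    Pid.

loop() ->
    receive
        {monitor, Subject, Event, Info} ->
            error_logger:warning_msg("system_monitor ~p: ~p ~p~n",
                                     [Event, Subject, Info]),
            loop();
        stop ->
            ok
    end.

%% Return up to N {QueueLen, Pid} pairs, longest mailbox first.
%% Processes that exit during the scan return undefined and are skipped.
top_mailboxes(N) ->
    Lens = [{Len, P} || P <- erlang:processes(),
                        {message_queue_len, Len} <-
                            [erlang:process_info(P, message_queue_len)]],
    lists:sublist(lists:reverse(lists:sort(Lens)), N).
```

Calling `sysmon_sketch:start()` on each node and then `sysmon_sketch:top_mailboxes(5)` from a shell while the load is running should show whether a mailbox is growing or a distribution port keeps blocking senders.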
> The problem can be anywhere: inside your code, the VM, Docker, the kernel, the hardware, ... Your first goal is to narrow that down. Verify that things look correct in each layer before moving to the next.
> 
> The fact that latency starts out at 1 second, when it is at the millisecond level locally, suggests the problem has something to do with the distribution, either in your own code or in the underlying setup.
> 
> On Thu, Apr 11, 2019 at 9:07 PM Konstantinos Kallas <konstantinos.kallas@REDACTED> wrote:
>> Hello,

>> I have an Erlang application where latency is crucial and a lot of small messages (tuples with an atom and integer) are exchanged between processes in different nodes. 

>> The main procedure is that a main process sends a small message to 4 worker processes in other Erlang nodes, the worker processes do some negligible processing, and then they reply back to the main node with a small message. 

>> Each separate Erlang node is on a different docker container (generated from the erlang:21 docker image), and all the containers are connected using a standard docker bridge network.

>> I have noticed that latency (the time from when the first message is sent until its replies arrive) increases linearly with time. It starts at 1 second, and after 30 seconds of execution it has reached 10 seconds.
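To pin that number down, the round trip can be timed directly. This is only a sketch under assumptions: the module `rtt_sketch`, the `{req, ...}` / `{reply, ...}` message shapes and the 5 second timeout are hypothetical and not taken from the original application.

```erlang
-module(rtt_sketch).
-export([round_trip/2, echo_worker/0]).

%% Hypothetical worker: answers every request with a small tuple.
echo_worker() ->
    receive
        {req, From, Ref, N} ->
            From ! {reply, Ref, N},
            echo_worker();
        stop ->
            ok
    end.

%% Send one request to every worker and wait for all replies.
%% Returns the elapsed time in microseconds; exits on timeout.
round_trip(Workers, N) ->
    T0 = erlang:monotonic_time(microsecond),
    Refs = [begin
                Ref = make_ref(),
                W ! {req, self(), Ref, N},
                Ref
            end || W <- Workers],
    ok = collect(Refs),
    erlang:monotonic_time(microsecond) - T0.

%% Wait for the reply tagged with each reference in turn.
collect([]) ->
    ok;
collect([Ref | Rest]) ->
    receive
        {reply, Ref, _} -> collect(Rest)
    after 5000 ->
        exit({timeout, Ref})
    end.
```

With the module loaded on every node, something like `Ws = [spawn(Node, rtt_sketch, echo_worker, []) || Node <- nodes()]` followed by repeated calls to `rtt_sketch:round_trip(Ws, 1)` gives a baseline for the bare distribution path. If that stays in the millisecond range while the application's latency keeps climbing, the growth is more likely a queue building up in the application than the network itself.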

>> I have tried running all processes on the same Erlang node, and then latency is (as expected) a couple of milliseconds, so my assumption is that the problem could be caused by one (or more) of the following:

>> - Some misconfiguration of the Erlang nodes

>> - Some misconfiguration of the docker network/containers

>> - Some penalty imposed by the operating system/docker because a lot of small messages are exchanged

>> Has anyone encountered this issue, or does anyone know how to configure Erlang nodes (and the operating system) to reduce message latency? 

>> Thanks in advance.

>> Best,

>> Konstantinos

>> _______________________________________________
>>  erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
> 
> 
> -- 
> J.