[erlang-questions] High latency when exchanging small messages between different Erlang nodes

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Fri Apr 12 11:53:09 CEST 2019


My first recommendation is to add instrumentation to the system, so you can
see what is going on:

* Tristan already suggested looking at mailbox sizes
* Network blocking is worth investigating as well. Many small messages can
lead to network overload situations
* Docker/Kubernetes environments tend to be noisy if a lot of work is
running in them. In particular, if you have high-throughput systems
co-located with low-latency systems, you are going to run into trouble.
* Enable the Erlang system monitor. Get it to report on blocked ports and
processes.
* Add VM metrics, for instance with Prometheus.
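
A minimal sketch of two of the checks above, the system monitor and the
mailbox sizes. The module name sysmon_sketch and the 50 ms thresholds are
assumptions made for illustration; erlang:system_monitor/2 and
erlang:process_info/2 are the standard BIFs. Run it on each node, since
process_info/2 only works for local processes.

```erlang
-module(sysmon_sketch).
-export([start/0, loop/0, top_mailboxes/1]).

%% Spawn a logger process and register it as this node's system monitor.
%% Only one system monitor can be active per node at a time.
start() ->
    Pid = spawn(?MODULE, loop, []),
    %% The 50 ms thresholds are assumptions; tune them for your system.
    erlang:system_monitor(Pid, [busy_port,
                                busy_dist_port,
                                {long_gc, 50},
                                {long_schedule, 50}]),
    Pid.

%% Print every system monitor event. A busy_dist_port event means a
%% process was suspended because a distribution channel was full.
loop() ->
    receive
        {monitor, PidOrPort, Tag, Info} ->
            io:format("system_monitor ~p: ~p ~p~n", [Tag, PidOrPort, Info]),
            loop()
    end.

%% Return up to N {QueueLen, Pid} pairs, longest mailbox first.
%% Processes that exit during the scan return undefined and are skipped.
top_mailboxes(N) ->
    Lens = [{Len, P} || P <- erlang:processes(),
                        {message_queue_len, Len} <-
                            [erlang:process_info(P, message_queue_len)]],
    lists:sublist(lists:reverse(lists:sort(Lens)), N).
```

Calling sysmon_sketch:start() once per node and sysmon_sketch:top_mailboxes(5)
every few seconds during the test should show whether a mailbox keeps growing
or whether senders are being suspended on a busy distribution port.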

The problem can be anywhere: inside your code, the VM, Docker, the kernel,
the hardware, ... Your first goal is to narrow that down. Verify that each
layer looks correct before moving on to the next.

The fact that latency starts out at 1 second, when it is at the millisecond
level locally, suggests the problem has to do with distribution, either in
your own code or in the underlying setup.
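
One way to check the distribution layer on its own is to time a trivial
round trip between two nodes, with none of your application code involved.
This is only a sketch: the module name rtt_sketch is an assumption, and
because rpc:call/4 goes through the rex server on the remote node, the
numbers include a little rpc overhead on top of the raw network latency.

```erlang
-module(rtt_sketch).
-export([measure/2, summary/1]).

%% Time Count sequential round trips to Node.
%% Returns one sample per round trip, in microseconds.
measure(Node, Count) when is_atom(Node), is_integer(Count), Count > 0 ->
    [begin
         {Micros, _Result} = timer:tc(rpc, call, [Node, erlang, node, []]),
         Micros
     end || _ <- lists:seq(1, Count)].

%% Summarise a non-empty list of samples.
%% For an even number of samples, median is the lower of the two middle ones.
summary(Samples) ->
    Sorted = lists:sort(Samples),
    Len = length(Sorted),
    #{min => hd(Sorted),
      median => lists:nth((Len + 1) div 2, Sorted),
      max => lists:last(Sorted)}.
```

For example, rtt_sketch:summary(rtt_sketch:measure(Node, 1000)) run from the
main node against each worker node. If these round trips stay well below the
latency you observe in the application, the network and the distribution
setup are less likely to be at fault than queuing in your own processes.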

On Thu, Apr 11, 2019 at 9:07 PM Konstantinos Kallas <
konstantinos.kallas@REDACTED> wrote:

> Hello,
>
> I have an Erlang application where latency is crucial and a lot of small
> messages (tuples with an atom and integer) are exchanged between processes
> in different nodes.
>
> The main procedure is that a main process sends a small message to 4
> worker processes in other Erlang nodes, the worker processes do some
> negligible processing, and then they reply back to the main node with a
> small message.
>
> Each separate Erlang node is on a different docker container (generated
> from the erlang:21 docker image), and all the containers are connected
> using a standard docker bridge network.
>
> I have noticed that latency (the time from when the first message is sent
> until its replies arrive) increases linearly with time. It starts at
> 1 second and after 30 seconds of execution latency has become 10 seconds.
>
> I have tried running all processes on the same erlang node, and then
> latency is (as expected) a couple milliseconds, so my assumption is that
> the problem could be caused by one (or more) of the following:
>
> - Some misconfiguration of the Erlang nodes
>
> - Some misconfiguration of the docker network/containers
>
> - Some penalty imposed by the operating system/docker because a lot of
> small messages are exchanged
>
> Has anyone encountered this issue, or does anyone know how to configure
> Erlang nodes (and the operating system) to reduce message latency?
>
> Thanks in advance.
>
> Best,
>
> Konstantinos


-- 
J.