[erlang-questions] High latency when exchanging small messages between different Erlang nodes

Konstantinos Kallas konstantinos.kallas@REDACTED
Fri Apr 12 15:42:36 CEST 2019


Thanks for the constructive feedback :)

On 12/4/19 9:34 AM, Tristan Sloughter wrote:
Yeah, instrumentation from the beginning is a good bet. Shameless plug: https://opencensus.io/quickstart/erlang/ :) And prometheus.erl for VM metrics, as Jesper suggests.

Tristan

On Fri, Apr 12, 2019, at 03:53, Jesper Louis Andersen wrote:
My first recommendation is to add instrumentation to the system, so you can see what is going on:

* Tristan already suggested looking at mailbox sizes
* Network blocking is worth investigating as well. Many small messages can lead to network overload situations
* Docker/Kubernetes environments tend to be noisy if a lot of work is running in them. In particular, if you have high-throughput systems co-located with low-latency systems, you are going to run into trouble.
* Enable the Erlang system monitor. Get it to report on blocked ports and processes.
* Add VM metrics: prometheus for instance.
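
As a rough illustration of the system monitor and mailbox suggestions above, here is a minimal sketch. The module name latency_probe, its function names, and the 100 ms long_schedule threshold are assumptions made for this example, not something from the thread; erlang:system_monitor/2, erlang:process_info/2 and erlang:processes/0 are the standard BIFs.

```erlang
-module(latency_probe).
-export([start_monitor/0, mailbox_len/1, top_mailboxes/1]).

%% Spawn a process that prints system monitor events, and register it
%% as the node's system monitor. The event list and the 100 ms
%% threshold are illustrative choices only.
start_monitor() ->
    Pid = spawn(fun loop/0),
    %% busy_port / busy_dist_port are reported when a process is
    %% suspended because a port (or a distribution port to another
    %% node) is busy.
    _Old = erlang:system_monitor(Pid, [busy_port,
                                       busy_dist_port,
                                       {long_schedule, 100}]),
    Pid.

loop() ->
    receive
        {monitor, Suspended, busy_dist_port, Port} ->
            io:format("busy_dist_port: ~p suspended on ~p~n",
                      [Suspended, Port]),
            loop();
        {monitor, Subject, Tag, Info} ->
            io:format("~p: ~p ~p~n", [Tag, Subject, Info]),
            loop()
    end.

%% Mailbox length of a local process, or undefined if it has exited.
%% process_info/2 only accepts pids that live on the calling node.
mailbox_len(Pid) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} -> Len;
        undefined -> undefined
    end.

%% The N local processes with the longest mailboxes, longest first.
top_mailboxes(N) ->
    Lens = [{Len, P} || P <- erlang:processes(),
                        {message_queue_len, Len} <-
                            [erlang:process_info(P, message_queue_len)]],
    lists:sublist(lists:reverse(lists:sort(Lens)), N).
```

Calling latency_probe:start_monitor() on each node and polling latency_probe:top_mailboxes(5) every few seconds would show whether some mailbox keeps growing or whether senders are being suspended on a busy distribution port. Either could be consistent with latency that climbs over time, but that is a hypothesis to check, not a diagnosis.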

The problem can be anywhere: inside your code, the VM, Docker, the kernel, the hardware, ... Your first goal is to narrow that down. Verify that things look correct in each layer before moving to the next.

The fact that latency starts out at 1 second, whereas it is at the millisecond level locally, suggests the problem has something to do with the distribution, either in your own code or in the underlying setup.
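
One way to make that local-versus-distributed comparison concrete is a small round-trip probe. This is only a sketch: the module name rtt_probe, the {ping, From, Seq} / {pong, Seq} message shapes, and the 5-second timeout are assumptions for illustration and are not taken from the original application.

```erlang
-module(rtt_probe).
-export([worker/0, round_trip/2]).

%% Worker: reply to every ping with a pong carrying the same
%% sequence number, until told to stop.
worker() ->
    receive
        {ping, From, Seq} ->
            From ! {pong, Seq},
            worker();
        stop ->
            ok
    end.

%% Send one ping to every worker, wait for all replies, and return
%% the elapsed time in microseconds.
round_trip(Workers, Seq) ->
    Self = self(),
    T0 = erlang:monotonic_time(microsecond),
    lists:foreach(fun(W) -> W ! {ping, Self, Seq} end, Workers),
    case collect(length(Workers), Seq) of
        ok -> {ok, erlang:monotonic_time(microsecond) - T0};
        timeout -> {error, timeout}
    end.

%% Wait for N pongs with the given sequence number.
%% The 5000 ms timeout per reply is an arbitrary example value.
collect(0, _Seq) ->
    ok;
collect(N, Seq) ->
    receive
        {pong, Seq} -> collect(N - 1, Seq)
    after 5000 ->
        timeout
    end.
```

Workers started with spawn(rtt_probe, worker, []) give a local baseline; starting them with spawn(Node, rtt_probe, worker, []) on the other nodes (with the module loaded there) and calling round_trip/2 repeatedly with increasing Seq values would show whether the bare distribution path already has the growing latency, or whether it only appears with the real application code.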

On Thu, Apr 11, 2019 at 9:07 PM Konstantinos Kallas <konstantinos.kallas@REDACTED> wrote:

Hello,

I have an Erlang application where latency is crucial and a lot of small messages (tuples with an atom and integer) are exchanged between processes in different nodes.

The main procedure is that a main process sends a small message to 4 worker processes in other Erlang nodes, the worker processes do some negligible processing, and then they reply back to the main node with a small message.

Each separate Erlang node is on a different docker container (generated from the erlang:21 docker image), and all the containers are connected using a standard docker bridge network.

I have noticed that latency (the time from when the first message is sent until its replies arrive) increases linearly with time. It starts at 1 second, and after 30 seconds of execution it has grown to 10 seconds.

I have tried running all processes on the same erlang node, and then latency is (as expected) a couple milliseconds, so my assumption is that the problem could be caused by one (or more) of the following:

- Some misconfiguration of the Erlang nodes

- Some misconfiguration of the docker network/containers

- Some penalty imposed by the operating system/docker because a lot of small messages are exchanged

Has anyone encountered this issue, or does anyone know how to configure Erlang nodes (and the operating system) to reduce message latency?

Thanks in advance.

Best,

Konstantinos

_______________________________________________
erlang-questions mailing list
erlang-questions@REDACTED
http://erlang.org/mailman/listinfo/erlang-questions


--
J.