[erlang-questions] Erlang distribution links don't fully utilise available resources - OTP 22.0.2 - Why?

Gerhard Lazu gerhard@REDACTED
Mon Jun 17 17:07:41 CEST 2019


I wouldn't expect the Erlang distribution to reach the same network
performance as iperf, but I would expect it to be within 70%-80% of maximum.

In our measurements it reaches only about 27% of that maximum, which makes me
believe that something is misconfigured or inefficient.

The goal is to figure out which component/components are responsible for
this significant network throughput loss.

Thanks for the quick response!

On Mon, Jun 17, 2019 at 4:02 PM Dmytro Lytovchenko <
dmytro.lytovchenko@REDACTED> wrote:

> I believe the Erlang distribution is the wrong thing to use if you want to
> saturate the network.
> There is plenty of overhead for each message: the data gets copied, then
> encoded (copied again), then sent, then received (copied), then decoded
> (copied again) and finally delivered to the destination process (copied
> again). The receiving processes might also be slow to fetch the incoming
> data; they aren't running in hard real time and sometimes go to sleep.
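>
> A rough way to see just the encode/decode part of that overhead is to time
> term_to_binary/binary_to_term for the same 10 MB payload in the shell (a
> quick sketch, not a rigorous benchmark):
>
> Payload = <<0:10000000/unit:8>>,
> {EncodeUs, Bin} = timer:tc(erlang, term_to_binary, [Payload]),
> {DecodeUs, _}   = timer:tc(erlang, binary_to_term, [Bin]),
> io:format("encode: ~p us, decode: ~p us~n", [EncodeUs, DecodeUs]).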
>
> Some Linux tuning advice can be found online, for example:
> https://medium.com/@_wmconsulting/tuning-linux-to-reach-maximum-performance-on-10-gbps-network-card-with-http-streaming-8599c9b4389d
>
> I remember there were suggestions to use regular TCP connections instead,
> and to consider a user-mode driver (kernel calls have a cost) or a
> low-level NIF driver, with the intent of squeezing the highest gigabits
> out of your hardware. A plain-TCP sketch follows below.
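>
> A minimal sketch of that plain-TCP route (the port, address and payload
> size are only examples): on the receiving node, accept one connection and
> drain it; on the sending node, push the same 10 MB binary in a loop:
>
> %% on node-b: accept one connection and count the bytes that arrive
> {ok, L} = gen_tcp:listen(9000, [binary, {active, false}, {reuseaddr, true}]),
> {ok, S} = gen_tcp:accept(L),
> Drain = fun F(N) ->
>             case gen_tcp:recv(S, 0) of
>                 {ok, B}         -> F(N + byte_size(B));
>                 {error, closed} -> N
>             end
>         end,
> Drain(0).
>
> %% on node-a: connect and send 10 MB chunks in a tight loop
> {ok, C} = gen_tcp:connect("10.0.1.37", 9000, [binary, {active, false}]),
> Chunk = <<0:10000000/unit:8>>,
> Send = fun F() -> ok = gen_tcp:send(C, Chunk), F() end,
> Send().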
>
> On Mon, 17 Jun 2019 at 16:49, Gerhard Lazu <gerhard@REDACTED> wrote:
>
>> Hi,
>>
>> We are trying to understand what prevents the Erlang distribution link
>> from saturating the network. Even though there is plenty of CPU, memory &
>> network bandwidth, the Erlang distribution doesn't fully utilise available
>> resources. Can you help us figure out why?
>>
>> We have a 3-node Erlang 22.0.2 cluster running on Ubuntu 16.04 x86 64bit.
>>
>> This is the maximum network throughput between node-a & node-b, as
>> measured by iperf:
>>
>> iperf -t 30 -c node-b
>> ------------------------------------------------------------
>> Client connecting to 10.0.1.37, TCP port 5001
>> TCP window size: 45.0 KByte (default)
>> ------------------------------------------------------------
>> [  3] local 10.0.1.36 port 43576 connected with 10.0.1.37 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-30.0 sec  78.8 GBytes  22.6 Gbits/sec
>>
>> We ran this multiple times, in different directions and with different
>> degrees of parallelism; the maximum network throughput is roughly 22 Gbit/s.
>>
>> We run the following command on node-a:
>>
>> B = fun F() -> rpc:cast('foo@REDACTED', erlang, is_binary, [<<0:10000000/unit:8>>]), F() end.
>> [spawn(fun() -> B() end) || _ <- lists:seq(1, 100)].
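>>
>> For clarity, the same load generator with its knobs pulled out as
>> variables (this is only a restatement of the two lines above; Target,
>> MsgBytes and NProcs are illustrative names, and binding Payload once means
>> the binary is built once rather than on every iteration):
>>
>> Target   = 'foo@REDACTED',           %% receiving node
>> MsgBytes = 10000000,                 %% 10 MB per message
>> NProcs   = 100,                      %% concurrent sender processes
>> Payload  = <<0:MsgBytes/unit:8>>,    %% built once, sent repeatedly
>> B = fun F() -> rpc:cast(Target, erlang, is_binary, [Payload]), F() end.
>> [spawn(fun() -> B() end) || _ <- lists:seq(1, NProcs)].
>>
>> rpc:cast/4 is fire-and-forget and erlang:is_binary/1 is essentially free on
>> the receiving side, so the traffic is dominated by the distribution
>> transfer itself.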
>>
>> This is what the network reports on node-a:
>>
>> dstat -n 1 10
>> -net/total-
>>  recv  send
>>    0     0
>>  676k  756M
>>  643k  767M
>>  584k  679M
>>  693k  777M
>>  648k  745M
>>  660k  745M
>>  667k  772M
>>  651k  709M
>>  675k  782M
>>  688k  819M
>>
>> That roughly translates to 6 Gbit/s. In other words, the Erlang
>> distribution link between node-a & node-b is maxing out at around 6
>> Gbit/s, i.e. about 27% of what we measure consistently and repeatedly
>> outside of Erlang: iperf is roughly 3.6x faster than an Erlang
>> distribution link. And it gets more interesting.
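>>
>> (For reference, the arithmetic behind those numbers: dstat reports
>> megabytes per second while the iperf figure is in gigabits per second, so
>> roughly
>>
>> 770 * 8 / 1000.   %% ~770 MB/s sent  => ~6.2 Gbit/s
>> 6.2 / 22.6.       %% => ~0.27, i.e. ~27% of the iperf maximum
>> 22.6 / 6.2.       %% => ~3.6, i.e. iperf is ~3.6x faster
>>
>> where 770 MB/s is the rough average of the dstat samples above.)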
>>
>> If we start another 100 processes pumping 10Mbyte messages from node-a to
>> node-c, we see the network throughput double:
>>
>> dstat -n 1 10
>> -net/total-
>>  recv  send
>>    0     0
>> 1303k 1463M
>> 1248k 1360M
>> 1332k 1458M
>> 1480k 1569M
>> 1339k 1455M
>> 1413k 1494M
>> 1395k 1431M
>> 1359k 1514M
>> 1438k 1564M
>> 1379k 1489M
>>
>> So 2 distribution links - each to a separate node - utilise 12 Gbit/s out
>> of the 22 Gbit/s available on node-a.
>>
>> What is preventing the distribution links from pushing more data through?
>> There is plenty of CPU & memory available (all nodes have 16 CPUs & 104GB
>> MEM - n1-highmem-16); a per-scheduler check is sketched after the dstat
>> output below:
>>
>> dstat -cm 1 10
>> ----total-cpu-usage---- ------memory-usage-----
>> usr sys idl wai hiq siq| used  buff  cach  free
>>  10   6  84   0   0   1|16.3G  118M  284M 85.6G
>>  20   6  73   0   0   1|16.3G  118M  284M 85.6G
>>  20   6  74   0   0   0|16.3G  118M  284M 85.6G
>>  18   6  76   0   0   0|16.4G  118M  284M 85.5G
>>  19   6  74   0   0   1|16.4G  118M  284M 85.4G
>>  17   4  78   0   0   0|16.5G  118M  284M 85.4G
>>  20   6  74   0   0   0|16.5G  118M  284M 85.4G
>>  19   6  74   0   0   0|16.5G  118M  284M 85.4G
>>  19   5  76   0   0   1|16.5G  118M  284M 85.4G
>>  18   6  75   0   0   0|16.5G  118M  284M 85.4G
>>  18   6  75   0   0   0|16.6G  118M  284M 85.3G
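>>
>> Since aggregate CPU numbers can hide a single saturated scheduler (or the
>> dist sender spending its time in port I/O), here is the per-scheduler
>> check mentioned above, using runtime_tools (OTP 21+); a sketch, not
>> something we have profiled exhaustively:
>>
>> %% per-scheduler utilisation sampled over ~1 second
>> scheduler:utilization(1).
>>
>> %% microstate accounting: time spent in emulator, port I/O, GC, ...
>> case msacc:available() of
>>     true  -> msacc:start(1000), msacc:print();
>>     false -> not_available
>> end.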
>>
>> The only smoking gun is the distribution output queue buffer:
>> https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1&fullscreen&panelId=62
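>>
>> That queue can also be inspected directly from the shell, together with
>> the busy limit it is compared against (the default dist_buf_busy_limit is
>> 1 MB and can be raised with the +zdbbl emulator flag). A sketch, assuming
>> the standard TCP distribution where each connection is driven by a port:
>>
>> %% distribution buffer busy limit, in bytes (set at boot via +zdbbl, in KB)
>> erlang:system_info(dist_buf_busy_limit).
>>
>> %% bytes currently queued in each distribution connection's port queue
>> [{Node, erlang:port_info(Port, queue_size)}
>>  || {Node, Port} <- erlang:system_info(dist_ctrl), is_port(Port)].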
>>
>> Speaking of which, we look forward to erlang/otp#2270 being merged:
>> https://github.com/erlang/otp/pull/2270
>>
>> All distribution metrics are available here:
>> https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1
>>
>> If you want to see the state of the distribution links & dist processes
>> (they are all green, btw), check the point-in-time metrics (they will
>> expire 15 days from today):
>> https://grafana.gcp.rabbitmq.com/d/d-SFCCmZz/erlang-distribution?from=1560775955127&to=1560779424482
>>
>> How can we tell what is preventing the distribution link from using all
>> available bandwidth?
>>
>> Are we missing a configuration flag? These are all the relevant beam.smp
>> flags that we are using:
>> https://github.com/erlang/otp/pull/2270#issuecomment-500953352
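>>
>> For completeness, the distribution-related knobs we know of are the
>> +zdbbl emulator flag and the kernel inet_dist_listen_options /
>> inet_dist_connect_options socket options; something along these lines
>> (the values here are purely illustrative, not a recommendation):
>>
>> erl ... +zdbbl 131072 \
>>     -kernel inet_dist_listen_options '[{sndbuf, 1048576}, {recbuf, 1048576}]' \
>>     -kernel inet_dist_connect_options '[{sndbuf, 1048576}, {recbuf, 1048576}]'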
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>