[erlang-questions] Erlang distribution links don't fully utilise available resources - OTP 22.0.2 - Why?

Max Lapshin max.lapshin@REDACTED
Mon Jun 17 18:53:43 CEST 2019


> Consider that the VM copies the data at least 4 or 5 times

This is not good.
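
Just to put a rough number on one of those copies (the external term
format encoding alone), a quick shell experiment like this should give a
ballpark; it is only a back-of-the-envelope estimate on my side, not a
measurement of the full distribution send path:

  %% 10 MByte payload, same size as in Gerhard's test below
  Payload = <<0:10000000/unit:8>>.
  %% size of the payload once encoded in the external term format
  erlang:external_size(Payload).
  %% microseconds spent producing that one encoded copy
  {MicroSecs, _Encoded} = timer:tc(erlang, term_to_binary, [Payload]).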

I'm designing a multi-node hardware appliance for our Flussonic and thought
that it would be a good idea to interconnect the nodes via Erlang distribution.

It seems it would be better to create 2 channels: control and data?
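
Roughly what I have in mind for the data side, as a minimal sketch (the
port number 4000 is made up, and error handling and reconnects are left
out; control messages would keep going over the distribution link):

  %% on node-b: accept one bulk-data connection and read one message
  {ok, LSock} = gen_tcp:listen(4000, [binary, {packet, 4}, {active, false}]),
  {ok, Sock} = gen_tcp:accept(LSock),
  {ok, Data} = gen_tcp:recv(Sock, 0).

  %% on node-a: connect and push a 10 MByte payload
  {ok, S} = gen_tcp:connect("node-b", 4000,
                            [binary, {packet, 4}, {nodelay, true}]),
  ok = gen_tcp:send(S, <<0:10000000/unit:8>>).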


On Mon, Jun 17, 2019 at 6:11 PM Dmytro Lytovchenko <
dmytro.lytovchenko@REDACTED> wrote:

> Consider that the VM copies the data at least 4 or 5 times, and compare
> that with the Gbit/s your RAM can do on both servers too.
> Add to that the occasional garbage collection, which can be minimal if your
> VM memory footprint is small and you're only doing this perf testing.
>
> On Mon, 17 Jun 2019 at 17:08, Gerhard Lazu <gerhard@REDACTED> wrote:
>
>> I wouldn't expect the Erlang distribution to reach the same network
>> performance as iperf, but I would expect it to be within 70%-80% of maximum.
>>
>> In our measurements it reaches only 27% of the maximum, which makes me
>> believe that something is misconfigured or inefficient.
>>
>> The goal is to figure out which component/components are responsible for
>> this significant network throughput loss.
>>
>> Thanks for the quick response!
>>
>> On Mon, Jun 17, 2019 at 4:02 PM Dmytro Lytovchenko <
>> dmytro.lytovchenko@REDACTED> wrote:
>>
>>> I believe the Erlang distribution is the wrong thing to use if you want
>>> to saturate the network.
>>> There is plenty of overhead for each message: the data gets copied, then
>>> encoded (copied again), then sent, then received (copied), then decoded
>>> (copied again) and finally delivered to the destination process (copied
>>> again). The receiving processes might also be slow to fetch the incoming
>>> data; they aren't running in hard real time and sometimes go to sleep.
>>>
>>> Some Linux tuning advice can be googled, for example this article:
>>> https://medium.com/@_wmconsulting/tuning-linux-to-reach-maximum-performance-on-10-gbps-network-card-with-http-streaming-8599c9b4389d
>>>
>>> I remember there were suggestions to use regular TCP connections instead,
>>> or even a user-mode driver (kernel calls have a cost) wrapped in a
>>> low-level NIF, if the intent is to get the highest gigabits out of your
>>> hardware.
>>>
>>> On Mon, 17 Jun 2019 at 16:49, Gerhard Lazu <gerhard@REDACTED> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are trying to understand what prevents the Erlang distribution link
>>>> from saturating the network. Even though there is plenty of CPU, memory &
>>>> network bandwidth, the Erlang distribution doesn't fully utilise available
>>>> resources. Can you help us figure out why?
>>>>
>>>> We have a 3-node Erlang 22.0.2 cluster running on Ubuntu 16.04 x86
>>>> 64bit.
>>>>
>>>> This is the maximum network throughput between node-a & node-b, as
>>>> measured by iperf:
>>>>
>>>> iperf -t 30 -c node-b
>>>> ------------------------------------------------------------
>>>> Client connecting to 10.0.1.37, TCP port 5001
>>>> TCP window size: 45.0 KByte (default)
>>>> ------------------------------------------------------------
>>>> [  3] local 10.0.1.36 port 43576 connected with 10.0.1.37 port 5001
>>>> [ ID] Interval       Transfer     Bandwidth
>>>> [  3]  0.0-30.0 sec  78.8 GBytes  22.6 Gbits/sec
>>>>
>>>> We ran this multiple times, in different directions & with different
>>>> degrees of parallelism; the maximum network throughput is roughly 22
>>>> Gbit/s.
>>>>
>>>> We run the following on node-a:
>>>>
>>>> B = fun F() -> rpc:cast('foo@REDACTED', erlang, is_binary, [<<0:10000000/unit:8>>]), F() end.
>>>> [spawn(fun() -> B() end) || _ <- lists:seq(1, 100)].
>>>>
>>>> This is what the network reports on node-a:
>>>>
>>>> dstat -n 1 10
>>>> -net/total-
>>>>  recv  send
>>>>    0     0
>>>>  676k  756M
>>>>  643k  767M
>>>>  584k  679M
>>>>  693k  777M
>>>>  648k  745M
>>>>  660k  745M
>>>>  667k  772M
>>>>  651k  709M
>>>>  675k  782M
>>>>  688k  819M
>>>>
>>>> That roughly translates to 6 Gbit/s. In other words, the Erlang
>>>> distribution link between node-a & node-b is maxing out at ~6 Gbit/s,
>>>> which is 27% of what we measure consistently and repeatedly outside of
>>>> Erlang. Put differently, iperf is 3.6x faster than an Erlang
>>>> distribution link. It gets better.
>>>>
>>>> If we start another 100 processes pumping 10Mbyte messages from node-a
>>>> to node-c, we see the network throughput double:
>>>>
>>>> dstat -n 1 10
>>>> -net/total-
>>>>  recv  send
>>>>    0     0
>>>> 1303k 1463M
>>>> 1248k 1360M
>>>> 1332k 1458M
>>>> 1480k 1569M
>>>> 1339k 1455M
>>>> 1413k 1494M
>>>> 1395k 1431M
>>>> 1359k 1514M
>>>> 1438k 1564M
>>>> 1379k 1489M
>>>>
>>>> So 2 distribution links - each to a separate node - utilise 12 Gbit/s
>>>> out of the 22 Gbit/s available on node-a.
>>>>
>>>> What is preventing the distribution links from pushing more data through?
>>>> There is plenty of CPU & memory available (all nodes have 16 CPUs & 104GB
>>>> MEM - n1-highmem-16):
>>>>
>>>> dstat -cm 1 10
>>>> ----total-cpu-usage---- ------memory-usage-----
>>>> usr sys idl wai hiq siq| used  buff  cach  free
>>>>  10   6  84   0   0   1|16.3G  118M  284M 85.6G
>>>>  20   6  73   0   0   1|16.3G  118M  284M 85.6G
>>>>  20   6  74   0   0   0|16.3G  118M  284M 85.6G
>>>>  18   6  76   0   0   0|16.4G  118M  284M 85.5G
>>>>  19   6  74   0   0   1|16.4G  118M  284M 85.4G
>>>>  17   4  78   0   0   0|16.5G  118M  284M 85.4G
>>>>  20   6  74   0   0   0|16.5G  118M  284M 85.4G
>>>>  19   6  74   0   0   0|16.5G  118M  284M 85.4G
>>>>  19   5  76   0   0   1|16.5G  118M  284M 85.4G
>>>>  18   6  75   0   0   0|16.5G  118M  284M 85.4G
>>>>  18   6  75   0   0   0|16.6G  118M  284M 85.3G
>>>>
>>>> The only smoking gun is the distribution output queue buffer:
>>>> https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1&fullscreen&panelId=62
>>>>
>>>> Speaking of which, we look forward to erlang/otp#2270 being merged:
>>>> https://github.com/erlang/otp/pull/2270
>>>>
>>>> All distribution metrics are available here:
>>>> https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1
>>>>
>>>> If you want to see the state of the distribution links & dist processes
>>>> (they are all green btw), check the point-in-time metrics (they will
>>>> expire 15 days from today):
>>>> https://grafana.gcp.rabbitmq.com/d/d-SFCCmZz/erlang-distribution?from=1560775955127&to=1560779424482
>>>>
>>>> How can we tell what is preventing the distribution link from using all
>>>> available bandwidth?
>>>>
>>>> Are we missing a configuration flag? These are all the relevant
>>>> beam.smp flags that we are using:
>>>> https://github.com/erlang/otp/pull/2270#issuecomment-500953352

