[erlang-questions] Erlang distribution links don't fully utilise available resources - OTP 22.0.2 - Why?
Dmytro Lytovchenko
dmytro.lytovchenko@REDACTED
Mon Jun 17 17:02:06 CEST 2019
I believe the Erlang distribution is the wrong thing to use if you want to
saturate the network.
There is plenty of overhead for each message: the data gets copied, then
encoded (copied again), then sent, then received (copied), then decoded
(copied again) and finally delivered to the destination process (copied
again). The receiving processes may also be slow to fetch the incoming data;
they aren't running in hard real time and sometimes go to sleep.
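
A rough way to see just the encode/decode share of that overhead (a sketch;
distribution traffic is essentially the same external term format that
term_to_binary/1 produces, and absolute numbers will vary with hardware):

    Bin = <<0:10000000/unit:8>>,                          %% 10 MB payload
    {EncodeUs, Encoded} = timer:tc(erlang, term_to_binary, [Bin]),
    {DecodeUs, _} = timer:tc(erlang, binary_to_term, [Encoded]),
    io:format("encode: ~p us, decode: ~p us~n", [EncodeUs, DecodeUs]).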
There is advice online on tuning Linux for maximum network throughput, for
example:
https://medium.com/@_wmconsulting/tuning-linux-to-reach-maximum-performance-on-10-gbps-network-card-with-http-streaming-8599c9b4389d
I also remember suggestions to use regular TCP connections instead, possibly
with a user-mode network driver behind a low-level NIF (kernel calls have a
cost), with the intent of getting the highest gigabits out of your hardware.
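
For comparison, pushing bytes over a plain gen_tcp socket is easy to sketch
(the host, port and counts below are placeholders mirroring the iperf test
in the quoted mail, not a recommendation):

    %% Sender: stream 100 x 10 MB binaries over a raw TCP connection.
    {ok, Sock} = gen_tcp:connect("10.0.1.37", 5001,
                                 [binary, {packet, 0}, {nodelay, true}]),
    Payload = <<0:10000000/unit:8>>,
    [ok = gen_tcp:send(Sock, Payload) || _ <- lists:seq(1, 100)],
    ok = gen_tcp:close(Sock).
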
On Mon, 17 Jun 2019 at 16:49, Gerhard Lazu <gerhard@REDACTED> wrote:
> Hi,
>
> We are trying to understand what prevents the Erlang distribution link
> from saturating the network. Even though there is plenty of CPU, memory &
> network bandwidth, the Erlang distribution doesn't fully utilise available
> resources. Can you help us figure out why?
>
> We have a 3-node Erlang 22.0.2 cluster running on Ubuntu 16.04 x86 64bit.
>
> This is the maximum network throughput between node-a & node-b, as
> measured by iperf:
>
> iperf -t 30 -c node-b
> ------------------------------------------------------------
> Client connecting to 10.0.1.37, TCP port 5001
> TCP window size: 45.0 KByte (default)
> ------------------------------------------------------------
> [ 3] local 10.0.1.36 port 43576 connected with 10.0.1.37 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-30.0 sec 78.8 GBytes 22.6 Gbits/sec
>
> We ran this multiple times, in different directions and with different
> degrees of parallelism; the maximum network throughput is roughly 22 Gbit/s.
>
> We run the following command on node-a:
>
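> %% Each loop iteration casts a 10 MB binary to the remote node; 100 such
> %% loops are spawned to generate sustained distribution traffic: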
> B = fun F() -> rpc:cast('foo@REDACTED', erlang, is_binary, [<<0:10000000/unit:8>>]), F() end.
> [spawn(fun() -> B() end) || _ <- lists:seq(1, 100)].
>
> This is what the network reports on node-a:
>
> dstat -n 1 10
> -net/total-
> recv send
> 0 0
> 676k 756M
> 643k 767M
> 584k 679M
> 693k 777M
> 648k 745M
> 660k 745M
> 667k 772M
> 651k 709M
> 675k 782M
> 688k 819M
>
> That roughly translates to 6 Gbit/s (~750 MBytes/s x 8 bits/byte). In other
> words, the Erlang distribution link between node-a & node-b is maxing out
> at around ~6 Gbit/s, roughly 27% of the 22 Gbit/s that we measure
> consistently and repeatedly outside of Erlang: iperf is ~3.6x faster than
> an Erlang distribution link. It gets better.
>
> If we start another 100 processes pumping 10 MByte messages from node-a to
> node-c, we see the network throughput double:
>
> dstat -n 1 10
> -net/total-
> recv send
> 0 0
> 1303k 1463M
> 1248k 1360M
> 1332k 1458M
> 1480k 1569M
> 1339k 1455M
> 1413k 1494M
> 1395k 1431M
> 1359k 1514M
> 1438k 1564M
> 1379k 1489M
>
> So 2 distribution links - each to a separate node - utilise 12 Gbit/s out
> of the 22 Gbit/s available on node-a.
>
> What is preventing the distribution links from pushing more data through?
> There is plenty of CPU & memory available (all nodes have 16 CPUs & 104GB
> MEM - n1-highmem-16):
>
> dstat -cm 1 10
> ----total-cpu-usage---- ------memory-usage-----
> usr sys idl wai hiq siq| used buff cach free
> 10 6 84 0 0 1|16.3G 118M 284M 85.6G
> 20 6 73 0 0 1|16.3G 118M 284M 85.6G
> 20 6 74 0 0 0|16.3G 118M 284M 85.6G
> 18 6 76 0 0 0|16.4G 118M 284M 85.5G
> 19 6 74 0 0 1|16.4G 118M 284M 85.4G
> 17 4 78 0 0 0|16.5G 118M 284M 85.4G
> 20 6 74 0 0 0|16.5G 118M 284M 85.4G
> 19 6 74 0 0 0|16.5G 118M 284M 85.4G
> 19 5 76 0 0 1|16.5G 118M 284M 85.4G
> 18 6 75 0 0 0|16.5G 118M 284M 85.4G
> 18 6 75 0 0 0|16.6G 118M 284M 85.3G
>
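> The dstat output above says the OS is mostly idle. To see where the VM's
> own threads spend their time while the test runs, microstate accounting is
> handy (a sketch; the msacc helper ships with runtime_tools):
>
> msacc:start(1000), msacc:print().
>
> The printout breaks each thread's time into states such as emulator, port,
> gc and sleep, which would show whether a scheduler is pegged even though
> the box as a whole looks idle.
>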
> The only smoking gun is the distribution output queue buffer:
> https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1&fullscreen&panelId=62
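>
> For reference, the busy limit on that buffer is
> erlang:system_info(dist_buf_busy_limit), settable at startup with the
> +zdbbl emulator flag, and the bytes queued on each link can be sampled with
> something like this (a sketch; it assumes the default port-based
> distribution carrier):
>
> [{Node, erlang:port_info(Port, queue_size)}
>  || {Node, Port} <- erlang:system_info(dist_ctrl), is_port(Port)].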
>
> Speaking of which, we look forward to erlang/otp#2270 being merged:
> https://github.com/erlang/otp/pull/2270
>
> All distribution metrics are available here:
> https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1
>
> If you want to see the state of distribution links & dist process state
> (they are all green btw), check the point-in-time metrics (they will expire
> in 15 days from today):
> https://grafana.gcp.rabbitmq.com/d/d-SFCCmZz/erlang-distribution?from=1560775955127&to=1560779424482
>
> How can we tell what is preventing the distribution link from using all
> available bandwidth?
>
> Are we missing a configuration flag? These are all the relevant beam.smp
> flags that we are using:
> https://github.com/erlang/otp/pull/2270#issuecomment-500953352