[erlang-questions] Erlang distribution links don't fully utilise available resources - OTP 22.0.2 - Why?
Gerhard Lazu
gerhard@REDACTED
Mon Jun 17 16:48:45 CEST 2019
Hi,
We are trying to understand what prevents the Erlang distribution link from
saturating the network. Even though there is plenty of CPU, memory &
network bandwidth, the Erlang distribution doesn't fully utilise available
resources. Can you help us figure out why?
We have a 3-node Erlang 22.0.2 cluster running on Ubuntu 16.04 x86 64bit.
This is the maximum network throughput between node-a & node-b, as measured
by iperf:
iperf -t 30 -c node-b
------------------------------------------------------------
Client connecting to 10.0.1.37, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.1.36 port 43576 connected with 10.0.1.37 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.0 sec 78.8 GBytes 22.6 Gbits/sec
We ran this multiple times, in different directions & with different degree
of parallelism, the maximum network throughput is roughly 22 Gbit/s.
We run the following command on node-a:
B = fun F() -> rpc:cast('foo@REDACTED', erlang, is_binary,
[<<0:10000000/unit:8>>]), F() end.
[spawn(fun() -> B() end) || _ <- lists:seq(1, 100)].
This is what the network reports on node-a:
dstat -n 1 10
-net/total-
recv send
0 0
676k 756M
643k 767M
584k 679M
693k 777M
648k 745M
660k 745M
667k 772M
651k 709M
675k 782M
688k 819M
That roughly translates to 6 Gbit/s. In other words, the Erlang
distribution link between node-a & node-b is maxing out at around ~6
Gbit/s. Erlang distribution is limited to 27% of what we are measuring
consistently and repeatedly outside of Erlang. In other words, iperf is
3.6x faster than an Erlang distribution link. It gets better.
If we start another 100 processes pumping 10Mbyte messages from node-a to
node-c, we see the network throughput double:
dstat -n 1 10
-net/total-
recv send
0 0
1303k 1463M
1248k 1360M
1332k 1458M
1480k 1569M
1339k 1455M
1413k 1494M
1395k 1431M
1359k 1514M
1438k 1564M
1379k 1489M
So 2 distribution links - each to a separate node - utilise 12Gbit/s out of
the 22Gbit/s available on node-a.
What is preventing the distribution links pushing more data through? There
is plenty of CPU & memory available (all nodes have 16 CPUs & 104GB MEM -
n1-highmem-16):
dstat -cm 1 10
----total-cpu-usage---- ------memory-usage-----
usr sys idl wai hiq siq| used buff cach free
10 6 84 0 0 1|16.3G 118M 284M 85.6G
20 6 73 0 0 1|16.3G 118M 284M 85.6G
20 6 74 0 0 0|16.3G 118M 284M 85.6G
18 6 76 0 0 0|16.4G 118M 284M 85.5G
19 6 74 0 0 1|16.4G 118M 284M 85.4G
17 4 78 0 0 0|16.5G 118M 284M 85.4G
20 6 74 0 0 0|16.5G 118M 284M 85.4G
19 6 74 0 0 0|16.5G 118M 284M 85.4G
19 5 76 0 0 1|16.5G 118M 284M 85.4G
18 6 75 0 0 0|16.5G 118M 284M 85.4G
18 6 75 0 0 0|16.6G 118M 284M 85.3G
The only smoking gun is the distribution output queue buffer:
https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1&fullscreen&panelId=62
Speaking of which, we look forward to erlang/otp#2270 being merged:
https://github.com/erlang/otp/pull/2270
All distribution metrics are available here:
https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1
If you want to see the state of distribution links & dist process state
(they are all green btw), check the point-in-time metrics (they will expire
in 15 days from today):
https://grafana.gcp.rabbitmq.com/d/d-SFCCmZz/erlang-distribution?from=1560775955127&to=1560779424482
How can we tell what is preventing the distribution link from using all
available bandwidth?
Are we missing a configuration flag? These are all the relevant beam.smp
flags that we are using:
https://github.com/erlang/otp/pull/2270#issuecomment-500953352
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20190617/6f6f313c/attachment.htm>
More information about the erlang-questions
mailing list