<div dir="ltr"><p style="box-sizing:border-box;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px;margin-top:0px">Hi,</p><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">We are trying to understand what prevents the Erlang distribution link from saturating the network. Even though there is plenty of CPU, memory & network bandwidth, the Erlang distribution doesn't fully utilise available resources. Can you help us figure out why?</p><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">We have a 3-node Erlang 22.0.2 cluster running on Ubuntu 16.04 x86 64bit.</p><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">This is the maximum network throughput between node-a & node-b, as measured by iperf:</p><pre style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:13.6px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46)"><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;padding:0px;margin:0px;background-color:transparent;border-radius:3px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">iperf -t 30 -c node-b
------------------------------------------------------------
Client connecting to 10.0.1.37, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.1.36 port 43576 connected with 10.0.1.37 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.0 sec 78.8 GBytes 22.6 Gbits/sec
</code></pre><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">We ran this multiple times, in different directions & with different degrees of parallelism; the maximum network throughput is consistently around 22 Gbit/s.</p><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">We run the following commands in a shell on node-a:</p><pre style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:13.6px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46)"><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;padding:0px;margin:0px;background-color:transparent;border-radius:3px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">%% 100 senders, each casting a fresh 10 MByte binary to node-b in a tight loop
B = fun F() -> rpc:cast('foo@node-b', erlang, is_binary, [<<0:10000000/unit:8>>]), F() end.
[spawn(fun() -> B() end) || _ <- lists:seq(1, 100)].
</code></pre><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">This is what the network reports on node-a:</p><pre style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:13.6px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46)"><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;padding:0px;margin:0px;background-color:transparent;border-radius:3px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">dstat -n 1 10
-net/total-
recv send
0 0
676k 756M
643k 767M
584k 679M
693k 777M
648k 745M
660k 745M
667k 772M
651k 709M
675k 782M
688k 819M
</code></pre><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">That is roughly 750 MByte/s, i.e. ~6 Gbit/s. In other words, the Erlang distribution link between node-a & node-b is maxing out at around 6 Gbit/s - 27% of what we measure consistently and repeatedly outside of Erlang. Put differently, iperf is 3.6x faster than an Erlang distribution link. It gets better.</p><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">If we start another 100 processes pumping 10 MByte messages from node-a to node-c, we see the network throughput double:</p><pre style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:13.6px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46)"><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;padding:0px;margin:0px;background-color:transparent;border-radius:3px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">dstat -n 1 10
-net/total-
recv send
0 0
1303k 1463M
1248k 1360M
1332k 1458M
1480k 1569M
1339k 1455M
1413k 1494M
1395k 1431M
1359k 1514M
1438k 1564M
1379k 1489M
</code></pre><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">So 2 distribution links - each to a separate node - utilise ~12 Gbit/s out of the 22 Gbit/s available on node-a.</p><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">What is preventing the distribution links from pushing more data through? There is plenty of CPU & memory available (all nodes have 16 CPUs & 104 GB MEM - n1-highmem-16):</p><pre style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:13.6px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46)"><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;padding:0px;margin:0px;background-color:transparent;border-radius:3px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">dstat -cm 1 10
----total-cpu-usage---- ------memory-usage-----
usr sys idl wai hiq siq| used buff cach free
10 6 84 0 0 1|16.3G 118M 284M 85.6G
20 6 73 0 0 1|16.3G 118M 284M 85.6G
20 6 74 0 0 0|16.3G 118M 284M 85.6G
18 6 76 0 0 0|16.4G 118M 284M 85.5G
19 6 74 0 0 1|16.4G 118M 284M 85.4G
17 4 78 0 0 0|16.5G 118M 284M 85.4G
20 6 74 0 0 0|16.5G 118M 284M 85.4G
19 6 74 0 0 0|16.5G 118M 284M 85.4G
19 5 76 0 0 1|16.5G 118M 284M 85.4G
18 6 75 0 0 0|16.5G 118M 284M 85.4G
18 6 75 0 0 0|16.6G 118M 284M 85.3G
</code></pre><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">The only smoking gun is the distribution output queue buffer: <a href="https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1&fullscreen&panelId=62">https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1&fullscreen&panelId=62</a></p><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">Speaking of which, we look forward to erlang/otp#2270 being merged: <a href="https://github.com/erlang/otp/pull/2270">https://github.com/erlang/otp/pull/2270</a></p><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">All distribution metrics are available here: <a href="https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1">https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1</a></p><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">If you want to see the state of distribution links & dist process state (they are all green btw), check the point-in-time metrics (they will expire in 15 days from today): <a 
href="https://grafana.gcp.rabbitmq.com/d/d-SFCCmZz/erlang-distribution?from=1560775955127&to=1560779424482">https://grafana.gcp.rabbitmq.com/d/d-SFCCmZz/erlang-distribution?from=1560775955127&to=1560779424482</a></p><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">How can we tell what is preventing the distribution link from using all available bandwidth?</p><p style="box-sizing:border-box;margin-top:0px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px;margin-bottom:0px">Are we missing a configuration flag? These are all the relevant beam.smp flags that we are using: <a href="https://github.com/erlang/otp/pull/2270#issuecomment-500953352">https://github.com/erlang/otp/pull/2270#issuecomment-500953352</a></p></div>
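<div dir="ltr"><p style="box-sizing:border-box;margin-top:0px;margin-bottom:16px;color:rgb(36,41,46);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px">P.S. In case it helps anyone reproduce: this is a quick sketch of how we snapshot, from a shell on node-a, the distribution buffer busy limit (the +zdbbl emulator flag, 1024 KByte by default) and the bytes queued on each distribution carrier. It assumes the carriers are plain inet_tcp ports; with TLS distribution they are pids instead, and the port inspection does not apply:</p><pre style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:13.6px;margin-top:0px;margin-bottom:16px;padding:16px;overflow:auto;line-height:1.45;background-color:rgb(246,248,250);border-radius:3px;color:rgb(36,41,46)"><code style="box-sizing:border-box;font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;padding:0px;margin:0px;background-color:transparent;border-radius:3px;word-break:normal;border:0px;display:inline;overflow:visible;line-height:inherit">%% the +zdbbl value, in bytes; senders suspend once a link's queue exceeds it
erlang:system_info(dist_buf_busy_limit).

%% bytes currently queued in the driver queue of each distribution port
[{Node, erlang:port_info(DistCtrl, queue_size)}
 || {Node, DistCtrl} <- erlang:system_info(dist_ctrl), is_port(DistCtrl)].
</code></pre></div>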