<div dir="ltr">I wouldn't expect the Erlang distribution to reach the same network performance as iperf, but I would expect it to be within 70%-80% of maximum.<div><br></div><div>In our measurements it's within 27% of maximum, which makes me believe that something is misconfigured or inefficient.</div><div><br></div><div>The goal is to figure out which component/components are responsible for this significant network throughput loss.</div><div><br></div><div>Thanks for the quick response!</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jun 17, 2019 at 4:02 PM Dmytro Lytovchenko <<a href="mailto:dmytro.lytovchenko@gmail.com">dmytro.lytovchenko@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I believe the Erlang distribution is the wrong thing to use if you want to saturate the network.</div><div>There is plenty of overhead for each incoming message, the data gets copied, then encoded (copied again) then sent, then received (copied), then decoded (copied again) and sent to the destination process (copied again). Then the receiving processes might be slow to fetch the incoming data, they aren't running in hard real time and sometimes go to sleep.<br></div><div><br></div><div></div><div>Something about Linux tuning can be googled, like thing here <a href="https://medium.com/@_wmconsulting/tuning-linux-to-reach-maximum-performance-on-10-gbps-network-card-with-http-streaming-8599c9b4389d" target="_blank">https://medium.com/@_wmconsulting/tuning-linux-to-reach-maximum-performance-on-10-gbps-network-card-with-http-streaming-8599c9b4389d</a></div><div><br></div><div>I remember there were suggestions to use regular TCP connections, consider using user-mode driver (kernel calls have a cost) and low level NIF driver for that, with the intent of delivering highest gigabits from your hardware.
On Mon, 17 Jun 2019 at 16:49, Gerhard Lazu <gerhard@lazu.co.uk> wrote:

Hi,

We are trying to understand what prevents the Erlang distribution link from saturating the network. Even though there is plenty of CPU, memory & network bandwidth, the Erlang distribution doesn't fully utilise the available resources. Can you help us figure out why?

We have a 3-node Erlang 22.0.2 cluster running on Ubuntu 16.04 x86 64-bit.

This is the maximum network throughput between node-a & node-b, as measured by iperf:

iperf -t 30 -c node-b
------------------------------------------------------------
Client connecting to 10.0.1.37, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.1.36 port 43576 connected with 10.0.1.37 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-30.0 sec  78.8 GBytes  22.6 Gbits/sec
We ran this multiple times, in different directions and with different degrees of parallelism; the maximum network throughput is roughly 22 Gbit/s.

We run the following command on node-a:

B = fun F() -> rpc:cast('foo@node-b', erlang, is_binary, [<<0:10000000/unit:8>>]), F() end.
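%% Each call to B() casts a fresh 10 MB binary to node-b and loops forever;
%% the next line starts 100 of these senders in parallel.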
[spawn(fun() -> B() end) || _ <- lists:seq(1, 100)].
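(As a sketch of how one could watch that link from inside Erlang while this runs, assuming the default TCP carrier, where the controlling entity returned by erlang:system_info(dist_ctrl) is an inet port, a small sampler might look like this; the node name and the 1-second interval are placeholders:)

%% Hypothetical sampler, not part of the original test: print the distribution
%% socket's sent-bytes and pending-send counters once per second.
{_, DistPort} = lists:keyfind('foo@node-b', 1, erlang:system_info(dist_ctrl)),
Watch = fun F() ->
            {ok, Stats} = inet:getstat(DistPort, [send_oct, send_pend]),
            io:format("~p~n", [Stats]),
            timer:sleep(1000),
            F()
        end,
spawn(Watch).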
This is what the network reports on node-a:

dstat -n 1 10
-net/total-
 recv  send
    0     0
 676k  756M
 643k  767M
 584k  679M
 693k  777M
 648k  745M
 660k  745M
 667k  772M
 651k  709M
 675k  782M
 688k  819M
That roughly translates to 6 Gbit/s (~750 MByte/s x 8 bits/byte): the Erlang distribution link between node-a & node-b is maxing out at around ~6 Gbit/s. In other words, the distribution link reaches only about 27% (6/22) of what we measure consistently and repeatedly outside of Erlang, i.e. iperf is roughly 3.6x faster than an Erlang distribution link. It gets better.

If we start another 100 processes pumping 10 MByte messages from node-a to node-c, we see the network throughput double:

dstat -n 1 10
-net/total-
 recv   send
    0      0
1303k  1463M
1248k  1360M
1332k  1458M
1480k  1569M
1339k  1455M
1413k  1494M
1395k  1431M
1359k  1514M
1438k  1564M
1379k  1489M
So two distribution links, each to a separate node, utilise 12 Gbit/s out of the 22 Gbit/s available on node-a.

What is preventing the distribution links from pushing more data through? There is plenty of CPU & memory available (all nodes have 16 CPUs & 104 GB of memory - n1-highmem-16):

dstat -cm 1 10
----total-cpu-usage---- ------memory-usage-----
usr sys idl wai hiq siq| used  buff  cach  free
 10   6  84   0   0   1|16.3G  118M  284M 85.6G
 20   6  73   0   0   1|16.3G  118M  284M 85.6G
 20   6  74   0   0   0|16.3G  118M  284M 85.6G
 18   6  76   0   0   0|16.4G  118M  284M 85.5G
 19   6  74   0   0   1|16.4G  118M  284M 85.4G
 17   4  78   0   0   0|16.5G  118M  284M 85.4G
 20   6  74   0   0   0|16.5G  118M  284M 85.4G
 19   6  74   0   0   0|16.5G  118M  284M 85.4G
 19   5  76   0   0   1|16.5G  118M  284M 85.4G
 18   6  75   0   0   0|16.5G  118M  284M 85.4G
 18   6  75   0   0   0|16.6G  118M  284M 85.3G
The only smoking gun is the distribution output queue buffer: https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1&fullscreen&panelId=62

Speaking of which, we look forward to erlang/otp#2270 being merged: https://github.com/erlang/otp/pull/2270

All distribution metrics are available here: https://grafana.gcp.rabbitmq.com/dashboard/snapshot/H329EfN3SFhsveA20ei7jC7JMFHAm8Ru?orgId=1

If you want to see the state of the distribution links & dist processes (they are all green, by the way), check the point-in-time metrics (they expire 15 days from today): https://grafana.gcp.rabbitmq.com/d/d-SFCCmZz/erlang-distribution?from=1560775955127&to=1560779424482

How can we tell what is preventing the distribution link from using all the available bandwidth?

Are we missing a configuration flag? These are all the relevant beam.smp flags that we are using: https://github.com/erlang/otp/pull/2270#issuecomment-500953352
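(One cheap probe, a sketch rather than something already captured in the dashboards above, is to ask the VM whether senders are being suspended on a full distribution buffer; the shell is assumed to run on node-a:)

%% Current distribution buffer busy limit, set at startup with +zdbbl (bytes).
erlang:system_info(dist_buf_busy_limit).

%% Have the shell process receive a message whenever a sender is suspended
%% because a distribution port's output buffer exceeds that limit.
erlang:system_monitor(self(), [busy_dist_port]).

%% Any {monitor, SuspendedPid, busy_dist_port, Port} messages show up here.
flush().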
_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions