<div dir="ltr"><div><font face="monospace, monospace">- 2.57% 0.02% 31_scheduler beam.smp [.] process_main </font></div><div><font face="monospace, monospace"> - 2.55% process_main</font></div><div><font face="monospace, monospace"> - 2.43% erts_schedule </font></div><div><font face="monospace, monospace"> - 2.26% erts_port_task_execute</font></div><div><font face="monospace, monospace"> - 2.25% packet_inet_input.isra.31</font></div><div><font face="monospace, monospace"> - 2.05% driver_realloc_binary</font></div><div><font face="monospace, monospace"> - 2.05% realloc_thr_pref</font></div><div><font face="monospace, monospace"> 1.87% __memmove_avx_unaligned_erms</font></div><div><br><font face="arial, helvetica, sans-serif">That's a 40-core Xeon E5-2640</font><span style="font-family:arial,helvetica,sans-serif">, so 2.5% of total CPU on a single scheduler means that scheduler is close to 100% busy.</span></div><div><font face="arial, helvetica, sans-serif">Also, it runs Linux kernel 4.9.</font><br><br><br><font face="arial, helvetica, sans-serif">On a machine with kernel 4.13 and a quad-core Xeon E31225, at half of the E5's load, we have:</font><br><div style=""><font face="monospace, monospace">- 16.11% 0.10% 1_scheduler beam.smp [.] 
erts_schedule </font></div><div style=""><font face="monospace, monospace"> - 16.01% erts_schedule</font></div><div style=""><font face="monospace, monospace"> - 13.62% erts_port_task_execute </font></div><div style=""><font face="monospace, monospace"> - 13.11% packet_inet_input.isra.31</font></div><div style=""><font face="monospace, monospace"> - 11.37% driver_realloc_binary</font></div><div style=""><font face="monospace, monospace"> - 11.33% realloc_thr_pref </font></div><div style=""><font face="monospace, monospace"> - 10.50% __memcpy_avx_unaligned </font></div><div style=""><font face="monospace, monospace"> 5.06% __memcpy_avx_unaligned </font></div><div style=""><font face="monospace, monospace"> + 1.04% page_fault</font></div><div style=""><font face="monospace, monospace"> 0.66% do_erts_alcu_realloc.constprop.31</font></div><div style=""><font face="monospace, monospace"> + 0.79% 0x108f3</font></div><div style=""><font face="monospace, monospace"> 0.55% driver_deliver_term </font></div><div style=""><font face="monospace, monospace"> 1.30% sched_spin_wait</font></div></div><div style=""><font face="monospace, monospace"><br></font></div><div style=""><font face="arial, helvetica, sans-serif">It seems the kernel version may make a big difference; I will run more tests.</font></div><div style=""><font face="arial, helvetica, sans-serif"><br></font></div><div style=""><font face="arial, helvetica, sans-serif">But it also seems that the memory operations are unaligned, which may not be very efficient.</font></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 24, 2018 at 1:24 PM, 
Lukas Larsson <span dir="ltr"><<a href="mailto:lukas@erlang.org" target="_blank">lukas@erlang.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Can you run perf with "--call-graph dwarf" and see which functions it is that call memmove?</div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 24, 2018 at 12:21 PM, Danil Zagoskin <span dir="ltr"><<a href="mailto:z@gosk.in" target="_blank">z@gosk.in</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Yes, I've built a fresh master today (Erlang/OTP 21 [RELEASE CANDIDATE 1] [erts-9.3.1]), and nothing has changed.<br></div><div class="gmail_extra"><div><div class="m_-8168820337040500803h5"><br><div class="gmail_quote">On Thu, May 24, 2018 at 1:17 PM, Sergej Jurečko <span dir="ltr"><<a href="mailto:sergej.jurecko@gmail.com" target="_blank">sergej.jurecko@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space">OTP-21 rc1 has enhanced IO scalability. Have you tried if it is any better? UDP performance in Erlang was never great... 
<div><br></div><div>Regards,</div><div>Sergej<div><div class="m_-8168820337040500803m_6761964062470920450h5"><br><div><br><blockquote type="cite"><div>On 24 May 2018, at 12:03, Danil Zagoskin <<a href="mailto:z@gosk.in" target="_blank">z@gosk.in</a>> wrote:</div><br class="m_-8168820337040500803m_6761964062470920450m_5844314657874226577Apple-interchange-newline"><div><div dir="ltr">Yes, we have {read_packets, 100} in receive socket options.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 24, 2018 at 10:23 AM, Raimo Niskanen <span dir="ltr"><<a href="mailto:raimo+erlang-questions@erix.ericsson.se" target="_blank">raimo+erlang-questions@erix.e<wbr>ricsson.se</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On Wed, May 23, 2018 at 06:28:55PM +0300, Danil Zagoskin wrote:<br>
> Hi!<br>
> <br>
> We have a performance problem receiving lots of UDP traffic.<br>
> There are many (about 70) UDP receive processes, each handling about 1<br>
> to 10 Mbit/s of multicast traffic, with {active, N}.<br>
<br>
</span>Whenever someone reports UDP receive performance problems, one has to ask:<br>
have you seen the Erlang socket option {read_packets,N}?<br>
<br>
See <a href="http://erlang.org/doc/man/inet.html#setopts-2" rel="noreferrer" target="_blank">http://erlang.org/doc/man/inet<wbr>.html#setopts-2</a><br>
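<br>
A minimal sketch of how {read_packets,N} and {active,N} can be combined on one<br>
socket, assuming a plain (non-multicast) UDP socket. The module name<br>
udp_rx_sketch, the Port argument, and the 2 MB recbuf value are illustrative<br>
assumptions, not taken from the setup described in this thread:<br>
<pre>
-module(udp_rx_sketch).
-export([start/1]).

%% Hypothetical receiver. {read_packets, 100} lets the driver read up to
%% 100 datagrams per readiness notification (the default is 5), and
%% {active, 100} delivers up to 100 messages before the socket goes passive.
start(Port) ->
    {ok, Socket} = gen_udp:open(Port, [binary,
                                       {active, 100},
                                       {read_packets, 100},
                                       {recbuf, 2 * 1024 * 1024}]),
    loop(Socket, 0).

loop(Socket, Count) ->
    receive
        {udp, Socket, _IP, _InPort, _Packet} ->
            %% Real packet handling would go here; this only counts.
            loop(Socket, Count + 1);
        {udp_passive, Socket} ->
            %% The {active, N} budget is used up: re-arm the socket.
            ok = inet:setopts(Socket, [{active, 100}]),
            loop(Socket, Count)
    end.
</pre>
Whether 100 is a good value depends on packet size and rate, so it is worth<br>
measuring rather than assuming.<br>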
<span><br>
> <br>
> msacc summary on my OSX laptop, built from OTP master<br>
> c30309e799212b080c39ee2f91af3f<wbr>9a0383d767 (Apr 19):<br>
> <br>
> <br>
> Thread: scheduler<br>
> alloc 30.02%, aux 0.92%, bif 2.86%, busy_wait 24.66%, check_io 0.01%,<br>
> emulator 9.61%, ets 0.03%, gc 1.25%, gc_full 0.20%, nif 0.13%,<br>
> other 2.34%, port 9.33%, send 0.41%, sleep 17.78%, timers 0.44%<br>
> <br>
> <br>
> The Linux production server behaves the same way (we do not have extended msacc<br>
> there yet, so most of the alloc time is counted under port).<br>
> <br>
> perf top (on Linux production) says there's a lot of unaligned memmove:<br>
> <br>
> 69.76% <a href="http://libc-2.24.so/" rel="noreferrer" target="_blank">libc-2.24.so</a> [.] __memmove_sse2_unaligned_erms<br>
> 6.13% beam.smp [.] process_main<br>
> 2.02% beam.smp [.] erts_schedule<br>
> 0.87% [kernel] [k] copy_user_enhanced_fast_string<br>
> <br>
> <br>
> I'll try to make a minimal example for this.<br>
> Maybe there are simple recommendations on optimizing this kind of load?<br>
> <br>
> -- <br>
> Danil Zagoskin | <a href="mailto:z@gosk.in" target="_blank">z@gosk.in</a><br>
<br>
</span>> ______________________________<wbr>_________________<br>
> erlang-questions mailing list<br>
> <a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br>
> <a href="http://erlang.org/mailman/listinfo/erlang-questions" rel="noreferrer" target="_blank">http://erlang.org/mailman/list<wbr>info/erlang-questions</a><br>
<span class="m_-8168820337040500803m_6761964062470920450m_5844314657874226577HOEnZb"><font color="#888888"><br>
<br>
-- <br>
<br>
/ Raimo Niskanen, Erlang/OTP, Ericsson AB<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="m_-8168820337040500803m_6761964062470920450m_5844314657874226577gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><font face="'courier new', monospace">Danil Zagoskin | <a href="mailto:z@gosk.in" target="_blank">z@gosk.in</a></font></div></div></div>
</div>
______________________________<wbr>_________________<br>erlang-questions mailing list<br><a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a><br><a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/list<wbr>info/erlang-questions</a><br></div></blockquote></div><br></div></div></div></div></blockquote></div><br><br clear="all"><div><br></div></div></div><span class="m_-8168820337040500803HOEnZb"><font color="#888888">-- <br><div class="m_-8168820337040500803m_6761964062470920450gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><font face="'courier new', monospace">Danil Zagoskin | <a href="mailto:z@gosk.in" target="_blank">z@gosk.in</a></font></div></div></div>
</font></span></div>
<br></blockquote></div><br></div>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><font face="'courier new', monospace">Danil Zagoskin | <a href="mailto:z@gosk.in" target="_blank">z@gosk.in</a></font></div></div></div>
</div>