[erlang-questions] UDP receive performance
Danil Zagoskin
z@REDACTED
Thu Jun 21 15:49:17 CEST 2018
Hi!
Sorry for the late reply.
The patch works like a charm. Huge performance boost!
Before:
Thread alloc aux bifbusy_wait check_io emulator
ets gc gc_full nif other port send sleep
timers
Stats per type:
async 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
0.00%
aux 0.04% 0.05% 0.00% 0.00% 2.02% 0.00%
0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 97.88%
0.00%
dirty_cpu_sche 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
0.00%
dirty_io_sched 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
0.00%
poll 0.04% 0.00% 0.00% 0.00% 6.03% 0.00%
0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 93.93%
0.00%
scheduler 56.63% 0.80% 0.48% 16.13% 0.00% 4.07%
0.00% 1.24% 0.00% 0.00% 2.69% 14.89% 0.00% 2.80%
0.25%
After:
Thread alloc aux bifbusy_wait check_io emulator
ets gc gc_full nif other port send sleep
timers
Stats per type:
async 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
0.00%
aux 0.00% 0.00% 0.00% 0.00% 14.57% 0.00%
0.00% 0.00% 0.00% 0.00% 1.24% 0.00% 0.00% 84.19%
0.00%
dirty_cpu_sche 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
0.00%
dirty_io_sched 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
0.00%
poll 0.00% 0.00% 0.00% 0.00% 3.10% 0.00%
0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 96.90%
0.00%
scheduler 2.86% 0.78% 0.38% 57.22% 0.00% 2.81%
0.00% 0.49% 0.00% 0.00% 7.32% 9.87% 0.00% 18.08%
0.18%
The machine is quad-core i5-6400@REDACTED with Linux 4.13.0
Load generated with this commands (10 senders, 10 readers on each address):
[udptest:start_sender({239,9,9,X}, 3999) || X <- lists:seq(1,10)],
[udptest:start_reader({239,9,9,X}, 3999) || X <- lists:seq(1,10), _ <-
lists:seq(1, 10)].
Thank you for the awesome work!
P.S. For some reason there is an increased check_io time in aux thread, but
I think it's not critical and may be related to more often socket reads due
to lots of idle time.
On Wed, Jun 20, 2018 at 2:56 PM Jonas Falkevik <
jonas.falkevik@REDACTED> wrote:
>
> On Mon, Jun 18, 2018 at 3:57 PM, Lukas Larsson <lukas@REDACTED> wrote:
>
>>
>>
>>>> Something like
>>> https://github.com/falkevik/otp/commit/444fb00ff2a9d1f40a8c66f48bea1cf3f07ca86c
>>> ?
>>>
>>
>> I was thinking something more like this:
>> https://github.com/garazdawi/otp/tree/lukas/erts/realloc_thr_pref_shrink_th
>>
>> This makes it so that the absolute and relative single block shrink
>> thesholds are respected for reallocs made on a remote scheduler. This
>> should solve the problem that you have found also, through as I still
>> haven't reproduced it I can't test that it actually solves it.
>>
>>
> Are you adding the multicast network to the loop back interface? Using
> some other interface that does not allow multicast traffic?
>
> I have been able to reproduce it in macOS Sierra (Darwin Kernel Version
> 16.7.0: Fri Apr 27 17:59:46 PDT 2018; root:xnu-3789.73.13~1/RELEASE_X86_64
> x86_64)
> and also on Ubuntu Linux 16.04 LTS with kernel 4.4.0-112-generic.
>
> Danil Zagoskin, do you have time to try and see if there is any change in
> the behaviour for you with a patched system?
>
>
> With the bugfix, the beam is spending most of its time in sched_spin_wait
> on linux and sched_yield on macOS.
>
> Stats below are from OTP 21 and then OTP21 with bugfix on linux
>
> Erlang/OTP 21 [erts-10.0] [source] [64-bit] [smp:8:8] [ds:8:8:10]
> [async-threads:1] [hipe]
>
> Eshell V10.0 (abort with ^G)
> 1> udptest:start_sender({239,9,9,9}, 3999).
> <0.78.0>
> 2> [udptest:start_reader({239,9,9,9}, 3999) || _ <- lists:seq(1, 40)].
> [<0.80.0>,<0.81.0>,<0.82.0>,<0.83.0>,<0.84.0>,<0.85.0>,
> <0.86.0>,<0.87.0>,<0.88.0>,<0.89.0>,<0.90.0>,<0.91.0>,
> <0.92.0>,<0.93.0>,<0.94.0>,<0.95.0>,<0.96.0>,<0.97.0>,
> <0.98.0>,<0.99.0>,<0.100.0>,<0.101.0>,<0.102.0>,<0.103.0>,
> <0.104.0>,<0.105.0>,<0.106.0>,<0.107.0>,<0.108.0>|...]
> 3> msacc:start(10000), msacc:print().
> Average thread real-time : 10002138 us
> Accumulated system run-time : 62576106 us
> Average scheduler run-time : 7775309 us
>
> Thread aux check_io emulator gc other port
> sleep
>
> Stats per thread:
> async( 0) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> aux( 1) 0.06% 0.96% 0.00% 0.00% 0.07% 0.00%
> 98.90%
> dirty_cpu_( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 2) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 3) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 4) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 5) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 6) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 7) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 8) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 2) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 3) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 4) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 5) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 6) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 7) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 8) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 9) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s(10) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> poll( 0) 0.00% 2.63% 0.00% 0.00% 0.00% 0.00%
> 97.37%
> scheduler( 1) 0.82% 0.00% 0.52% 0.35% 7.33% 67.91%
> 23.08%
> scheduler( 2) 0.76% 0.00% 0.79% 0.34% 7.30% 68.61%
> 22.21%
> scheduler( 3) 0.74% 0.00% 0.83% 0.33% 7.17% 68.54%
> 22.38%
> scheduler( 4) 0.76% 0.00% 0.72% 0.34% 7.38% 68.84%
> 21.96%
> scheduler( 5) 0.79% 0.00% 0.82% 0.34% 7.21% 68.94%
> 21.90%
> scheduler( 6) 0.80% 0.00% 0.74% 0.31% 7.15% 68.56%
> 22.44%
> scheduler( 7) 0.87% 0.00% 0.72% 0.37% 7.24% 68.43%
> 22.36%
> scheduler( 8) 0.77% 0.00% 0.68% 0.35% 7.41% 69.02%
> 21.77%
>
> Stats per type:
> async 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> aux 0.06% 0.96% 0.00% 0.00% 0.07% 0.00%
> 98.90%
> dirty_cpu_sche 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_sched 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> poll 0.00% 2.63% 0.00% 0.00% 0.00% 0.00%
> 97.37%
> scheduler 0.79% 0.00% 0.73% 0.34% 7.27% 68.61%
> 22.26%
> ok
> 4>
>
>
> $ ../src/udp_performace_bug_fix/bin/erl
> Erlang/OTP 21 [RELEASE CANDIDATE 2] [erts-9.3.2] [source] [64-bit]
> [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]
>
> Eshell V9.3.2 (abort with ^G)
> 1> udptest:start_sender({239,9,9,9}, 3999).
> <0.78.0>
> 2> [udptest:start_reader({239,9,9,9}, 3999) || _ <- lists:seq(1, 40)].
> [<0.80.0>,<0.81.0>,<0.82.0>,<0.83.0>,<0.84.0>,<0.85.0>,
> <0.86.0>,<0.87.0>,<0.88.0>,<0.89.0>,<0.90.0>,<0.91.0>,
> <0.92.0>,<0.93.0>,<0.94.0>,<0.95.0>,<0.96.0>,<0.97.0>,
> <0.98.0>,<0.99.0>,<0.100.0>,<0.101.0>,<0.102.0>,<0.103.0>,
> <0.104.0>,<0.105.0>,<0.106.0>,<0.107.0>,<0.108.0>|...]
> 3> msacc:start(10000), msacc:print().
> Average thread real-time : 10000416 us
> Accumulated system run-time : 19541445 us
> Average scheduler run-time : 2279822 us
>
> Thread aux check_io emulator gc other port
> sleep
>
> Stats per thread:
> async( 0) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> aux( 1) 0.00% 10.72% 0.00% 0.00% 0.00% 0.00%
> 89.28%
> dirty_cpu_( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 2) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 3) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 4) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 5) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 6) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 7) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_cpu_( 8) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 2) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 3) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 4) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 5) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 6) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 7) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 8) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s( 9) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_s(10) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> poll( 0) 0.00% 2.31% 0.00% 0.00% 0.00% 0.00%
> 97.69%
> scheduler( 1) 0.46% 0.00% 0.43% 0.14% 17.57% 4.07%
> 77.33%
> scheduler( 2) 0.50% 0.00% 0.44% 0.15% 17.13% 4.21%
> 77.57%
> scheduler( 3) 0.49% 0.00% 0.47% 0.16% 17.07% 4.35%
> 77.46%
> scheduler( 4) 0.45% 0.00% 0.42% 0.14% 16.25% 4.02%
> 78.72%
> scheduler( 5) 0.48% 0.00% 0.42% 0.14% 17.04% 4.04%
> 77.88%
> scheduler( 6) 0.46% 0.00% 0.39% 0.14% 16.57% 3.77%
> 78.67%
> scheduler( 7) 0.53% 0.00% 1.88% 0.27% 17.60% 8.22%
> 71.50%
> scheduler( 8) 0.47% 0.00% 0.42% 0.15% 16.60% 3.86%
> 78.50%
>
> Stats per type:
> async 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> aux 0.00% 10.72% 0.00% 0.00% 0.00% 0.00%
> 89.28%
> dirty_cpu_sche 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> dirty_io_sched 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
> 100.00%
> poll 0.00% 2.31% 0.00% 0.00% 0.00% 0.00%
> 97.69%
> scheduler 0.48% 0.00% 0.61% 0.16% 16.98% 4.57%
> 77.20%
> ok
>
> Jonas
>
>
--
Danil Zagoskin | z@REDACTED
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20180621/b0affc54/attachment.htm>
More information about the erlang-questions
mailing list