[erlang-questions] UDP receive performance

Lukas Larsson lukas@REDACTED
Fri May 25 11:13:06 CEST 2018


On Thu, May 24, 2018 at 5:34 PM, Danil Zagoskin <z@REDACTED> wrote:

> -    2.57%     0.02%  31_scheduler     beam.smp
> [.] process_main
>    - 2.55% process_main
>       - 2.43% erts_schedule
>          - 2.26% erts_port_task_execute
>             - 2.25% packet_inet_input.isra.31
>                - 2.05% driver_realloc_binary
>                   - 2.05% realloc_thr_pref
>                        1.87% __memmove_avx_unaligned_erms
>
> That's a 40-core Xeon E5-2640, so 2.5% of total CPU on a single scheduler is roughly 100% of one core.
> Also, it's Linux kernel 4.9
>
>
> On a machine with kernel 4.13 and a quad-core Xeon E31225, at half of the E5's
> load, we have:
> -   16.11%     0.10%  1_scheduler      beam.smp                       [.]
> erts_schedule
>    - 16.01% erts_schedule
>       - 13.62% erts_port_task_execute
>          - 13.11% packet_inet_input.isra.31
>             - 11.37% driver_realloc_binary
>                - 11.33% realloc_thr_pref
>                   - 10.50% __memcpy_avx_unaligned
>                        5.06% __memcpy_avx_unaligned
>                      + 1.04% page_fault
>                     0.66% do_erts_alcu_realloc.constprop.31
>             + 0.79% 0x108f3
>               0.55% driver_deliver_term
>         1.30% sched_spin_wait
>
> Seems like the kernel version may change things a lot; will run more tests.
>
> But it seems like the memory operations are unaligned, which could be
> inefficient.
>

I'm not able to reproduce your benchmark; for some reason I don't get the
load that you get.

Anyway, I stared a bit at the code, and you are getting a lot of reallocs
that move data, which is not good at all. Something that caught my eye was
that the recbuf is 2 MB, while the packets you receive are a lot smaller.
One of the side effects of setting a large recbuf is that the user-space
buffer is also increased to the same value. I don't think you want that to
happen in your case. What happens if you set buffer to the MTU?
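
Something along these lines, as a sketch (the port and the active mode are
placeholders, and 1500 assumes a plain Ethernet MTU):

```erlang
%% Keep the kernel-side receive buffer large, but cap the Erlang
%% user-space buffer at roughly one MTU so each datagram is read
%% into a small binary from the start.
{ok, Socket} = gen_udp:open(Port,
                            [binary,
                             {active, true},
                             {recbuf, 2 * 1024 * 1024},  %% kernel socket buffer
                             {buffer, 1500}]).           %% user-space read buffer
```

Note that setting recbuf also bumps buffer to at least the same value, and the
options are applied in order, so buffer has to come after recbuf in the list.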

Why would changing the user buffer size affect performance? Well, the UDP
read is done into a user-space buffer of the given size. When that size is
2 MB, the buffer is placed by erts_alloc in an SBC (single block carrier).
Then later, when it is known how much data was actually received, the 2 MB
buffer is realloced down to the size of the received data. This moves the
data across the SBC threshold into an MBC (multi block carrier), and the
realloc has to copy the data in the binary. So by lowering the user-space
buffer to be small enough to be placed in an MBC from the start, the move
in realloc should disappear. That is, if my theory is correct.
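
As a sketch of how one might check the threshold in question: binary_alloc
and the sbct parameter are real erts_alloc names, but the exact shape of the
system_info return value varies between OTP releases, so take this as
illustrative only:

```erlang
%% Each binary_alloc instance reports its options, including sbct
%% (the single block carrier threshold): a block larger than sbct
%% is placed in an SBC, a smaller one in an MBC.
[{instance, _, Info} | _] = erlang:system_info({allocator, binary_alloc}),
{options, Opts} = lists:keyfind(options, 1, Info),
proplists:get_value(sbct, Opts).
```

The threshold can also be raised at VM start with the +MBsbct emulator flag
(see the erts_alloc documentation), but shrinking the user-space buffer is
the simpler fix here.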

Lukas