[erlang-questions] Heavy duty UDP server performance

Ameretat Reith ameretat.reith@REDACTED
Mon Feb 1 20:40:49 CET 2016

I'm using Erlang to build an experimental protocol.  I'm trying to
make it saturate a 1Gbit link, but it won't scale that far and I have
failed to find a bottleneck in my code, or even anything I could call
a bottleneck.

In behavior my software is much like a messaging server, but with
bigger packets, many clients (more than 4k) and more complex
sub-components, such as a distributed database.  Those components do
not block other parts of the system;  it is just the client-server
channel that is IO-heavy and involves some encryption and decryption.

I made a gen_server process for each client UDP socket.  There is a
central process registry, but it is called only for new clients and
its message queue is usually empty.
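
A minimal sketch of the layout described above, assuming one
gen_server that owns one UDP socket in `{active, once}` mode.  The
module name, the state record and the echo step are hypothetical
placeholders, not taken from the actual code;  the real process would
decrypt, handle and encrypt the packet where the echo is.

```erlang
-module(udp_client_sketch).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Hypothetical state: the single UDP socket owned by this process.
-record(state, {socket}).

start_link(Port) ->
    gen_server:start_link(?MODULE, Port, []).

init(Port) ->
    %% {active, once} delivers one datagram as a message, then the
    %% socket stays passive until the process re-arms it.
    {ok, Socket} = gen_udp:open(Port, [binary, {active, once}]),
    {ok, #state{socket = Socket}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info({udp, Socket, Ip, PeerPort, Packet},
            State = #state{socket = Socket}) ->
    %% Placeholder for the real work: echo the packet back.
    _ = gen_udp:send(Socket, Ip, PeerPort, Packet),
    %% Re-arm the socket so the next datagram is delivered.
    ok = inet:setopts(Socket, [{active, once}]),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

terminate(_Reason, #state{socket = Socket}) ->
    gen_udp:close(Socket).

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

With this shape each socket's traffic is serialized through its own
process mailbox, so a slow client should only delay its own process.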

I found a bottleneck in `scheduler_wait` when I had few clients
(around 400);  it accounted for around 50% of total CPU usage.  I
found an old patch by Wei Cao [1] which seemed to target the same
issue.  But on a modern version of Erlang (18.0) the time spent in
`scheduler_wait` dropped considerably on a more congested network,
specifically to around 10% when my software reached its apparent
limit of around 600Mbit/s read and write to the network.  At that
point my incoming UDP packet rate is around 24K/s.  Maybe an
experienced Erlang developer here remembers that problem and can tell
me whether Erlang is now optimized to poll for network packets more
often.

I was also concerned about the async pool, since a fairly high share
of the work in Erlang involved pthread, but I found that those
threads are used only for file IO operations.  I didn't find any
documentation confirming this;  I only saw that the sole user of this
dirty IO mechanism is `io.c` in the OTP source code.  I would be very
grateful if anyone could clarify the usage and effect of this pool.
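
For context, the size of the async pool on a running node can be
checked from the Erlang shell.  This is only a way to inspect the
setting;  it says nothing about which drivers actually use the pool.

```erlang
%% Run in an Erlang shell (erl).
%% Returns the number of async threads in the pool;
%% 0 means the node runs without an async thread pool.
erlang:system_info(thread_pool_size).
```

The pool size is set at startup with the `+A` emulator flag, for
example `erl +A 16` (the value 16 is an arbitrary example).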

I made a flame graph of function calls inside the VM (using eflame2
[2]);  it is very even and I cannot find any outstanding usage in it
[3].  I also made a flame graph from a perf report outside the VM, in
which some symbols could not be resolved [4].  I am unsure whether
process_main should take that much work itself.  Encryption and
decryption (the enacl_nif calls) apparently did not take much time
either.

Do you have any suggestions for how I could better analyze my
software and understand how the VM works?  Are these the limits I
should expect, with no more room for optimization?

Thanks in advance

1: http://erlang.org/pipermail/erlang-questions/2012-July/067868.html
2: https://github.com/slfritchie/eflame
3: http://file.reith.ir/in-erl-3k.gif
4: http://file.reith.ir/out-erl-perf.svg (interactive, use web browser)
