[erlang-questions] 300k HTTP GET RPS on Erlang 21.2, benchmarking per scheduler polling

Vans S vans_163@REDACTED
Tue Dec 18 15:34:01 CET 2018


 I think OTHER has something to do with ports / polling, because I just removed inet_drv, wrote a simple C NIF to do the TCP networking, and the throughput doubled. I did not get around to recompiling Erlang with the extra microstate accounting states, but without the inet driver, using an unoptimized non-blocking TCP NIF, the msacc report looks like this:

         Thread      aux check_io emulator       gc    other     port    sleep
  scheduler( 1)    0.68%    0.00%   89.85%    3.46%    6.01%    0.00%    0.00%
  scheduler( 2)    0.66%    0.01%   90.43%    3.40%    5.50%    0.00%    0.00%
I am using 2 schedulers because the 10 physical cores generating load can now only just barely saturate them. Now 90% of the time is spent in the emulator and 6% in other; I am guessing the 6% other is the NIF calls down into the socket syscalls?
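
A report like the one above can be collected with the stock msacc helper from runtime_tools, roughly like this (a minimal sketch; the 10 second window is just an example, not necessarily what the benchmark used):

    msacc:start(10000),   %% reset counters and measure for 10 seconds
    msacc:print(),        %% per-thread table: aux/check_io/emulator/gc/other/port/sleep
    %% merge all threads of the same type into one row each:
    msacc:print(msacc:stats(type, msacc:stats())).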

The throughput was 250k RPS on 2 physical cores. If it all scales linearly, that is 1.25M RPS on 10 cores for a simple GET hello-world benchmark.

The NIF is a PoC: https://gist.github.com/vans163/d96fcc7c89d0cf25c819c5fb77769e81. Of course it is only useful when there is constantly data on the socket; the PoC will break down if there are idle connections that keep getting polled. It does open the possibility, though, of using something like DPDK.
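
To illustrate why idle connections are the problem: without readiness notification the Erlang side of such a NIF ends up busy-polling every socket, roughly like this sketch (the tcp_nif module and its recv/send functions are made-up names for illustration, not the actual gist code):

    -module(nif_echo).
    -export([loop/1]).

    %% Hypothetical per-connection loop: tcp_nif:recv/1 and tcp_nif:send/2 stand in
    %% for the non-blocking NIF calls. With no readiness notification, an idle
    %% connection keeps spinning through the eagain branch and burns scheduler time.
    loop(Sock) ->
        case tcp_nif:recv(Sock) of
            {ok, Req}       -> tcp_nif:send(Sock, response(Req)), loop(Sock);
            eagain          -> erlang:yield(), loop(Sock);
            {error, closed} -> ok
        end.

    response(_Req) ->
        <<"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello">>.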
    On Tuesday, December 18, 2018, 2:32:21 a.m. EST, Lukas Larsson <lukas@REDACTED> wrote:  
 
 
Hello,
On Tue, Dec 18, 2018 at 2:36 AM Vans S <vans_163@REDACTED> wrote:

If anyone is interested, here is the writeup: https://elixirforum.com/t/300k-requests-per-second-webserver-in-elixir-otp21-2-10-cores/18823.

tl;dr: the scheduler spent about 22% of its time in poll to serve ~30k HTTP GET requests. I think that is still a little much?

Firstly, it is not poll that you spend 22% in, it is PORT, i.e. the work done by gen_tcp to call writev/read. Polling shows up in the state CHECK_IO. The optimizations introduced in 21.2 were mainly done to reduce the time spent doing polling.
Secondly, I'd say it is too little. As you saw in the edit that you made, if you remove/optimize the Erlang parts you will get a higher throughput rate, as the system can spend more time doing port work. What you are seeing as OTHER is most likely the system spinning, looking for work to do. You can get more states, if you are interested in digging deeper, by passing --with-microstate-accounting=extra to configure.
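For the standard states no rebuild is needed; a rough sketch along these lines, using only erlang:system_flag/2 and erlang:statistics/1, shows how much scheduler time ends up in port, check_io and other:

    erlang:system_flag(microstate_accounting, true),
    timer:sleep(10000),                       %% run the benchmark load during this window
    erlang:system_flag(microstate_accounting, false),
    Counters = [C || #{type := scheduler, counters := C}
                         <- erlang:statistics(microstate_accounting)],
    Sum = fun(State) -> lists:sum([maps:get(State, C, 0) || C <- Counters]) end,
    io:format("port ~p  check_io ~p  other ~p  emulator ~p~n",
              [Sum(port), Sum(check_io), Sum(other), Sum(emulator)]).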
The inet_driver (the port driver that is used for TCP/UDP/SCTP) is not perfect, but almost 2 decades have been spent improving it, so there are very few low hanging fruits left.
Lukas  

