[erlang-questions] CPU load of TCP server

Thu Oct 15 15:38:21 CEST 2009

One thing that helped me was playing around with the TCP recbuf and sndbuf socket options (see inet:setopts/2: http://www.erlang.org/doc/apps/kernel/index.html)

I was experiencing high load with an HTTP client implementation, and managed to cut it by a considerable amount by modifying the above values (especially recbuf).

It's worth a shot, but I might've been running into HTTP client issues as opposed to basic TCP/gen_tcp problems.
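
To make the suggestion concrete, here is a minimal sketch of changing the buffer sizes on an already open socket. The module name buf_tuning, the helper tune_buffers/3 and the 65536 figure in the usage line are assumptions for illustration only; the right values depend on your traffic and have to be found by experiment.

```erlang
-module(buf_tuning).
-export([tune_buffers/3]).

%% Hypothetical helper (not part of OTP): asks for new receive and send
%% buffer sizes on Socket, then reads back what the OS actually granted.
%% The OS may round or adjust the requested sizes, so the values returned
%% by inet:getopts/2 can differ from RecBuf and SndBuf.
tune_buffers(Socket, RecBuf, SndBuf) ->
    ok = inet:setopts(Socket, [{recbuf, RecBuf}, {sndbuf, SndBuf}]),
    inet:getopts(Socket, [recbuf, sndbuf]).
```

Usage would be along the lines of buf_tuning:tune_buffers(Socket, 65536, 65536) right after the socket is opened. The same {recbuf, Size} and {sndbuf, Size} tuples can also be passed in the option list of gen_tcp:listen/2 or gen_tcp:connect/3.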

-----Original Message-----
From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On Behalf Of Andrey Tsirulev
Sent: Tuesday, October 13, 2009 2:34 PM
To: Rapsey; erlang-questions@REDACTED
Subject: Re: [erlang-questions] CPU load of TCP server

Hi Sergej,

Thanks for the hint. I moved the timer to the client. Now I have about 5.5% CPU
usage per 1000 connections. I still think it should be lower.
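
For reference, a minimal sketch of what a server loop might look like once the timer lives on the client, assuming the client sends a request on its own schedule and the server simply answers each one. The module name echo_conn, the extra Packet argument and the error handling are assumptions for illustration, not the actual test code.

```erlang
-module(echo_conn).
-export([loop/2]).

%% Socket is expected to be in active mode and owned by this process.
%% Packet is the payload sent back for every request (an assumption; the
%% original test used a ?PACKET macro).
loop(Socket, Packet) ->
    receive
        {tcp, Socket, _Request} ->
            %% The client paces the exchange, so there is no 'after'
            %% clause and no per-process timeout on the server side.
            case gen_tcp:send(Socket, Packet) of
                ok -> loop(Socket, Packet);
                {error, _Reason} -> normal
            end;
        {tcp_closed, Socket} ->
            normal;
        _Other ->
            %% Discard unrelated messages.
            loop(Socket, Packet)
    end.
```

The receive here has no timeout at all, so the server process only wakes up when the socket delivers something.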

Andrey

----- Original Message ----- 
From: "Rapsey" <rapsey@REDACTED>
To: <erlang-questions@REDACTED>
Sent: Tuesday, October 13, 2009 9:50 PM
Subject: Re: [erlang-questions] CPU load of TCP server

> Every time the `after` clause gets executed, a timer gets created (I
> presume). With 10k processes it probably makes a noticeable CPU impact.
>
>
> Sergej
>
> 2009/10/13 Andrey Tsirulev <andrey@REDACTED>
>
>> Hello all,
>>
>> I'm exploring the possibility of using Erlang for my TCP service
>> application (a game server, actually). I've prepared test server and
>> client applications. The test server application accepts client
>> connections and sends 2 small (<1 KB) packets per second to each client
>> (and receives answers).
>>
>> I've run into the following problems:
>> 1) Kernel polling doesn't give any benefit with R13B02-1.
>> 2) CPU load is too high.
>>
>> All the details are below.
>>
>> Here's my test server's `uname -a`:
>> Linux source 2.6.29-gentoo-r5 #1 SMP Tue Aug 18 01:15:17 MSD 2009 x86_64
>> AMD Sempron(tm) Dual Core Processor 2200 AuthenticAMD GNU/Linux
>> (I've also run tests on 2 other linux servers with different kernel
>> versions and the results were close).
>>
>> I've made the server connection processes as simple as possible. I've
>> tried up to 10000 concurrent connections.
>>
>> Test results did not show any visible difference between using multiple
>> remote machines for client connections, one remote machine or localhost.
>>
>> I tried R13B02-1 and R12B-5 OTP versions.
>>
>> I found that memory usage grows linearly, as expected. But I ran into a
>> problem with CPU load.
>>
>> First of all, kernel polling didn't give any benefit for R13B02-1 (even
>> though erlang:system_info(kernel_poll) returned true and erl started with
>> the message [kernel-poll:true]). I got about 55% CPU usage with 4000
>> connections both with and without kernel polling enabled, while with
>> R12B-5 I get about 26% CPU usage with +Ktrue. I suspect a bug either in
>> OTP or in the gentoo ebuild (of course it's also quite possible that I'm
>> doing something wrong or missed something in the docs).
>>
>> The following is about R12B-5. I get about 6-7% CPU load per 1000
>> connections (about 60% CPU load for 10000 connections). I'm not sure
>> whether I should consider this a good result or a bad one. Most of the
>> articles on the same subject say that CPU load is negligible in their
>> tests and that they are fighting for memory only, so I expected not to be
>> CPU-limited either, but evidently I am.
>>
>> `top` says that about 50% of CPU load is userspace, 25% software
>> interrupts, 20% system and 5% hardware interrupts (that's by eye, so not
>> very precise).
>>
>> I found that CPU load depends not so much on the connection count as on
>> the transmitted packet count (ok, that's obviously the number of system
>> calls). Thus if I send 4 packets per second instead of 2, I have to halve
>> the number of connections to keep the same CPU load.
>>
>> CPU load does not depend on packet size. 1 byte or 1Kbyte - no visible
>> difference.
>>
>> CPU usage is slightly less with active socket option enabled than with
>> blocking recvs.
>>
>> CPU usage on the single windows client machine with 4000 connections
>> spawned is at the same level as on the linux server handling these 4000
>> connections (while I expected linux to perform better).
>>
>> Switching Nagle on and off had no effect. I also tried to tune the TCP
>> stack with sysctl using advice found here and there, but with almost no
>> effect either.
>>
>> I've tried to profile with fprof and found that the bottlenecks are the
>> 'send' operations (but I'm a relative novice to erlang so I'm not sure my
>> usage of fprof was correct). Ok, that was expected too. I've read the 'why
>> is gen_tcp:send slow?' thread but none of the advice given there helped
>> me.
>>
>> So the main question is: is CPU usage of 7% per 1000 connections (or
>> perhaps better put, per 2000 packets per second) a good result? If not,
>> what is the expected result? How can I improve my test application? Or
>> does something in my story look strange?
>>
>> I know that one possible optimization is to decrease the number of
>> packets, and I'm keeping it in mind.
>>
>> Here's the server connection process loop:
>>
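>> loop(Socket) ->
>>     receive
>>         %% Data from the client: ignore the payload and keep waiting.
>>         {tcp, Socket, _Packet} ->
>>             loop(Socket);
>>         %% Peer closed the connection: let the process finish.
>>         {tcp_closed, Socket} ->
>>             normal;
>>         %% Discard any unrelated message.
>>         _ ->
>>             loop(Socket)
>>     after 500 ->
>>         %% No message arrived within 500 ms: send one packet.
>>         gen_tcp:send(Socket, [?PACKET]),
>>         loop(Socket)
>>     end.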
>>
>> The client loop uses a blocking recv and answers with send immediately.
>>
>> Thank you very much for your time. Sorry for the long message, I tried to
>> provide all possible information. I will answer any question and
>> appreciate any hint.
>>
>> Best regards,
>> Andrey
> 

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org