[erlang-questions] CPU load of TCP server

Kevin A. Smith <>
Thu Oct 15 01:32:45 CEST 2009


I wonder if using prim_inet:async_accept/2 would help reduce CPU usage.
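
For reference, a minimal sketch of what that could look like. Note that prim_inet is an undocumented internal module, so the call and the shape of the {inet_async, ...} reply below are assumptions based on how inet_tcp uses it, and may differ between OTP releases. The module name async_accept_sketch and the accept_one/2 helper are made up for illustration.

```erlang
-module(async_accept_sketch).
-export([accept_one/2]).

%% Ask the driver for one connection without blocking inside accept.
%% The result arrives later as an ordinary message, so a real server
%% could keep handling other messages while waiting.
accept_one(LSock, Timeout) ->
    %% -1 requests no driver-side timeout (assumption about this internal API).
    {ok, Ref} = prim_inet:async_accept(LSock, -1),
    receive
        {inet_async, LSock, Ref, {ok, CSock}} ->
            %% gen_tcp:accept/1 normally performs this registration;
            %% without it, gen_tcp calls on CSock cannot find the module.
            inet_db:register_socket(CSock, inet_tcp),
            {ok, CSock};
        {inet_async, LSock, Ref, {error, Reason}} ->
            {error, Reason}
    after Timeout ->
        %% The accept request stays pending in the driver in this sketch.
        {error, timeout}
    end.
```

Whether this lowers CPU usage at all is untested here: an accept happens once per connection, while the load described below scales with the number of packets.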

--Kevin
On Oct 14, 2009, at 1:39 AM, Valentin Micic wrote:

> Hi Andrey,
>
> From your code snippet I suspect that you've opened your socket with the
> {active, true} flag, which may explain the excessive CPU usage -- it is much
> cheaper to keep excess messages in a TCP buffer than as a bunch of messages
> waiting on a process queue (for example, messages in the TCP buffer have no
> effect on selective receive).
>
> Thus, changing your code to:
>
> loop(Socket) ->
>     receive
>         {tcp, Socket, _Packet} ->
>             %% Re-arm the socket so the next packet is delivered as a message.
>             inet:setopts(Socket, [{active, once}]),
>             loop(Socket);
>         {tcp_closed, Socket} ->
>             normal;
>         _ ->
>             loop(Socket)
>     after 500 ->
>         gen_tcp:send(Socket, [?PACKET]),
>         loop(Socket)
>     end.
>
> Should save some CPU cycles... don't forget to open this socket with an
> initial {active, once}.
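
A minimal sketch of how the socket could be opened and handed over so that the loop above starts out in {active, once} mode. It uses only the documented gen_tcp and inet calls; the module name active_once_sketch, the start/1 helper and the <<"ping">> payload standing in for ?PACKET are assumptions made for illustration, not the original poster's code.

```erlang
-module(active_once_sketch).
-export([start/1]).

-define(PACKET, <<"ping">>).   % placeholder payload (assumption)

%% Listen in passive mode and accept in a separate process.
start(Port) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false},
                                        {reuseaddr, true}]),
    spawn(fun() -> accept(LSock) end),
    {ok, LSock}.

accept(LSock) ->
    {ok, Socket} = gen_tcp:accept(LSock),
    Pid = spawn(fun() ->
                    %% Wait until ownership has been transferred, then
                    %% ask for exactly one message from the socket.
                    receive go -> ok end,
                    ok = inet:setopts(Socket, [{active, once}]),
                    loop(Socket)
                end),
    ok = gen_tcp:controlling_process(Socket, Pid),
    Pid ! go,
    accept(LSock).

loop(Socket) ->
    receive
        {tcp, Socket, _Packet} ->
            inet:setopts(Socket, [{active, once}]),
            loop(Socket);
        {tcp_closed, Socket} ->
            normal;
        _ ->
            loop(Socket)
    after 500 ->
        gen_tcp:send(Socket, [?PACKET]),
        loop(Socket)
    end.
```

Because the accepted socket inherits {active, false} from the listening socket, no data is delivered as messages until the connection process owns the socket and re-arms it itself.
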
>
> As for the performance difference between R12 and R13 -- not sure, but I
> think there is more tax to be paid for pushing a bunch of messages around to
> a scheduler with a right queue.
>
> V/
>
>
>
> -----Original Message-----
> From:  [mailto:erlang- 
> ] On
> Behalf Of Andrey Tsirulev
> Sent: 13 October 2009 08:34 PM
> To: Rapsey; 
> Subject: Re: [erlang-questions] CPU load of TCP server
>
> Hi Sergej,
>
> Thanks for the hint. I moved the timer to the client. Now I have about 5.5%
> CPU usage per 1000 connections. I still expect it should be less.
>
> Andrey
>
>
> ----- Original Message -----
> From: "Rapsey" <>
> To: <>
> Sent: Tuesday, October 13, 2009 9:50 PM
> Subject: Re: [erlang-questions] CPU load of TCP server
>
>
>> Every time 'after' gets executed, a timer gets created (I presume). With
>> 10k processes it probably makes a noticeable CPU impact.
>>
>>
>> Sergej
>>
>> 2009/10/13 Andrey Tsirulev <>
>>
>>> Hello all,
>>>
>>> I'm exploring the possibility of using Erlang for my TCP service
>>> application (actually a game server). I've prepared test server and client
>>> applications. The test server application accepts client connections and
>>> sends 2 small (<1 Kb) packets per second to each client (and receives
>>> answers).
>>>
>>> I've met the following problems:
>>> 1) Kernel polling doesn't give any benefit with R13B02-1.
>>> 2) CPU load is too high.
>>>
>>> All the details are below.
>>>
>>> Here's my test server's `uname -a`:
>>> Linux source 2.6.29-gentoo-r5 #1 SMP Tue Aug 18 01:15:17 MSD 2009 x86_64
>>> AMD Sempron(tm) Dual Core Processor 2200 AuthenticAMD GNU/Linux
>>> (I've also run the tests on 2 other Linux servers with different kernel
>>> versions, and the results were close.)
>>>
>>> I've made the server connection processes as simple as possible. I've tried
>>> up to 10000 concurrent connections.
>>>
>>> The test results did not show any visible difference between using multiple
>>> remote machines for client connections, one remote machine or localhost.
>>>
>>> I tried the R13B02-1 and R12B-5 OTP versions.
>>>
>>> I found that memory usage grows linearly, as expected. But I ran into a
>>> problem with CPU load.
>>>
>>> First of all, kernel polling didn't give any benefit for R13B02-1 (even
>>> though erlang:system_info(kernel_poll) returned true and erl started with
>>> the message [kernel-poll:true]). I got about 55% CPU usage with 4000
>>> connections both with and without kernel polling enabled, while with R12B-5
>>> I get about 26% CPU usage with +K true. I suspect a bug either in OTP or in
>>> the gentoo ebuild (of course it's also quite possible that I'm doing
>>> something wrong or missed something in the docs).
>>>
>>> The following is about R12B-5. I get about 6-7% CPU load per 1000
>>> connections (about 60% CPU load for 10000 connections). I'm not sure
>>> whether I should consider this a good result or a bad one. Most of the
>>> articles on the same subject say that CPU load is negligible in their tests
>>> and that they are fighting for memory only, so I expected not to be
>>> CPU-limited either, but evidently I am.
>>>
>>> `top` says that about 50% of the CPU load is userspace, 25% software
>>> interrupts, 20% system and 5% hardware interrupts (that's by eye, not very
>>> strict).
>>>
>>> I found that CPU load depends not so much on the connection count as on the
>>> transmitted packet count (ok, that's obviously the number of system calls).
>>> Thus if I send 4 packets per second instead of 2, I have to halve the number
>>> of connections to keep the same CPU load.
>>>
>>> CPU load does not depend on packet size. 1 byte or 1 Kbyte - no visible
>>> difference.
>>>
>>> CPU usage is slightly lower with the active socket option enabled than with
>>> blocking recvs.
>>>
>>> CPU usage on the single Windows client machine with 4000 connections
>>> spawned is at the same level as on the Linux server handling these 4000
>>> connections (while I expected Linux to perform better).
>>>
>>> Switching Nagle on and off had no effect. I also tried to tune the TCP
>>> stack with sysctl using advice found here and there, but with almost no
>>> effect either.
>>>
>>> I've tried to profile with fprof and found that the bottlenecks are 'send'
>>> operations (but I'm a relative novice to Erlang so I'm not sure my usage of
>>> fprof was correct). Ok, that was expected too. I've read the 'why is
>>> gen_tcp:send slow?' thread but none of the advice given there helped me.
>>>
>>> So the main question is: is CPU usage of 7% per 1000 connections (or
>>> maybe better to say 2000 packets per second) a good result? If not, what is
>>> the expected result? How can I improve my test application? Or maybe
>>> something in my story looks strange?
>>>
>>> I know that a possible optimization is decreasing the number of packets,
>>> and I keep it in mind.
>>>
>>> Here's the server connection process loop:
>>>
>>> loop(Socket) ->
>>>     receive
>>>         {tcp, Socket, _Packet} ->
>>>             loop(Socket);
>>>         {tcp_closed, Socket} ->
>>>             normal;
>>>         _ ->
>>>             loop(Socket)
>>>     after 500 ->
>>>         gen_tcp:send(Socket, [?PACKET]),
>>>         loop(Socket)
>>>     end.
>>>
>>> The client loop has a blocking recv and answers with a send immediately.
>>>
>>> Thank you very much for your time. Sorry for so many words, I tried to
>>> provide all possible information. I will answer any question and appreciate
>>> any hint.
>>>
>>> Best regards,
>>> Andrey
>>
>
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>


