[erlang-questions] CPU load of TCP server

Andrey Tsirulev andrey@REDACTED
Wed Oct 14 14:07:08 CEST 2009


Hi Valentin,

Right, but in this particular case it doesn't give any benefit as there's 
only one packet per 500ms for each process which is handled at once before 
the next packet comes, so tests showed no difference between these 2 cases 
(though I've got your point and will use {active, once} in production).

Andrey
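
For reference, the {active, once} pattern discussed in this thread can be
sketched as a small self-contained module. This is only an illustration under
stated assumptions, not the code of either poster: the module name
active_once_demo, the functions run/0 and loop/2, the {packet, 2} framing and
the loopback client are all hypothetical choices made so the example can run
on its own.

```erlang
-module(active_once_demo).
-export([run/0]).

%% Hypothetical demo: listen on an ephemeral port, connect one loopback
%% client, send three framed packets, and return how many the server saw.
run() ->
    %% {packet, 2} is an assumption (not from the thread): it adds a 2-byte
    %% length header so each send arrives as exactly one {tcp, ...} message.
    Opts = [binary, {packet, 2}],
    %% Accepted sockets inherit {active, once} from the listening socket.
    {ok, Listen} = gen_tcp:listen(0, [{active, once} | Opts]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127,0,0,1}, Port, [{active, false} | Opts]),
    {ok, Socket} = gen_tcp:accept(Listen),
    [ok = gen_tcp:send(Client, <<"ping">>) || _ <- lists:seq(1, 3)],
    ok = gen_tcp:close(Client),
    Count = loop(Socket, 0),
    ok = gen_tcp:close(Listen),
    Count.

%% Counts packets until the peer closes the connection.
loop(Socket, Count) ->
    receive
        {tcp, Socket, _Packet} ->
            %% Only one message is delivered per arming; ask for the next one.
            %% Anything else stays in the TCP buffer, not in the mailbox.
            inet:setopts(Socket, [{active, once}]),
            loop(Socket, Count + 1);
        {tcp_closed, Socket} ->
            Count
    after 5000 ->
        %% Safety net for the demo so it cannot hang forever.
        {timeout, Count}
    end.
```

After compiling the module, calling active_once_demo:run() from the shell
returns the number of packets the loop counted before the client closed the
connection.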

----- Original Message ----- 
From: "Valentin Micic" <v@REDACTED>
To: "'Andrey Tsirulev'" <andrey@REDACTED>; "'Rapsey'" <rapsey@REDACTED>; 
<erlang-questions@REDACTED>
Sent: Wednesday, October 14, 2009 9:39 AM
Subject: RE: [erlang-questions] CPU load of TCP server


> Hi Andrey,
>
> From your code snippet I suspect that you've opened your socket with the
> {active, true} flag, which may explain the excessive CPU usage -- it is much
> cheaper to keep excess messages in a TCP buffer than as a bunch of
> messages waiting on a process queue (for example, messages in the TCP
> buffer have no effect on selective receive).
>
> Thus, changing your code to:
>
> loop(Socket) ->
>     receive
>         {tcp, Socket, _Packet} ->
>             %% Re-arm the socket so the next packet is delivered as a message.
>             inet:setopts(Socket, [{active, once}]),
>             loop(Socket);
>         {tcp_closed, Socket} ->
>             normal;
>         _ ->
>             loop(Socket)
>     after 500 ->
>         gen_tcp:send(Socket, [?PACKET]),
>         loop(Socket)
>     end.
>
> Should save some CPU cycles... don't forget to open this socket with an
> initial {active, once}.
>
> As for the performance difference between R12 and R13 -- not sure, but I think
> there is more tax to be paid for pushing a bunch of messages around to a
> scheduler with a right queue.
>
> V/
>
>
>
> -----Original Message-----
> From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
> Behalf Of Andrey Tsirulev
> Sent: 13 October 2009 08:34 PM
> To: Rapsey; erlang-questions@REDACTED
> Subject: Re: [erlang-questions] CPU load of TCP server
>
> Hi Sergej,
>
> Thanks for the hint. I moved the timer to the client. Now I have about 5.5%
> CPU usage per 1000 connections. I still expect it should be less.
>
> Andrey
>
>
> ----- Original Message ----- 
> From: "Rapsey" <rapsey@REDACTED>
> To: <erlang-questions@REDACTED>
> Sent: Tuesday, October 13, 2009 9:50 PM
> Subject: Re: [erlang-questions] CPU load of TCP server
>
>
>> Every time 'after' gets executed a timer gets created (I presume). With 10k
>> processes it probably makes a noticeable CPU impact.
>>
>>
>> Sergej
>>
>> 2009/10/13 Andrey Tsirulev <andrey@REDACTED>
>>
>>> Hello all,
>>>
>>> I'm exploring the possibility of using Erlang for my TCP service
>>> application (actually a game server). I've prepared test server and
>>> client applications. The test server application accepts client
>>> connections and sends 2 small (<1 KB) packets per second to each client
>>> (and receives answers).
>>>
>>> I've met the following problems:
>>> 1) Kernel polling doesn't give any benefit with R13B02-1.
>>> 2) CPU load is too high.
>>>
>>> All the details are below.
>>>
>>> Here's my test server's `uname -a`:
>>> Linux source 2.6.29-gentoo-r5 #1 SMP Tue Aug 18 01:15:17 MSD 2009 x86_64
>>> AMD Sempron(tm) Dual Core Processor 2200 AuthenticAMD GNU/Linux
>>> (I've also run tests on 2 other Linux servers with different kernel
>>> versions and the results were close).
>>>
>>> I've made the server connection processes as simple as possible. I've
>>> tried up to 10000 concurrent connections.
>>>
>>> Test results did not show any visible difference between using multiple
>>> remote machines for client connections, one remote machine, or localhost.
>>>
>>> I tried R13B02-1 and R12B-5 OTP versions.
>>>
>>> I found that memory usage grows linearly, as expected. But I ran into a
>>> problem with CPU load.
>>>
>>> First of all, kernel polling didn't give any benefit for R13B02-1 (while
>>> erlang:system_info(kernel_poll) returned true and erl started with the
>>> message [kernel-poll:true]). I've got about 55% CPU usage with 4000
>>> connections both with and without kernel polling enabled, while with
>>> R12B-5 I have about 26% CPU usage with +Ktrue. I suspect a bug either in
>>> OTP or in the gentoo ebuild (of course it's also quite possible that I'm
>>> doing something wrong or missed something in the docs).
>>>
>>> The following is about R12B-5. I get about 6-7% CPU load per 1000
>>> connections (about 60% CPU load for 10000 connections). I'm not sure if
>>> I should consider this a good result or a bad one. Most of the articles
>>> on the same subject say that CPU load is negligible in their tests and
>>> that they are fighting for memory only, so I expected I wouldn't be
>>> CPU-limited either, but evidently I am.
>>>
>>> `top` says that about 50% of CPU load is userspace, 25% software
>>> interrupts, 20% system and 5% hardware interrupts (that's by eye, not
>>> very strict).
>>>
>>> I found that CPU load depends not so much on connection count as on
>>> transmitted packet count (ok, that's obviously the number of system
>>> calls). Thus if I send 4 packets per second instead of 2, I have to
>>> halve the number of connections to keep the same CPU load.
>>>
>>> CPU load does not depend on packet size. 1 byte or 1Kbyte - no visible
>>> difference.
>>>
>>> CPU usage is slightly less with active socket option enabled than with
>>> blocking recvs.
>>>
>>> CPU usage on the single Windows client machine with 4000 connections
>>> spawned is on the same level as on the Linux server handling these 4000
>>> connections (while I expected Linux to perform better).
>>>
>>> Switching Nagle on and off had no effect. I also tried to tune the TCP
>>> stack with sysctl using advice found here and there, but with almost no
>>> effect either.
>>>
>>> I've tried to trace with fprof and found that the bottlenecks are 'send'
>>> operations (but I'm a relative novice to Erlang so I'm not sure my usage
>>> of fprof was correct). Ok, that was expected too. I've read the 'why is
>>> gen_tcp:send slow?' thread but none of the advice given there helped me.
>>>
>>> So the main question is: is CPU usage of 7% per 1000 connections (or
>>> maybe better to say 2000 packets per second) a good result? If not, what
>>> is the expected result? How can I improve my test application? Or maybe
>>> something in my story looks strange?
>>>
>>> I know that a possible optimization is decreasing the number of packets,
>>> and I keep it in mind.
>>>
>>> Here's the server connection process loop:
>>>
>>> loop(Socket) ->
>>>        receive
>>>                {tcp, Socket, _Packet} ->
>>>                        loop(Socket);
>>>                {tcp_closed, Socket} ->
>>>                        normal;
>>>                _ ->
>>>                        loop(Socket)
>>>        after 500 ->
>>>                        gen_tcp:send(Socket,[?PACKET]),
>>>                        loop(Socket)
>>>        end.
>>>
>>> The client loop has a blocking recv and answers with send immediately.
>>>
>>> Thank you very much for your time. Sorry for so many words, I tried to
>>> provide all possible information. I will answer any question and
>>> appreciate any hint.
>>>
>>> Best regards,
>>> Andrey
>>
>
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>
> 


