[erlang-questions] why is gen_tcp:send slow?

Rapsey <>
Tue Jun 24 21:00:42 CEST 2008


You're using very large packets. I think the results would be much more
telling if the packets were a few kB at most. That is closer to most
real-life situations.
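
As a rough sketch of such a test (the module small_pkt_bench and its
functions are made up for illustration and are not part of the attached
tb_server/tb_client code), something like this pushes Count blocks of
BlkSize bytes over loopback and times the whole transfer, including the
receiver draining the socket:

```erlang
-module(small_pkt_bench).
-export([run/2, transfer/3]).

%% Hypothetical helper: send Count blocks of BlkSize bytes over loopback.
%% Returns {BytesReceived, Microseconds}.
run(BlkSize, Count) ->
    {ok, L} = gen_tcp:listen(0, [binary, {packet, raw}, {active, false},
                                 {reuseaddr, true}]),
    {ok, Port} = inet:port(L),            % port 0 = let the OS pick one
    Parent = self(),
    spawn_link(fun() -> receiver(L, Parent) end),
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                              [binary, {packet, raw}, {active, false}]),
    Block = list_to_binary(lists:duplicate(BlkSize, 0)),
    {Micros, Received} = timer:tc(?MODULE, transfer, [C, Block, Count]),
    ok = gen_tcp:close(L),
    {Received, Micros}.

%% Send everything, close, then wait until the receiver reports its total.
transfer(C, Block, Count) ->
    ok = send_n(C, Block, Count),
    ok = gen_tcp:close(C),
    receive {received, Bytes} -> Bytes end.

send_n(_C, _Block, 0) -> ok;
send_n(C, Block, N) when N > 0 ->
    ok = gen_tcp:send(C, Block),          % one send call per block
    send_n(C, Block, N - 1).

%% Accept a single connection and count bytes until the peer closes.
receiver(L, Parent) ->
    {ok, S} = gen_tcp:accept(L),
    Parent ! {received, drain(S, 0)}.

drain(S, Acc) ->
    case gen_tcp:recv(S, 0) of
        {ok, Data} -> drain(S, Acc + byte_size(Data));
        {error, closed} -> ok = gen_tcp:close(S), Acc
    end.
```

Calling small_pkt_bench:run(4096, 100000) and then
small_pkt_bench:run(65536, 6250) moves the same number of bytes with
different block sizes; bytes divided by microseconds gives roughly MB/sec.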


Sergej

On Tue, Jun 24, 2008 at 8:43 PM, Edwin Fine <>
wrote:

> I wrote a small benchmark in Erlang to see how fast I could get socket
> communications to go. All the benchmark does is pump the same buffer to a
> socket for (by default) 10 seconds. It uses {active, once} each time, just
> like you do.
>
> Server TCP options:
>         {active, once},
>         {reuseaddr, true},
>         {packet, 0},
>         {packet_size, 65536},
>         {recbuf, 1000000}
>
> Client TCP options:
>         {packet, raw},
>         {packet_size, 65536},
>         {sndbuf, 1024 * 1024},
>         {send_timeout, 3000}
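>
> A minimal sketch of how option lists like these can be handed to gen_tcp,
> together with an {active, once} receive loop. The module name opts_sketch
> and its functions are hypothetical and are not the attached code; the
> `binary` option is an assumption that does not appear in the lists above.
>
> ```erlang
> -module(opts_sketch).
> -export([server_opts/0, client_opts/0, recv_loop/2]).
>
> %% Options for gen_tcp:listen/2; accepted sockets inherit them.
> server_opts() ->
>     [binary,                       % assumption: deliver data as binaries
>      {active, once}, {reuseaddr, true}, {packet, 0},
>      {packet_size, 65536}, {recbuf, 1000000}].
>
> %% Options for gen_tcp:connect/3.
> client_opts() ->
>     [binary,                       % assumption, as above
>      {packet, raw}, {packet_size, 65536},
>      {sndbuf, 1024 * 1024}, {send_timeout, 3000}].
>
> %% Count received bytes, re-arming {active, once} after every message.
> %% Must run in the socket's controlling process.
> recv_loop(Sock, Bytes) ->
>     receive
>         {tcp, Sock, Data} ->
>             _ = inet:setopts(Sock, [{active, once}]),
>             recv_loop(Sock, Bytes + byte_size(Data));
>         {tcp_closed, Sock} ->
>             Bytes;
>         {tcp_error, Sock, _Reason} ->
>             Bytes
>     end.
> ```
>
> With this, the server side would call gen_tcp:listen(Port,
> opts_sketch:server_opts()), accept a connection, and run
> opts_sketch:recv_loop(Sock, 0) in the accepting process.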
>
> Here are some results using Erlang R12B-3 (erl +K true in the Linux
> version):
>
> Linux (Ubuntu 8.10 x86_64, Intel Core 2 Q6600, 8 GB):
> - Using localhost (127.0.0.1): 7474.14 MB in 10.01 secs (746.66 MB/sec)
> - Using 192.168.x.x IP address: 8064.94 MB in 10.00 secs (806.22 MB/sec)
> [Don't ask me why it's faster than using loopback, I repeated the tests and
> got the same result]
>
> Windows XP SP3 (32 bits), Intel Core 2 Duo E6600:
> - Using loopback: 2166.97 MB in 10.02 secs (216.35 MB/sec)
> - Using 192.168.x.x IP address: 2140.72 MB in 10.02 secs (213.75 MB/sec)
> - On Gigabit Ethernet to the Q6600 Linux box: 1063.61 MB in 10.02 secs
> (106.17 MB/sec) using non-jumbo frames. I don't think my router supports
> jumbo frames.
>
> There's undoubtedly a huge discrepancy between the two systems, whether
> because of kernel poll in Linux, or that it's 64 bits, or unoptimized
> Windows TCP/IP flags, I don't know. I don't believe it's the number of CPUs
> (there's only 1 process sending and one receiving), or the CPU speed (they
> are both 2.4 GHz Core 2s).
>
> Maybe some Erlang TCP/IP gurus could comment.
>
> I've attached the code for interest. It's not supposed to be production
> quality, so please don't beat me up :) although I am always open to
> suggestions for improvement. If you do improve it, I'd like to see what
> you've done. Maybe there is another simple Erlang tcp benchmark program out
> there (i.e. not Tsung), but I couldn't find one in a cursory Google search.
>
> To run:
>
> VM1:
>
> tb_server:start(Port, Opts).
> tb_server:stop() to stop.
>
> Port = integer()
> Opts = []|[opt()]
> opt() = {atom(), term()} (Accepts inet setopts options, too)
>
> The server prints out the transfer rate (for simplicity).
>
> VM2:
> tb_client(Host, Port, Opts).
>
> Host = atom()|string() hostname or IP address
> Port, Opts as in tb_server
>
> Runs for 10 seconds, sending a 64K buffer as fast as possible to Host/Port.
> You can change this to 20 seconds (e.g.) by adding the tuple {time_limit,
> 20000} to Opts.
> You can change the buffer size by adding the tuple {blksize, Bytes} to Opts.
>
> 2008/6/20 Rapsey <>:
>
>> All data goes through nginx which acts as a proxy. Its CPU consumption is
>> never over 1%.
>>
>>
>> Sergej
>>
>>
>> On Thu, Jun 19, 2008 at 9:35 PM, Javier París Fernández <> wrote:
>>
>>>
>>> On 19/06/2008, at 20:06, Rapsey wrote:
>>>
>>>> It loops from another module; that way I can update the code at any time
>>>> without disrupting anything.
>>>> The packets are generally a few hundred bytes in size, except keyframes,
>>>> which tend to be in the kB range. I haven't tried looking with Wireshark.
>>>> Still, it seems a bit odd that high CPU consumption would be the symptom.
>>>> The traffic is strictly one-way. Either someone is sending the stream or
>>>> receiving it.
>>>> The transmit could of course be written with a passive receive, but the
>>>> code would be significantly uglier. I'm sure someone here knows whether
>>>> setting {active, once} for every packet is CPU-intensive.
>>>> It seems the workings of gen_tcp are quite platform-dependent. If I run
>>>> the code on Windows, sending more than 128 bytes per gen_tcp call
>>>> significantly decreases network output.
>>>> Oh, and I forgot to mention that I use R12B-3.
>>>>
>>>
>>> Hi,
>>>
>>> I'm no expert, but:
>>>
>>> 200-300 mb/s in small (hundreds of bytes) packets means a *lot* of
>>> system calls if you are doing a gen_tcp:send for each one. If you buffer 3
>>> packets, you reduce that by a factor of 3 :). I'd try a small test doing
>>> the same thing in C and compare the results. I think it will also eat a
>>> lot of CPU.
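>>>
>>> A minimal sketch of the buffering idea, assuming the packets can be
>>> delayed until a batch is full. The module send_batch and its functions
>>> are hypothetical names; the only library call relied on is
>>> gen_tcp:send/2, which accepts an iolist, so several packets can go out
>>> in a single call.
>>>
>>> ```erlang
>>> -module(send_batch).
>>> -export([new/1, add/3, flush/2]).
>>>
>>> %% State is {Max, Count, ReversedPackets}.
>>> new(Max) when is_integer(Max), Max > 0 ->
>>>     {Max, 0, []}.
>>>
>>> %% Queue Packet; once Max packets are queued, send them in one call.
>>> add(Sock, Packet, {Max, Count, Acc}) when Count + 1 >= Max ->
>>>     ok = gen_tcp:send(Sock, lists:reverse([Packet | Acc])),
>>>     {Max, 0, []};
>>> add(_Sock, Packet, {Max, Count, Acc}) ->
>>>     {Max, Count + 1, [Packet | Acc]}.
>>>
>>> %% Send whatever is queued, e.g. from a timer, to bound the added latency.
>>> flush(_Sock, {Max, 0, []}) ->
>>>     {Max, 0, []};
>>> flush(Sock, {Max, _Count, Acc}) ->
>>>     ok = gen_tcp:send(Sock, lists:reverse(Acc)),
>>>     {Max, 0, []}.
>>> ```
>>>
>>> With send_batch:new(3), every third call to add/3 performs the send, so
>>> the number of gen_tcp:send calls drops to a third. For a live stream,
>>> flush/2 would probably need to be called periodically as well.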
>>>
>>> About the proxy CPU... I'm a bit lost there, but, speculating wildly, it
>>> is possible that the time spent in the system calls made by gen_tcp is
>>> being charged to the proxy process's CPU time.
>>>
>>> Regards.