[erlang-questions] why is gen_tcp:send slow?

Edwin Fine erlang-questions_efine@REDACTED
Tue Jun 24 20:43:05 CEST 2008


I wrote a small benchmark in Erlang to see how fast I could get socket
communications to go. All the benchmark does is pump the same buffer to a
socket for (by default) 10 seconds. It uses {active, once} each time, just
like you do.
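
For readers without the attachment, here is a minimal sketch of the two loops described above. It is not the attached code: the module and function names (tb_loop_sketch, send_for/3, recv_loop/1) are made up for illustration, and erlang:monotonic_time/1 assumes a much newer OTP release than R12B-3.

```erlang
-module(tb_loop_sketch).
-export([send_for/3, recv_loop/1]).

%% Send Buf on Sock as fast as possible for TimeLimitMs milliseconds.
%% Returns {ok, BytesSent} or {error, Reason}.
send_for(Sock, Buf, TimeLimitMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeLimitMs,
    send_until(Sock, Buf, iolist_size(Buf), Deadline, 0).

send_until(Sock, Buf, Size, Deadline, Sent) ->
    case erlang:monotonic_time(millisecond) >= Deadline of
        true ->
            {ok, Sent};
        false ->
            case gen_tcp:send(Sock, Buf) of
                ok -> send_until(Sock, Buf, Size, Deadline, Sent + Size);
                {error, _} = Error -> Error
            end
    end.

%% Receive until the peer closes, re-arming {active, once} before each
%% delivery. Must be called from the socket's controlling process.
%% Returns {ok, BytesReceived} or {error, Reason}.
recv_loop(Sock) ->
    recv_loop(Sock, 0).

recv_loop(Sock, Received) ->
    ok = inet:setopts(Sock, [{active, once}]),
    receive
        {tcp, Sock, Data} ->
            recv_loop(Sock, Received + iolist_size(Data));
        {tcp_closed, Sock} ->
            {ok, Received};
        {tcp_error, Sock, Reason} ->
            {error, Reason}
    end.
```

The receiver counts bytes rather than keeping the data, which is all a throughput test needs.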

Server TCP options:
        {active, once},
        {reuseaddr, true},
        {packet, 0},
        {packet_size, 65536},
        {recbuf, 1000000}

Client TCP options:
        {packet, raw},
        {packet_size, 65536},
        {sndbuf, 1024 * 1024},
        {send_timeout, 3000}
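
As a rough illustration of how these lists could be handed to gen_tcp, here is a sketch. The module and function names are hypothetical and this is not the attached code; only the option values come from the lists above.

```erlang
-module(tb_opts_sketch).
-export([server_opts/0, client_opts/0, listen/1, connect/2]).

%% Server-side options as listed in the post. {packet, 0} is a synonym
%% for {packet, raw}: no framing is applied to the byte stream.
server_opts() ->
    [{active, once},
     {reuseaddr, true},
     {packet, 0},
     {packet_size, 65536},
     {recbuf, 1000000}].

%% Client-side options as listed in the post.
client_opts() ->
    [{packet, raw},
     {packet_size, 65536},
     {sndbuf, 1024 * 1024},
     {send_timeout, 3000}].

%% Sockets returned by gen_tcp:accept/1 inherit the listen socket's options.
listen(Port) ->
    gen_tcp:listen(Port, server_opts()).

connect(Host, Port) ->
    gen_tcp:connect(Host, Port, client_opts()).
```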

Here are some results using Erlang R12B-3 (erl +K true in the Linux
version):

Linux (Ubuntu 8.10 x86_64, Intel Core 2 Q6600, 8 GB):
- Using localhost (127.0.0.1): 7474.14 MB in 10.01 secs (746.66 MB/sec)
- Using 192.168.x.x IP address: 8064.94 MB in 10.00 secs (806.22 MB/sec)
[Don't ask me why it's faster than using loopback, I repeated the tests and
got the same result]

Windows XP SP3 (32 bits), Intel Core 2 Duo E6600:
- Using loopback: 2166.97 MB in 10.02 secs (216.35 MB/sec)
- Using 192.168.x.x IP address: 2140.72 MB in 10.02 secs (213.75 MB/sec)
- On Gigabit Ethernet to the Q6600 Linux box: 1063.61 MB in 10.02 secs
(106.17 MB/sec) using non-jumbo frames. I don't think my router supports
jumbo frames.

There's undoubtedly a huge discrepancy between the two systems. Whether that
is because of kernel poll in Linux, or because it's 64 bits, or because of
unoptimized Windows TCP/IP flags, I don't know. I don't believe it's the
number of CPUs (there's only one process sending and one receiving), or the
CPU speed (they are both 2.4 GHz Core 2s).

Maybe some Erlang TCP/IP gurus could comment.

I've attached the code for interest. It's not supposed to be production
quality, so please don't beat me up :) although I am always open to
suggestions for improvement. If you do improve it, I'd like to see what
you've done. Maybe there is another simple Erlang tcp benchmark program out
there (i.e. not Tsung), but I couldn't find one in a cursory Google search.

To run:

VM1:

tb_server:start(Port, Opts).
tb_server:stop() to stop.

Port = integer()
Opts = []|[opt()]
opt() = {atom(), term()} (Accepts inet setopts options, too)

The server prints out the transfer rate (for simplicity).
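
A transfer-rate line in the same shape as the results above could be produced roughly like this. It is a sketch, not the attached code: the module name is made up, and it assumes 1 MB = 1024 * 1024 bytes.

```erlang
-module(tb_rate_sketch).
-export([report/2]).

%% Format a result line from a byte count and an elapsed time in
%% milliseconds. Assumes 1 MB = 1024 * 1024 bytes.
report(Bytes, ElapsedMs) when ElapsedMs > 0 ->
    MB = Bytes / (1024 * 1024),
    Secs = ElapsedMs / 1000,
    lists:flatten(
      io_lib:format("~.2f MB in ~.2f secs (~.2f MB/sec)",
                    [MB, Secs, MB / Secs])).
```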

VM2:
tb_client(Host, Port, Opts).

Host = atom()|string() hostname or IP address
Port, Opts as in tb_server

Runs for 10 seconds, sending a 64K buffer as fast as possible to Host/Port.
You can change this to 20 seconds (e.g.) by adding the tuple {time_limit,
20000} to Opts.
You can change the buffer size by adding the tuple {blksize, Bytes} to Opts.
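
One way such an Opts list could be split into benchmark settings and inet options is sketched below. The module and function names are hypothetical and the attached code may do this differently; the defaults (10000 ms, 65536 bytes) are taken from the description above.

```erlang
-module(tb_args_sketch).
-export([parse/1]).

%% Split Opts into {TimeLimitMs, BlkSize, InetOpts}.
%% time_limit and blksize fall back to the defaults described in the post;
%% everything else is passed through untouched as an inet option.
parse(Opts) ->
    TimeLimit = proplists:get_value(time_limit, Opts, 10000),
    BlkSize = proplists:get_value(blksize, Opts, 65536),
    InetOpts = [O || O <- Opts, not is_bench_opt(O)],
    {TimeLimit, BlkSize, InetOpts}.

is_bench_opt({time_limit, _}) -> true;
is_bench_opt({blksize, _}) -> true;
is_bench_opt(_) -> false.
```

For example, parse([{time_limit, 20000}, {nodelay, true}]) would give a 20-second run with the default block size and leave {nodelay, true} for the socket.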

2008/6/20 Rapsey <rapsey@REDACTED>:

> All data goes through nginx which acts as a proxy. Its CPU consumption is
> never over 1%.
>
>
> Sergej
>
>
> On Thu, Jun 19, 2008 at 9:35 PM, Javier París Fernández <
> javierparis@REDACTED> wrote:
>
>>
>> El 19/06/2008, a las 20:06, Rapsey escribió:
>>
>>  It loops from another module, that way I can update the code at any time
>>> without disrupting anything.
>>> The packets are generally a few hundred bytes big, except keyframes which
>>> tend to be in the kB range. I haven't tried looking with wireshark.  Still
>>> it seems a bit odd that a large CPU consumption would be the symptom. The
>>> traffic is strictly one way. Either someone is sending the stream or
>>> receiving it.
>>> The transmit could of course be written with a passive receive, but the
>>> code would be significantly uglier. I'm sure someone here knows if setting
>>> {active, once} every packet is CPU intensive or not.
>>> It seems the workings of gen_tcp is quite platform dependent. If I run
>>> the code in windows, sending more than 128 bytes per gen_tcp call
>>> significantly decreases network output.
>>> Oh and I forgot to mention I use R12B-3.
>>>
>>
>> Hi,
>>
>> Without being an expert.
>>
>> 200-300 mb/s  in small (hundreds of bytes) packets means a *lot* of system
>> calls if you are doing a gen_tcp:send for each one. If you buffer 3 packets,
>> you are reducing that by a factor of 3 :). I'd try to do a small test doing
>> the same thing in C and compare the results. I think it will also eat a lot
>> of CPU.
>>
>> About the proxy CPU... I'm a bit lost about it, but speculating wildly it
>> is possible that the time spent doing the system calls that gen_tcp is doing
>> is added to the proxy CPU process.
>>
>> Regards.
>
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
>
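
The buffering suggestion in the quoted message could be sketched as follows. gen_tcp:send/2 accepts iodata, so a list of packets can go out in a single call. The module and function names are made up for illustration, and this only makes sense on a raw stream where packet boundaries do not matter.

```erlang
-module(tb_batch_sketch).
-export([chunk/2, send_batched/3]).

%% Split a list into groups of at most N elements, preserving order.
chunk([], _N) ->
    [];
chunk(List, N) when is_integer(N), N > 0 ->
    {Group, Rest} = take(List, N, []),
    [Group | chunk(Rest, N)].

take(Rest, 0, Acc) -> {lists:reverse(Acc), Rest};
take([], _N, Acc) -> {lists:reverse(Acc), []};
take([H | T], N, Acc) -> take(T, N - 1, [H | Acc]).

%% Send Packets (a list of binaries or iolists) N at a time, one
%% gen_tcp:send/2 call per group. Crashes on the first send error.
send_batched(Sock, Packets, N) ->
    lists:foreach(
      fun(Group) -> ok = gen_tcp:send(Sock, Group) end,
      chunk(Packets, N)).
```

With N = 3 this makes one send call per three packets instead of one per packet.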
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tcp_bench.tgz
Type: application/x-gzip
Size: 4553 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20080624/13e25e29/attachment.bin>