[erlang-questions] why is gen_tcp:send slow?

Wed Jun 25 00:55:44 CEST 2008

Johnny,

Thanks for the lesson! I am always happy to learn. Like I said, I am not an
expert in TCP/IP.

What I was writing about when I said that packets are acknowledged is what I
saw in Wireshark while trying to understand performance issues. I perhaps
should have said "TCP/IP" instead of just "TCP". There were definitely
acknowledgements, but I guess they were at the IP level.

I wonder what the MSS is for loopback? I think it's about 1536 on my eth0
interface, but not sure.
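
For reference, here is a rough sketch of how the MSS relates to an interface's MTU, assuming IPv4 and TCP headers without options (20 bytes each). The module and function names are made up for illustration, and the MTU passed in has to come from whatever the OS reports for the interface.

```erlang
-module(mss_sketch).
-export([mss/1]).

%% Assumed header sizes: IPv4 and TCP headers with no options.
-define(IPV4_HEADER, 20).
-define(TCP_HEADER, 20).

%% Largest TCP payload that fits in one unfragmented IP packet
%% on a link with the given MTU (in bytes).
mss(Mtu) when is_integer(Mtu), Mtu > ?IPV4_HEADER + ?TCP_HEADER ->
    Mtu - ?IPV4_HEADER - ?TCP_HEADER.
```

With the standard 1500-byte Ethernet MTU, mss_sketch:mss(1500) gives 1460. For loopback, pass the MTU reported for the lo interface.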

As for RTT, I sent data over a link that had a very long (290ms) RTT, and
that definitely limited the rate at which packets could be sent. Can RTT be
used to calculate the theoretical maximum traffic that a link can carry?
For example, a satellite link with a 400ms RTT but 2 Mbps bandwidth?
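
A rough sketch of the arithmetic behind that question, assuming the usual rule of thumb that a single TCP connection can have at most one window of unacknowledged data in flight per round trip. The 65535-byte window (the largest without window scaling) and the module and function names are assumptions for illustration.

```erlang
-module(bdp_sketch).
-export([bdp_bytes/2, window_limited_bps/2]).

%% Bandwidth-delay product: the number of bytes that must be in flight
%% to keep the link full. BandwidthBps is in bits per second, RttMs in
%% milliseconds.
bdp_bytes(BandwidthBps, RttMs) ->
    BandwidthBps * RttMs / 1000 / 8.

%% Upper bound on one connection's throughput (bits per second) when at
%% most WindowBytes may be unacknowledged during each round trip.
window_limited_bps(WindowBytes, RttMs) ->
    WindowBytes * 8 * 1000 / RttMs.
```

With these numbers, bdp_sketch:bdp_bytes(2000000, 400) is 100000.0 bytes, so a window of about 100 KB would be needed to fill the 2 Mbps link, and bdp_sketch:window_limited_bps(65535, 400) is 1310700.0, i.e. about 1.3 Mbps with a 65535-byte window.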

Ed

On Tue, Jun 24, 2008 at 6:00 PM, Johnny Billquist <bqt@REDACTED> wrote:

> No. TCP doesn't acknowledge every packet. In fact, TCP doesn't acknowledge
> packets as such at all. TCP is not packet based. It's just that if you use
> IP as the carrier, IP itself is packet based.
> TCP can in theory generate any number of packets per second. However, the
> amount of unacknowledged data that can be outstanding at any time is limited
> by the transmit window. Each packet carries a window size, which is how much
> more data can be accepted by the receiver. TCP can (is allowed to) send
> that much data and no more.
>
> The RTT calculations are used for figuring out how long to wait before
> doing retransmissions. You also normally have a slow start transmission
> algorithm which prevents the sender from even using the full window size
> from the start, as a way of avoiding congestion. That is used in
> combination with a backoff algorithm when retransmissions are needed to
> further decrease congestion, but all of this only really comes into effect
> if you start losing data, and TCP actually needs to do retransmissions.
>
> Another thing you have is an algorithm called Nagle, which tries to collect
> small amounts of data into larger packets before sending them, so that you
> don't flood the net with silly small packets.
>
> One additional detail is that receivers normally, when the receive buffers
> become full, don't announce newly freed space immediately, since that is
> normally rather small amounts, but instead wait a while, until a larger part
> of the receive buffer is free, so that the sender actually can send some
> full sized packets once it starts sending again.
>
> In addition to all this, you also have a max segment size which is
> negotiated between the TCP ends, which limits the size of a single IP packet
> sent by the TCP protocol. This is done in order to try to avoid packet
> fragmentation.
>
> So the window size is actually a flow control mechanism, and is in reality
> limiting the amount of data that can be sent. And it varies all the time.
> And the number of packets that will be used for sending that much data is
> determined by the MSS (Max Segment Size).
>
> Sorry for the long text on how TCP works. :-)
>
>        Johnny
>
> Edwin Fine wrote:
>
>> David,
>>
>> Thanks for trying out the benchmark.
>>
>> With my limited knowledge of TCP/IP, I believe you are seeing the 300,000
>> limit because TCP/IP requires acknowledgements to each packet, and although
>> it can batch up multiple acknowledgements in one packet, there is a
>> theoretical limit of packets per second beyond which it cannot go due to
>> the laws of physics. I understand that limit is determined by the Round-Trip
>> Time (RTT), which can be shown by ping. On my system, pinging 127.0.0.1
>> gives a minimum RTT of 0.018 ms (out of 16 pings). That means that the
>> maximum number of packets that can make it to the destination and back per
>> second is 1/0.000018 seconds, or 55555 packets per second. The
>> TCP/IP stack is evidently packing 5 or 6 blocks into each packet to get the
>> 300K blocks/sec you are seeing. Using Wireshark or Ethereal would confirm
>> this. I am guessing that this means that the TCP window is about 6 * 1000
>> bytes or 6KB.
>>
>> What I neglected to tell this group is that I have modified the Linux
>> sysctl.conf as follows, which might have had an effect (like I said, I am
>> not an expert):
>>
>> # increase Linux autotuning TCP buffer limits
>> # min, default, and max number of bytes to use
>> # set max to at least 4MB, or higher if you use very high BDP paths
>> net.ipv4.tcp_rmem = 4096 87380 16777216
>> net.ipv4.tcp_wmem = 4096 32768 16777216
>>
>> When I have more time, I will vary a number of different Erlang TCP/IP
>> parameters and get a data set together that gives a broader picture of the
>> effect of the parameters.
>>
>> Thanks again for taking the time.
>>
>> 2008/6/24 David Mercer <dmercer@REDACTED>:
>>
>>    I tried some alternative block sizes (using the blksize option).  I
>>    found that from 1 to somewhere around (maybe a bit short of) 1000
>>    bytes, the test was able to send about 300,000 blocks in 10 seconds
>>    regardless of size.  (That means, 0.03 MB/sec for block size of 1,
>>    0.3 MB/sec for block size of 10, 3 MB/sec  for block size of 100,
>>    etc.)  I suspect the system was CPU bound at those levels.
>>
>>
>>    Above 1000, the number of blocks sent seemed to decrease, though
>>    this was more than offset by the increased size of the blocks.  Above
>>    about 10,000 byte blocks (may have been less, I didn't check
>>    any value between 4,000 and 10,000), however, performance peaked and
>>    block size no longer mattered: it always sent between 70 and 80
>>    MB/sec.  My machine is clearly slower than Edwin's…
>>
>>
>>    DBM
>>
>>
>>
>>  ------------------------------------------------------------------------
>>
>>    *From:* erlang-questions-bounces@REDACTED *On Behalf Of* Rapsey
>>    *Sent:* Tuesday, June 24, 2008 14:01
>>    *To:* erlang-questions@REDACTED
>>    *Subject:* Re: [erlang-questions] why is gen_tcp:send slow?
>>
>>
>>    You're using very large packets. I think the results would be much
>>    more telling if the packets would be a few kB at most. That is
>>    closer to most real life situations.
>>
>>
>>    Sergej
>>
>>    On Tue, Jun 24, 2008 at 8:43 PM, Edwin Fine
>>    <erlang-questions_efine@REDACTED> wrote:
>>
>>    I wrote a small benchmark in Erlang to see how fast I could get
>>    socket communications to go. All the benchmark does is pump the same
>>    buffer to a socket for (by default) 10 seconds. It uses {active,
>>    once} each time, just like you do.
>>
>>    Server TCP options:
>>            {active, once},
>>            {reuseaddr, true},
>>            {packet, 0},
>>            {packet_size, 65536},
>>            {recbuf, 1000000}
>>
>>    Client TCP options:
>>            {packet, raw},
>>            {packet_size, 65536},
>>            {sndbuf, 1024 * 1024},
>>            {send_timeout, 3000}
>>
>>    Here are some results using Erlang R12B-3 (erl +K true in the Linux
>>    version):
>>
>>    Linux (Ubuntu 8.10 x86_64, Intel Core 2 Q6600, 8 GB):
>>    - Using localhost (127.0.0.1): 7474.14 MB in 10.01 secs (746.66 MB/sec)
>>    - Using 192.168.x.x IP address: 8064.94 MB in 10.00 secs (806.22
>>    MB/sec) [Don't ask me why it's faster than using loopback, I
>>    repeated the tests and got the same result]
>>
>>    Windows XP SP3 (32 bits), Intel Core 2 Duo E6600:
>>    - Using loopback: 2166.97 MB in 10.02 secs (216.35 MB/sec)
>>    - Using 192.168.x.x IP address: 2140.72 MB in 10.02 secs (213.75
>>    MB/sec)
>>    - On Gigabit Ethernet to the Q6600 Linux box: 1063.61 MB in 10.02
>>    secs (106.17 MB/sec) using non-jumbo frames. I don't think my router
>>    supports jumbo frames.
>>
>>    There's undoubtedly a huge discrepancy between the two systems,
>>    whether because of kernel poll in Linux, or that it's 64 bits, or
>>    unoptimized Windows TCP/IP flags, I don't know. I don't believe it's
>>    the number of CPUs (there's only 1 process sending and one
>>    receiving), or the CPU speed (they are both 2.4 GHz Core 2s).
>>
>>    Maybe some Erlang TCP/IP gurus could comment.
>>
>>    I've attached the code for interest. It's not supposed to be
>>    production quality, so please don't beat me up :) although I am
>>    always open to suggestions for improvement. If you do improve it,
>>    I'd like to see what you've done. Maybe there is another simple
>>    Erlang tcp benchmark program out there (i.e. not Tsung), but I
>>    couldn't find one in a cursory Google search.
>>
>>    To run:
>>
>>    VM1:
>>
>>    tb_server:start(Port, Opts).
>>    tb_server:stop() to stop.
>>
>>    Port = integer()
>>    Opts = []|[opt()]
>>    opt() = {atom(), term()} (Accepts inet setopts options, too)
>>
>>    The server prints out the transfer rate (for simplicity).
>>
>>    VM2:
>>    tb_client(Host, Port, Opts).
>>
>>    Host = atom()|string() hostname or IP address
>>    Port, Opts as in tb_server
>>
>>    Runs for 10 seconds, sending a 64K buffer as fast as possible to
>>    Host/Port.
>>    You can change this to 20 seconds (e.g.) by adding the tuple
>>    {time_limit, 20000} to Opts.
>>    You can change buffer size by adding the tuple {blksize, Bytes} to
>>    Opts.
>>
>>    2008/6/20 Rapsey <rapsey@REDACTED>:
>>
>>    All data goes through nginx which acts as a proxy. Its CPU
>>    consumption is never over 1%.
>>
>>
>>    Sergej
>>
>>
>>    On Thu, Jun 19, 2008 at 9:35 PM, Javier París Fernández
>>    <javierparis@REDACTED> wrote:
>>
>>
>>    On 19/06/2008, at 20:06, Rapsey wrote:
>>
>>
>>        It loops from another module, that way I can update the code at
>>        any time without disrupting anything.
>>        The packets are generally a few hundred bytes big, except
>>        keyframes which tend to be in the kB range. I haven't tried
>>        looking with wireshark.  Still it seems a bit odd that a large
>>        CPU consumption would be the symptom. The traffic is strictly
>>        one way. Either someone is sending the stream or receiving it.
>>        The transmit could of course be written with a passive receive,
>>        but the code would be significantly uglier. I'm sure someone
>>        here knows if setting {active, once} every packet is CPU
>>        intensive or not.
>>        It seems the workings of gen_tcp are quite platform dependent. If
>>        I run the code in windows, sending more than 128 bytes per
>>        gen_tcp call significantly decreases network output.
>>        Oh and I forgot to mention I use R12B-3.
>>
>>
>>    Hi,
>>
>>    Without being an expert.
>>
>>    200-300 mb/s  in small (hundreds of bytes) packets means a *lot* of
>>    system calls if you are doing a gen_tcp:send for each one. If you
>>    buffer 3 packets, you are reducing that by a factor of 3 :). I'd try
>>    to do a small test doing the same thing in C and compare the
>>    results. I think it will also eat a lot of CPU.
>>
>>    About the proxy CPU... I'm a bit lost about it, but speculating
>>    wildly it is possible that the time spent doing the system calls
>>    that gen_tcp is doing is added to the proxy CPU process.
>>
>>    Regards.
>>
>>
>>
>>    _______________________________________________
>>    erlang-questions mailing list
>>    erlang-questions@REDACTED
>>    http://www.erlang.org/mailman/listinfo/erlang-questions
>>
>
>
>