[erlang-questions] why is gen_tcp:send slow?
Edwin Fine
erlang-questions_efine@REDACTED
Wed Jun 25 00:55:44 CEST 2008
Johnny,
Thanks for the lesson! I am always happy to learn. Like I said, I am not an
expert in TCP/IP.
When I said that packets were being acknowledged, I was describing what I
saw in Wireshark while trying to understand performance issues. I perhaps
should have said "TCP/IP" instead of just "TCP". There were definitely
acknowledgements, but I guess they were at the IP level.
I wonder what the MSS is for loopback. I think it's about 1536 on my eth0
interface, but I'm not sure.
As for RTT, I sent data over a link that had a very long (290ms) RTT, and
that definitely limited the rate at which packets could be sent. Can RTT be
used to calculate the theoretical maximum traffic that a link can carry?
For example, a satellite link with a 400ms RTT but 2 Mbps bandwidth?
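My naive attempt at the math: to keep a 2 Mbps link busy across a 400 ms
RTT, you would need 2,000,000 / 8 bytes/sec * 0.4 sec = 100,000 bytes in
flight, so unless the window is at least 100 KB or so, the window rather
than the bandwidth would be the limit. Is that reasoning right?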
Ed
On Tue, Jun 24, 2008 at 6:00 PM, Johnny Billquist <bqt@REDACTED> wrote:
> No. TCP doesn't acknowledge every packet. In fact, TCP doesn't acknowledge
> packets as such at all. TCP is not packet based. It's just that if you use
> IP as the carrier, IP itself is packet based.
> TCP can in theory generate any number of packets per second. However, the
> amount of unacknowledged data that can be outstanding at any time is limited
> by the transmit window. Each packet carries a window size, which is how much
> more data the receiver can accept. TCP can (is allowed to) send that much
> data and no more.
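>
> As a concrete example: with a 64 KB window and a 100 ms RTT, at most
> 64 KB can be in flight per round trip, so the throughput ceiling is
> roughly 65536 bytes / 0.1 s, i.e. around 640 KB/s, no matter how fast
> the link itself is.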
>
> The RTT calculations are used to figure out how long to wait before
> retransmitting. You also normally have a slow-start algorithm, which
> prevents the sender from using the full window size right from the start,
> as a way of avoiding congestion. That is used in combination with a
> backoff algorithm when retransmissions are needed, to further reduce
> congestion, but all of this only really comes into effect if you start
> losing data and TCP actually needs to retransmit.
>
> Another thing you have is Nagle's algorithm, which tries to collect small
> amounts of data into larger packets before sending, so that you don't
> flood the net with silly small packets.
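>
> In Erlang, by the way, Nagle can be switched off per socket when
> small-packet latency matters more than throughput. A minimal example,
> assuming a connected gen_tcp socket:
>
>   %% disable Nagle so small writes are not held back for coalescing
>   ok = inet:setopts(Socket, [{nodelay, true}]).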
>
> One additional detail is that receivers normally, when the receive buffer
> becomes full, don't announce newly freed space immediately, since that is
> normally a rather small amount, but instead wait a while, until a larger
> part of the receive buffer is free, so that the sender can actually send
> some full-sized packets once it starts sending again.
>
> In addition to all this, you also have a maximum segment size, which is
> negotiated between the TCP ends and limits the size of a single IP packet
> sent by the TCP protocol. This is done to try to avoid packet
> fragmentation.
>
> So the window size is actually a flow control mechanism, and in reality it
> limits the amount of data that can be sent. And it varies all the time.
> And the number of packets that will be used to send that much data is
> determined by the MSS (Maximum Segment Size).
>
> Sorry for the long text on how TCP works. :-)
>
> Johnny
>
> Edwin Fine wrote:
>
>> David,
>>
>> Thanks for trying out the benchmark.
>>
>> With my limited knowledge of TCP/IP, I believe you are seeing the 300,000
>> limit because TCP/IP requires acknowledgements to each packet, and although
>> it can batch up multiple acknowledgements in one packet, there is a
>> theoretical limit in packets per second beyond which it cannot go due to
>> the laws of physics. I understand that limit is determined by the Round-Trip
>> Time (RTT), which can be shown by ping. On my system, pinging 127.0.0.1
>> gives a minimum RTT of 0.018 ms (out of 16 pings). That means the maximum
>> number of packets that can make it to the destination and back per second
>> is 1/0.000018, or about 55,555 packets per second. The TCP/IP stack is
>> evidently packing 5 or 6 blocks into each packet to get the 300K
>> blocks/sec you are seeing. Using Wireshark or Ethereal would confirm this.
>> I am guessing that this means that the TCP window is about 6 * 1000 bytes,
>> or 6 KB.
>>
>> What I neglected to tell this group is that I have modified the Linux
>> sysctl.conf as follows, which might have had an effect (like I said, I am
>> not an expert):
>>
>> # increase Linux autotuning TCP buffer limits
>> # min, default, and max number of bytes to use
>> # set max to at least 4MB, or higher if you use very high BDP paths
>> net.ipv4.tcp_rmem = 4096 87380 16777216
>> net.ipv4.tcp_wmem = 4096 32768 16777216
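>>
>> (If I remember correctly, these take effect without a reboot if you
>> run sysctl -p after editing /etc/sysctl.conf.)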
>>
>> When I have more time, I will vary a number of different Erlang TCP/IP
>> parameters and get a data set together that gives a broader picture of the
>> effect of the parameters.
>>
>> Thanks again for taking the time.
>>
>> 2008/6/24 David Mercer <dmercer@REDACTED>:
>>
>> I tried some alternative block sizes (using the blksize option). I
>> found that from 1 up to somewhere around (maybe a bit short of) 1000
>> bytes, the test was able to send about 300,000 blocks in 10 seconds
>> regardless of size. (That means 0.03 MB/sec for a block size of 1,
>> 0.3 MB/sec for a block size of 10, 3 MB/sec for a block size of 100,
>> etc.) I suspect the system was CPU bound at those levels.
>>
>>
>> Above 1000, the number of blocks sent seemed to decrease, though this
>> was more than offset by the increased size of the blocks. Above about
>> 10,000-byte blocks (the threshold may have been lower; I didn't check
>> any value between 4,000 and 10,000), performance peaked and block size
>> no longer mattered: it always sent between 70 and 80 MB/sec. My machine
>> is clearly slower than Edwin's…
>>
>>
>> DBM
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> *From:* erlang-questions-bounces@REDACTED
>> [mailto:erlang-questions-bounces@REDACTED] *On Behalf Of* Rapsey
>> *Sent:* Tuesday, June 24, 2008 14:01
>> *To:* erlang-questions@REDACTED
>> *Subject:* Re: [erlang-questions] why is gen_tcp:send slow?
>>
>>
>> You're using very large packets. I think the results would be much
>> more telling if the packets were a few kB at most. That is closer
>> to most real-life situations.
>>
>>
>> Sergej
>>
>> On Tue, Jun 24, 2008 at 8:43 PM, Edwin Fine
>> <erlang-questions_efine@REDACTED> wrote:
>>
>> I wrote a small benchmark in Erlang to see how fast I could get
>> socket communications to go. All the benchmark does is pump the same
>> buffer to a socket for (by default) 10 seconds. It uses {active,
>> once} each time, just like you do.
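>>
>> The receive side is roughly the usual {active, once} loop; a sketch of
>> the idea (not the exact code from the attachment; I'm assuming a
>> binary-mode socket here):
>>
>> loop(Sock, Count) ->
>>     inet:setopts(Sock, [{active, once}]),
>>     receive
>>         {tcp, Sock, Data} ->
>>             %% count the bytes, then loop to re-arm the socket
>>             loop(Sock, Count + byte_size(Data));
>>         {tcp_closed, Sock} ->
>>             {ok, Count}
>>     end.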
>>
>> Server TCP options:
>> {active, once},
>> {reuseaddr, true},
>> {packet, 0},
>> {packet_size, 65536},
>> {recbuf, 1000000}
>>
>> Client TCP options:
>> {packet, raw},
>> {packet_size, 65536},
>> {sndbuf, 1024 * 1024},
>> {send_timeout, 3000}
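>>
>> In gen_tcp calls, that amounts to roughly the following (a sketch, not
>> the exact code from the attachment; binary mode assumed):
>>
>> %% server side: listen, then accept one connection
>> {ok, LSock} = gen_tcp:listen(Port, [binary, {active, once},
>>                                     {reuseaddr, true}, {packet, 0},
>>                                     {packet_size, 65536},
>>                                     {recbuf, 1000000}]),
>> {ok, Sock} = gen_tcp:accept(LSock).
>>
>> %% client side: connect, then pump the buffer in a tight loop
>> {ok, CSock} = gen_tcp:connect(Host, Port, [binary, {packet, raw},
>>                                            {packet_size, 65536},
>>                                            {sndbuf, 1024 * 1024},
>>                                            {send_timeout, 3000}]).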
>>
>> Here are some results using Erlang R12B-3 (erl +K true in the Linux
>> version):
>>
>> Linux (Ubuntu 8.10 x86_64, Intel Core 2 Q6600, 8 GB):
>> - Using localhost (127.0.0.1): 7474.14 MB in 10.01 secs (746.66 MB/sec)
>> - Using 192.168.x.x IP address: 8064.94 MB in 10.00 secs (806.22
>> MB/sec) [Don't ask me why it's faster than using loopback, I
>> repeated the tests and got the same result]
>>
>> Windows XP SP3 (32 bits), Intel Core 2 Duo E6600:
>> - Using loopback: 2166.97 MB in 10.02 secs (216.35 MB/sec)
>> - Using 192.168.x.x IP address: 2140.72 MB in 10.02 secs (213.75
>> MB/sec)
>> - On Gigabit Ethernet to the Q6600 Linux box: 1063.61 MB in 10.02
>> secs (106.17 MB/sec) using non-jumbo frames. I don't think my router
>> supports jumbo frames.
>>
>> There's undoubtedly a huge discrepancy between the two systems; whether
>> it's because of kernel poll in Linux, the 64-bit build, or unoptimized
>> Windows TCP/IP settings, I don't know. I don't believe it's the number
>> of CPUs (there's only one process sending and one receiving), or the
>> CPU speed (they are both 2.4 GHz Core 2s).
>>
>> Maybe some Erlang TCP/IP gurus could comment.
>>
>> I've attached the code for interest. It's not supposed to be
>> production quality, so please don't beat me up :) although I am
>> always open to suggestions for improvement. If you do improve it,
>> I'd like to see what you've done. Maybe there is another simple
>> Erlang tcp benchmark program out there (i.e. not Tsung), but I
>> couldn't find one in a cursory Google search.
>>
>> To run:
>>
>> VM1:
>>
>> tb_server:start(Port, Opts).
>> tb_server:stop() to stop.
>>
>> Port = integer()
>> Opts = []|[opt()]
>> opt() = {atom(), term()} (Accepts inet setopts options, too)
>>
>> The server prints out the transfer rate (for simplicity).
>>
>> VM2:
>> tb_client(Host, Port, Opts).
>>
>> Host = atom()|string() hostname or IP address
>> Port, Opts as in tb_server
>>
>> Runs for 10 seconds, sending a 64K buffer as fast as possible to
>> Host/Port.
>> You can change this to 20 seconds (e.g.) by adding the tuple
>> {time_limit, 20000} to Opts.
>> You can change buffer size by adding the tuple {blksize, Bytes} to
>> Opts.
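>>
>> For example, a 20-second run with 1000-byte blocks against a local
>> server might look like this (the port number is just an example):
>>
>> VM1: tb_server:start(12345, []).
>> VM2: tb_client("127.0.0.1", 12345, [{blksize, 1000}, {time_limit, 20000}]).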
>>
>> 2008/6/20 Rapsey <rapsey@REDACTED>:
>>
>> All data goes through nginx which acts as a proxy. Its CPU
>> consumption is never over 1%.
>>
>>
>> Sergej
>>
>>
>> On Thu, Jun 19, 2008 at 9:35 PM, Javier París Fernández
>> <javierparis@REDACTED> wrote:
>>
>>
>> On 19/06/2008, at 20:06, Rapsey wrote:
>>
>>
>> It loops from another module; that way I can update the code at
>> any time without disrupting anything.
>> The packets are generally a few hundred bytes in size, except
>> keyframes, which tend to be in the kB range. I haven't tried
>> looking with Wireshark. Still, it seems a bit odd that large
>> CPU consumption would be the symptom. The traffic is strictly
>> one way: either someone is sending the stream or receiving it.
>> The transmit could of course be written with a passive receive,
>> but the code would be significantly uglier. I'm sure someone
>> here knows whether setting {active, once} for every packet is
>> CPU intensive or not.
>> It seems the workings of gen_tcp are quite platform dependent. If
>> I run the code on Windows, sending more than 128 bytes per
>> gen_tcp call significantly decreases network output.
>> Oh, and I forgot to mention I use R12B-3.
>>
>>
>> Hi,
>>
>> I'm not an expert, but:
>>
>> 200-300 mb/s in small (hundreds of bytes) packets means a *lot* of
>> system calls if you are doing a gen_tcp:send for each one. If you
>> buffer 3 packets, you are reducing that by a factor of 3 :). I'd try
>> a small test doing the same thing in C and compare the results. I
>> think it will also eat a lot of CPU.
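>>
>> A rough sketch of the buffering idea, assuming the packets arrive as
>> messages and are already binaries (gen_tcp:send/2 accepts an iolist,
>> so a list of binaries goes out in a single call):
>>
>> loop(Sock, Buf, N) when N >= 3 ->
>>     %% one gen_tcp:send (one syscall) covers three packets
>>     ok = gen_tcp:send(Sock, lists:reverse(Buf)),
>>     loop(Sock, [], 0);
>> loop(Sock, Buf, N) ->
>>     receive
>>         {packet, P} -> loop(Sock, [P | Buf], N + 1)
>>     end.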
>>
>> About the proxy CPU... I'm a bit lost there, but speculating
>> wildly, it is possible that the time spent in the system calls
>> that gen_tcp makes is being accounted to the proxy process.
>>
>> Regards.
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>