[erlang-questions] why is gen_tcp:send slow?

Edwin Fine <>
Fri Jun 20 08:24:05 CEST 2008

Which Erlang command-line options are you using? Specifically, are you using
the +K true and +A flags? Does OS X support kernel poll (+K true)? I saw
benchmarks where CPU usage without kernel poll was high (60 - 80%), and
with it was much lower (5 - 10%).
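
A quick way to check what the running emulator actually picked up is to ask
it from the shell. This is only a sketch: the module and function names are
made up for illustration, and it assumes the kernel_poll and
thread_pool_size items of erlang:system_info/1 are available in the release
you run. Start the node with, say, erl +K true +A 128 and call
vm_flags:report().

```erlang
-module(vm_flags).
-export([report/0]).

%% Hypothetical helper: reports whether kernel poll is in use and how
%% many async threads the emulator was started with.
report() ->
    [{kernel_poll, erlang:system_info(kernel_poll)},
     {async_threads, erlang:system_info(thread_pool_size)}].
```

If kernel_poll comes back false even with +K true, the emulator was most
likely built without kernel poll support for that platform.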

I wouldn't necessarily agree that "the workings of gen_tcp is quite platform
dependent." I would rather guess that TCP/IP stacks, and TCP/IP parameters,
are very different across certain operating systems. The default values are
often not even close to optimal. There are numerous registry tweaks to
improve Windows TCP/IP performance, for example. I am surprised that you are
forced to send only 128 bytes at a time or face lower performance in Erlang
on Windows. That seems odd indeed. I would take a look at the default buffer
sizes and the registry hacks that can be found on Google.

I was able to improve performance of an application I am working on from 3
messages/sec to 70 messages/sec simply by spawning a function (to gen_tcp:send
the data) that was previously being called sequentially. This was because
TCP/IP could now pack multiple packets into the same frame, which previously
only had one packet in it. The RTT of the link was dreadful (290ms), so this
was a bit of a special case but I think the principle remains the same.
Transmitting data in fewer packets means fewer system calls, better
utilization of available frame space, and less CPU. Plus using +K true and
perhaps +A 128 should improve things.
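
To illustrate the "fewer, larger sends" idea, here is a minimal sketch of a
sender process that queues packets and hands them to gen_tcp:send/2 as one
iolist. It is not the change I made in my application (that was just a
spawn around the send); it is closer to the buffer-3-packets approach
described in the quoted message. The module name, the message shapes and
the Max parameter are all assumptions made up for this example.

```erlang
-module(batch_send).
-export([start/2, push/2, flush/1]).

%% Starts a sender process that writes to Sock once Max packets are queued.
start(Sock, Max) when is_integer(Max), Max > 0 ->
    spawn_link(fun() -> loop(Sock, Max, [], 0) end).

%% Queue one packet (any iodata); returns immediately.
push(Pid, Packet) ->
    Pid ! {push, Packet},
    ok.

%% Force out whatever is queued and return the result of the send.
flush(Pid) ->
    Ref = make_ref(),
    Pid ! {flush, self(), Ref},
    receive {Ref, Result} -> Result end.

loop(Sock, Max, Acc, N) ->
    receive
        {push, Packet} when N + 1 >= Max ->
            %% Sketch only: a send error is ignored here.
            _ = send_all(Sock, [Packet | Acc]),
            loop(Sock, Max, [], 0);
        {push, Packet} ->
            loop(Sock, Max, [Packet | Acc], N + 1);
        {flush, From, Ref} ->
            From ! {Ref, send_all(Sock, Acc)},
            loop(Sock, Max, [], 0)
    end.

%% Acc is newest-first; reverse it so bytes go out in arrival order.
%% The list of binaries is passed as iodata, so it is one send call.
send_all(_Sock, []) -> ok;
send_all(Sock, Acc) -> gen_tcp:send(Sock, lists:reverse(Acc)).
```

Usage would be something like Pid = batch_send:start(Sock, 3) followed by
batch_send:push(Pid, Data) for each packet. A real version would also want
a timer so a partly filled buffer does not sit around, and would have to
decide what to do when the send fails.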

Give it a try (if you haven't already) and see if it improves things. Also
take a look if you will at Boost socket performance on
which has some interesting information on this topic.
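
For reference, the {active, once} plus passive-drain receive pattern I
described in my earlier message (quoted below) could be sketched roughly as
follows. The module name and the Handler callback are assumptions for this
example, it expects a binary-mode socket with {packet, 0}, and I have not
measured whether it really costs less CPU than re-arming {active, once}
for every packet.

```erlang
-module(drain_recv).
-export([loop/2]).

%% Sock must be in {active, once} mode and owned by the calling process.
%% Handler is a fun(Binary) called for every chunk of data received.
loop(Sock, Handler) ->
    receive
        {tcp, Sock, Data} ->
            Handler(Data),
            %% After {active, once} fires the socket is passive, so
            %% gen_tcp:recv/3 can drain whatever is already queued.
            case drain(Sock, Handler) of
                ok ->
                    case inet:setopts(Sock, [{active, once}]) of
                        ok -> loop(Sock, Handler);
                        {error, _} -> closed
                    end;
                closed ->
                    closed
            end;
        {tcp_closed, Sock} ->
            closed;
        {tcp_error, Sock, Reason} ->
            {error, Reason}
    end.

%% Passive reads with a zero timeout: stop as soon as nothing is waiting.
drain(Sock, Handler) ->
    case gen_tcp:recv(Sock, 0, 0) of
        {ok, Data} ->
            Handler(Data),
            drain(Sock, Handler);
        {error, timeout} ->
            ok;
        {error, _Other} ->
            closed
    end.
```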

2008/6/19 Rapsey <>:

> It loops from another module, that way I can update the code at any time
> without disrupting anything.
> The packets are generally a few hundred bytes big, except keyframes which
> tend to be in the kB range. I haven't tried looking with wireshark.  Still
> it seems a bit odd that a large CPU consumption would be the symptom. The
> traffic is strictly one way. Either someone is sending the stream or
> receiving it.
> The transmit could of course be written with a passive receive, but the
> code would be significantly uglier. I'm sure someone here knows if setting
> {active, once} every packet is CPU intensive or not.
> It seems the workings of gen_tcp is quite platform dependent. If I run the
> code in windows, sending more than 128 bytes per gen_tcp call significantly
> decreases network output.
> Oh and I forgot to mention I use R12B-3.
> Sergej
> On Thu, Jun 19, 2008 at 7:39 PM, Edwin Fine wrote:
>> How large is each packet? Can multiple packets fit into one TCP window?
>> Have you looked at the TCP/IP wire-level data with Wireshark/Ethereal to see
>> if the packets are being combined at the TCP level? If you see that you are
>> only getting one packet per TCP frame (assuming a packet is much smaller
>> than the window size), you might be falling foul of the Nagle congestion
>> algorithm. The fact that manually buffering your packets improves
>> performance suggests this may be the case. Nagle says that send, send, send
>> is OK, receive, receive, receive is ok, and even send, receive, send,
>> receive is ok, but you get into trouble if sends and receives are mixed
>> asymmetrically on the same socket (e.g. send, send, receive).
>> Also, I don't understand your transmit_loop. Where is it looping (or am I
>> misunderstanding something)?
>> From what I have seen, people writing Erlang TCP/IP code do an {active,
>> once} receive, and when getting the first packet, drop into another loop
>> that does a passive receive until there's no data waiting, then go back into
>> the {active, once} receive. Are you doing this? I am not sure, but I fear
>> that if all your receives are {active, once} it will incur more CPU overhead
>> than the active/passive split. It's hard to know because I can't see enough
>> of your code to know what you are doing overall. Disclaimer: I'm no Erlang
>> or TCP/IP expert.
>> Hope this helps.
>> 2008/6/19 Rapsey <>:
>>>  I have a streaming server written in Erlang. When it was pushing 200-300
>>> mb/s the CPU was getting completely hammered. I traced the problem to
>>> gen_tcp:send.
>>> So instead of sending every audio/video packet with a single gen_tcp:send
>>> call, I buffer 3 packets and then send them all at once. CPU consumption
>>> dropped dramatically.
>>> On one of the servers I have a simple proxy, the main process that sends
>>> packets between the client and some other server looks like this:
>>> transmit_loop({tcp, Sock, Data}, P) when P#transdat.client == Sock ->
>>>     gen_tcp:send(P#transdat.server, Data),
>>>     inet:setopts(P#transdat.client, [{active, once}]),
>>>     {ok, P};
>>> transmit_loop({tcp, Sock, Data}, P) when P#transdat.server == Sock ->
>>>     gen_tcp:send(P#transdat.client, Data),
>>>     inet:setopts(P#transdat.server, [{active, once}]),
>>>     {ok, P};
>>> transmit_loop({start, ServerPort}, P) ->
>>>     {ok, Sock} = gen_tcp:connect("", ServerPort, [binary,
>>> {active, once}, {packet, 0}]),
>>>     {ok, P#transdat{server = Sock}};
>>> transmit_loop({tcp_closed, _}, _) ->
>>>     exit(stop).
>>> The proxy is eating more CPU time than the streaming server.
>>> Is this normal behavior? The server is running  OSX 10.4
>>> Sergej
>>> _______________________________________________
>>> erlang-questions mailing list
>>> http://www.erlang.org/mailman/listinfo/erlang-questions