[erlang-questions] gen_tcp very slow to fetch data

Joe Armstrong erlang@REDACTED
Mon Nov 23 16:39:18 CET 2009


On Sat, Nov 21, 2009 at 1:02 AM, zabrane Mikael <zabrane3@REDACTED> wrote:
> Hi List !
>
> While trying to learn how to write a simple TCP Web Server in Erlang which
> only dumps what it gets to stdout, I noticed that from time to time the HTTP
> requests are truncated when they reach the server.

Whenever I read something like that I think - "fragmentation".

If you write N bytes to a TCP socket, you will eventually be able to read
N bytes from the socket, but the bytes may or may not be delivered "all in
one go". Since you have said {packet, 0} you'll just get whatever happened
to be read. This is why you *must* write a re-entrant parser.

First you collect data until you see "\r\n\r\n" - only then can you parse
the header. Then you check for a Content-Length header. If you find one, it
will contain the content length (N), and you collect *exactly* N bytes
following the "\r\n\r\n". Otherwise you collect until the socket closes
(there is also a chunked alternative which I will ignore).
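
A minimal sketch of the collection steps just described, written against a
recent OTP release (binary:split/2, string:lowercase/1 and string:trim/1 did
not exist in 2009). The module and function names are made up for
illustration. As a simplification, a missing Content-Length is treated as an
empty body; the "collect until the socket closes" and chunked cases are left
out.

```erlang
-module(http_collect).
-export([new/0, feed/2, recv_request/1]).

%% The collector is either still looking for the end of the header,
%% or collecting a body of known length.
new() -> {header, <<>>}.

%% feed(State, Chunk) -> {more, NewState} | {done, Header, Body, Rest}
%% Call it with every chunk read from the socket, however small.
feed({header, Acc}, Chunk) ->
    Buf = <<Acc/binary, Chunk/binary>>,
    case binary:split(Buf, <<"\r\n\r\n">>) of
        [Header, After] ->
            %% Header is complete; whatever follows belongs to the body.
            N = content_length(Header),
            feed({body, Header, N, <<>>}, After);
        [_] ->
            {more, {header, Buf}}
    end;
feed({body, Header, N, Acc}, Chunk) ->
    Buf = <<Acc/binary, Chunk/binary>>,
    case Buf of
        <<Body:N/binary, Rest/binary>> -> {done, Header, Body, Rest};
        _ -> {more, {body, Header, N, Buf}}
    end.

%% Returns the Content-Length value, or 0 if the header is absent
%% (an assumption of this sketch, see above).
content_length(Header) ->
    find_length(binary:split(Header, <<"\r\n">>, [global])).

find_length([]) -> 0;
find_length([Line | Rest]) ->
    case binary:split(Line, <<":">>) of
        [Name, Value] ->
            case string:lowercase(Name) of
                <<"content-length">> -> binary_to_integer(string:trim(Value));
                _ -> find_length(Rest)
            end;
        _ ->
            find_length(Rest)
    end.

%% Drive the collector from a passive-mode socket opened with
%% [binary, {packet, 0}, {active, false}].
recv_request(Socket) -> recv_request(Socket, new()).

recv_request(Socket, State) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Chunk} ->
            case feed(State, Chunk) of
                {more, State1} -> recv_request(Socket, State1);
                Done -> Done
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

Because feed/2 keeps its state between calls, it does not matter where the
chunk boundaries fall: a request split in the middle of "Content-Length"
gives the same result as one delivered whole.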

The code in http://www.sics.se/~joe/tutorials/web_server/http_driver.erl
does this.

If you don't do this your program will work sometimes - in the case where
the incoming packets were not fragmented - but it will fail mysteriously if
the packets are fragmented.

Forgetting about fragmentation is the "first basic" mistake that
*everybody* makes when writing networking code.

This mistake often shows up when you deploy something. You test it
locally on localhost - it works.
You test it live on the Internet - it fails.

Why? Packets are rarely fragmented on localhost.
The chance of fragmentation on the Internet is very high - even if you
have a good connection.

Aside: this is why one of the TCP options is {packet, N}. If you write a
client AND a server in Erlang and BOTH use (say) {packet,4}, then gen_tcp
will silently reassemble fragmented packets behind the scenes before
delivering them to the application program.

This together with term_to_binary (and its inverse) and the bit syntax
will save you many sleepless nights.
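
A small loopback sketch of the {packet,4} idea, assuming both ends are
Erlang and trust each other (binary_to_term/1 should not be fed untrusted
data as-is). The module name, the port choice and the test term are
illustrative only.

```erlang
-module(packet4_demo).
-export([run/0]).

%% Sends one Erlang term over a loopback connection where both ends
%% use {packet, 4}, and returns true if it arrives intact.
run() ->
    Opts = [binary, {packet, 4}, {active, false}],
    %% Port 0 asks the OS for any free port.
    {ok, Listen} = gen_tcp:listen(0, [{reuseaddr, true} | Opts]),
    {ok, Port} = inet:port(Listen),
    Parent = self(),
    spawn_link(fun() ->
        {ok, S} = gen_tcp:accept(Listen),
        %% With {packet, 4} a single recv returns one whole message,
        %% however many pieces it arrived in.
        {ok, Bin} = gen_tcp:recv(S, 0),
        Parent ! {received, binary_to_term(Bin)},
        gen_tcp:close(S)
    end),
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port, Opts),
    Term = {hello, lists:seq(1, 10000)},
    %% gen_tcp prepends the 4-byte length header on send.
    ok = gen_tcp:send(C, term_to_binary(Term)),
    Result = receive
                 {received, T} -> T
             after 5000 ->
                 timeout
             end,
    gen_tcp:close(C),
    gen_tcp:close(Listen),
    Result =:= Term.
```

The receiving side never sees a partial message here, so no re-entrant
parser is needed. This only works when both peers agree on the framing; it
is no help when talking to a browser or any other plain HTTP client.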


> My main socket loop looks
> like:
>
> -------------------------------------------
> -define(TCP_OPTIONS,[binary, {packet, 0}, {active, false}, {reuseaddr,
> true}]).
> ...
> loop_recv(Socket) ->
>   case gen_tcp:recv(Socket, 0) of
>        {ok, BinData} ->
>             %% here, I'm assuming that the whole HTTP request (Headers +
>             %% Body) is in "BinData". Hope I'm right.

No No No - programs should not depend upon Hope. This assumption is wrong.

Hint - print out the packet lengths, so you can see how the data actually arrives.
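
One way to follow that hint, as a rough sketch: a passive-mode loop that
prints the size of every chunk gen_tcp:recv/2 hands back. The module name
and the demo/0 helper (which sends 200000 bytes to itself over loopback)
are made up for illustration; how many chunks you see depends on the
machine and the network.

```erlang
-module(chunk_sizes).
-export([log_sizes/1, demo/0]).

%% Reads from a passive-mode socket until the peer closes, printing the
%% size of each chunk. Returns {ok, TotalBytes} or {error, Reason}.
log_sizes(Socket) -> log_sizes(Socket, 0).

log_sizes(Socket, Total) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Bin} ->
            io:format("got ~p bytes~n", [byte_size(Bin)]),
            log_sizes(Socket, Total + byte_size(Bin));
        {error, closed} ->
            {ok, Total};
        {error, Reason} ->
            {error, Reason}
    end.

%% Loopback demonstration: one send on the client side, and however
%% many recv results the server side happens to get.
demo() ->
    Opts = [binary, {packet, 0}, {active, false}],
    {ok, Listen} = gen_tcp:listen(0, Opts),
    {ok, Port} = inet:port(Listen),
    spawn_link(fun() ->
        {ok, C} = gen_tcp:connect({127,0,0,1}, Port, Opts),
        ok = gen_tcp:send(C, binary:copy(<<"x">>, 200000)),
        gen_tcp:close(C)
    end),
    {ok, S} = gen_tcp:accept(Listen),
    Result = log_sizes(S),
    gen_tcp:close(S),
    gen_tcp:close(Listen),
    Result.
```

The total always adds up to what was sent, but the individual sizes are
not under your control, which is the whole point.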

/Joe


>             io:format("BinData: ~p~n", [BinData]),
>             ok;
>        NotOK ->
>            error_logger:info_report([{"gen_tcp:recv/2", NotOK}]),
>            error
>    end.
> -------------------------------------------
>
> For some requests, the "io:format" prints truncated data (in BinData):
>
> <<"
> http://www.foo.tv/images/v30/LoaderV3.swf?loop=false&quality=high&request=357&HTTP/1.0\r\nHost:
> www.foo.tv\r\nUser-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5;
> fr; rv:1.9.0.2) Gecko/2008090512 Firefox/3.0.2\r\nAccept: text/h">>
>
> As you can see, the request isn't complete "... Accept: text/h".
>
> Am I doing something wrong? How can I fix it please?
>
> Regards
> Zabrane
>
> 2009/11/17 Tony Rogvall <tony@REDACTED>
>
>> Do not forget about {active, once} mode.
>> {active,once} will receive one message (depends on buffer size etc),
>> then it will switch to passive mode. To get the next message you use
>> inet:setopts(Socket, [{active,once}]) to activate it again. This mode
>> enables a selective receive at the same time as it enables flow control.
>>
>> /Tony
>>
>>
>> On 17 nov 2009, at 01.51, Ngoc Dao wrote:
>>
>> > From inet's doc:
>> > http://www1.erlang.org/documentation/doc-4.9.1/lib/kernel-2.4.1/doc/html/inet.html
>> >
>> > If the active option is true, which is the default, everything
>> > received from the socket will be sent as messages to the receiving
>> > process. If the active option is set to false (passive mode), the
>> > process must explicitly receive incoming data by calling
>> > gen_tcp:recv/N or gen_udp:recv/N  (depending on the type of socket).
>> > Note: Passive mode provides flow control; the other side will not be
>> > able send faster than the receiver can read. Active mode provides no
>> > flow control; a fast sender could easily overflow the receiver with
>> > incoming messages. Use active mode only if your high-level protocol
>> > provides its own flow control (for instance, acknowledging received
>> > messages) or the amount of data exchanged is small.
>> >
>> >
>> > On Tue, Nov 17, 2009 at 2:59 AM, ERLANG <erlangy@REDACTED> wrote:
>> >> Hi Chandru !
>> >>
>> >> That fixed my problem. Thanks.
>> >> While googling a bit, I found two ways to read from the Socket:
>> >>
>> >> recv(Socket, Bin) ->
>> >>    receive
>> >>        {tcp, Socket, B} ->
>> >>            io:format(".", []),
>> >>            recv(Socket, concat_binary([Bin, B]));
>> >>        {tcp_closed, Socket} ->
>> >>            {ok, Bin};
>> >>        Other ->
>> >>            {error, {socket, Other}}
>> >>        after
>> >>            ?TIMEOUT ->
>> >>            {error, {socket, timeout}}
>> >>    end.
>> >>
>> >> % version 2 with "gen_tcp:recv"
>> >> recv2(Socket, Bin) ->
>> >>    case gen_tcp:recv(Socket, 0, ?TIMEOUT) of
>> >>         {ok, B} ->
>> >>             io:format(".", []),
>> >>             recv2(Socket, concat_binary([Bin, B]));
>> >>         {error, closed} ->
>> >>             {ok, Bin};
>> >>        {error, timeout} ->
>> >>             {error, {socket, timeout}};
>> >>         Other ->
>> >>             {error, {socket, Other}}
>> >>     end.
>> >>
>> >>
>> >> Which one is the best in my case (see below: fetch.erl)?
>> >>
>> >> Regards
>> >> Zabrane
>> >>
>> >> On 16 Nov 09 at 18:53, Chandru wrote:
>> >>
>> >>> You are expecting the server to indicate end of response by closing the
>> >>> connection, but because you specify HTTP/1.1 in the request, the server
>> >>> is holding up your connection, and you are timing out. Try replacing
>> >>> HTTP/1.1 with HTTP/1.0 in your request, or parse the response to detect
>> >>> end of response.
>> >>>
>> >>> cheers
>> >>> Chandru
>> >>>
>> >>> 2009/11/16 zabrane Mikael <zabrane3@REDACTED>
>> >>>
>> >>>> Hi List !
>> >>>>
>> >>>> New to Erlang, I'm trying to implement a simple URL fetcher.
>> >>>> Here's my code (please, feel free to correct it if you find any bug or
>> >>>> know a better approach):
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> 8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8----
>> >>>> -module(fetch).
>> >>>>
>> >>>> -export([url/1]).
>> >>>>
>> >>>> -define(TIMEOUT,    7000).
>> >>>> -define(TCP_OPTS,   [binary, {packet, raw}, {nodelay, true},
>> >>>>                   {active, true}]).
>> >>>>
>> >>>> url(Url) ->
>> >>>>  {ok, _Tag, Host, Port} = split_url(Url),
>> >>>>
>> >>>>  Hdrs = [],
>> >>>>  Request = ["GET ", Url, " HTTP/1.1\r\n", Hdrs, "\r\n\r\n"],
>> >>>>
>> >>>>  case catch gen_tcp:connect(Host, Port, ?TCP_OPTS) of
>> >>>>    {'EXIT', Why} ->
>> >>>>          {error, {socket_exit, Why}};
>> >>>>      {error, Why} ->
>> >>>>          {error, {socket_error, Why}};
>> >>>>      {ok, Socket} ->
>> >>>>          gen_tcp:send(Socket, list_to_binary(Request)),
>> >>>>          recv(Socket, list_to_binary([]))
>> >>>>  end.
>> >>>>
>> >>>> recv(Socket, Bin) ->
>> >>>>  receive
>> >>>>      {tcp, Socket, B} ->
>> >>>>          io:format(".", []),
>> >>>>          recv(Socket, concat_binary([Bin, B]));
>> >>>>      {tcp_closed, Socket} ->
>> >>>>          {ok, Bin};
>> >>>>      Other ->
>> >>>>          {error, {socket, Other}}
>> >>>> after
>> >>>>  ?TIMEOUT ->
>> >>>>          {error, {socket, timeout}}
>> >>>>  end.
>> >>>>
>> >>>>
>> >>>> split_url([$h,$t,$t,$p,$:,$/,$/|T]) ->  split_url(http, T);
>> >>>> split_url(_X)                       ->  {error, split_url}.
>> >>>>
>> >>>> split_url(Tag, X) ->
>> >>>>  case string:chr(X, $:) of
>> >>>>      0 ->
>> >>>>          Port = 80,
>> >>>>          case string:chr(X,$/) of
>> >>>>              0 ->
>> >>>>                  {ok, Tag, X, Port};
>> >>>>              N ->
>> >>>>                  Site = string:substr(X,1,N-1),
>> >>>>                  {ok, Tag, Site, Port}
>> >>>>          end;
>> >>>>      N1 ->
>> >>>>          case string:chr(X,$/) of
>> >>>>              0 ->
>> >>>>                  error;
>> >>>>              N2 ->
>> >>>>                  PortStr = string:substr(X,N1+1, N2-N1-1),
>> >>>>                  case catch list_to_integer(PortStr) of
>> >>>>                      {'EXIT', _} ->
>> >>>>                          {error, port_number};
>> >>>>                      Port ->
>> >>>>                          Site = string:substr(X,1,N1-1),
>> >>>>                          {ok, Tag, Site, Port}
>> >>>>                  end
>> >>>>          end
>> >>>>  end.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> 8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8------
>> >>>>
>> >>>> When testing it, the receiving socket gets very very slow:
>> >>>> $ erl
>> >>>> 1> c(fetch).
>> >>>> 2> Bin = fetch:url("http://www.google.com").
>> >>>> ......{error,{socket,timeout}}
>> >>>>
>> >>>> Am I missing something?
>> >>>> What I'd like to get in the end is a very fast fetcher. Any hints?
>> >>>>
>> >>>> Regards
>> >>>> Zabrane
>> >>>>
>> >>
>> >>
>> >> ________________________________________________________________
>> >> erlang-questions mailing list. See http://www.erlang.org/faq.html
>> >> erlang-questions (at) erlang.org
>> >>
>> >>
>> >
>>
>>
>

