[erlang-questions] gen_tcp very slow to fetch data
zabrane Mikael
zabrane3@REDACTED
Mon Nov 23 17:55:56 CET 2009
Thanks for the lesson, guys!
Regards
Zabrane
2009/11/23 Joe Armstrong <erlang@REDACTED>
> On Sat, Nov 21, 2009 at 1:02 AM, zabrane Mikael <zabrane3@REDACTED>
> wrote:
> > Hi List !
> >
> > While trying to learn how to write a simple TCP web server in Erlang that
> > only dumps what it receives to stdout, I noticed that from time to time the
> > HTTP requests get truncated when reaching the server.
>
> Whenever I read something like that I think - "fragmentation".
>
> If you write N bytes to a TCP socket, you will eventually be able to
> read N bytes from the socket
> but the bytes may or may not be delivered "all in one go". Since you
> have said {packet, 0} you'll just
> get whatever happened to be read. This is why you *must* write a
> re-entrant parser.
>
> First you collect data until you see "\r\n\r\n" - only then can you
> parse the header.
> Then you check for a content length header. If you find a content
> length header it will contain the
> content length (N). Then you collect *exactly* N bytes following the
> "\r\n\r\n". Otherwise you collect
> until the socket closes (there is also a chunked alternative which I
> will ignore)
>
> The code in http://www.sics.se/~joe/tutorials/web_server/http_driver.erl
> does this:
>
> If you don't do this your program will work sometimes - in the case
> where the incoming packets were not
> fragmented but it will fail mysteriously if the packets are fragmented.
>
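
The collect-then-parse procedure described above can be sketched as a small re-entrant collector that is fed whatever chunk the socket happens to deliver. This is only an illustration, not the code from http_driver.erl: the module name http_collect and the functions new/0 and feed/2 are made up for this example, a missing Content-Length is treated as a body of 0 bytes (instead of reading until the socket closes), and chunked encoding is ignored. It assumes OTP 20 or later for string:lowercase/1 and string:trim/1.

```erlang
-module(http_collect).
-export([new/0, feed/2]).

%% Hypothetical helper module, for illustration only.
%% State is {header, Acc} while waiting for "\r\n\r\n", or
%% {body, Header, Need, Acc} while waiting for Need body bytes.

new() ->
    {header, <<>>}.

%% feed(State, Chunk) -> {more, NewState} | {done, Header, Body, Rest}
feed({header, Acc0}, Chunk) ->
    Acc = <<Acc0/binary, Chunk/binary>>,
    case binary:split(Acc, <<"\r\n\r\n">>) of
        [_Incomplete] ->
            %% Terminator not seen yet: keep everything and wait for more.
            {more, {header, Acc}};
        [Header, Rest] ->
            %% Header complete: whatever followed it is the start of the body.
            feed({body, Header, content_length(Header), <<>>}, Rest)
    end;
feed({body, Header, Need, Acc0}, Chunk) ->
    Acc = <<Acc0/binary, Chunk/binary>>,
    case Acc of
        <<Body:Need/binary, Rest/binary>> ->
            {done, Header, Body, Rest};
        _ ->
            {more, {body, Header, Need, Acc}}
    end.

%% Assumption: no Content-Length header means an empty body.
content_length(Header) ->
    find_length(binary:split(Header, <<"\r\n">>, [global])).

find_length([]) ->
    0;
find_length([Line | Rest]) ->
    case binary:split(Line, <<":">>) of
        [Name, Value] ->
            case string:lowercase(Name) of
                <<"content-length">> ->
                    binary_to_integer(string:trim(Value));
                _ ->
                    find_length(Rest)
            end;
        _ ->
            find_length(Rest)
    end.
```

A receive loop would call feed/2 with each binary returned by gen_tcp:recv(Socket, 0) and only parse the request once it gets {done, Header, Body, Rest} back, so it does not matter how the request was split on the wire.
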
> Forgetting about fragmentation is the "first basic" mistake that
> *everybody* makes when writing
> networking code - ....
>
> This mistake often shows up when you deploy something. You test it
> locally on localhost - it works.
> You test it live on the Internet - it fails.
>
> Why? Packets are rarely fragmented on localhost.
> The chance of fragmentation on the Internet is very high - even if you
> have a good connection.
>
> aside: this is why one of the tcp options is {packet, N} - if you
> write a client AND a server in Erlang
> and BOTH use (say) {packet,4} then gen_tcp will silently reassemble
> fragmented packets behind the scenes
> before delivering them to the application program.
>
> This together with term_to_binary (and its inverse) and the bit syntax
> will save you many sleepless nights.
>
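
As a rough illustration of the {packet, 4} remark above, the following sketch opens a loopback connection inside one process, sends a term with term_to_binary/1 and reads it back with a single gen_tcp:recv/3. The module name packet4_demo, the function run/0 and the example term are invented for this sketch; it assumes that connecting to 127.0.0.1 is allowed on the machine running it.

```erlang
-module(packet4_demo).
-export([run/0]).

%% Hypothetical demo module, for illustration only.
run() ->
    %% Both ends use {packet, 4}: every send is prefixed with a 4-byte
    %% length, and every recv returns exactly one complete message.
    Opts = [binary, {packet, 4}, {active, false}],
    {ok, Listen} = gen_tcp:listen(0, [{reuseaddr, true} | Opts]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127, 0, 0, 1}, Port, Opts),
    {ok, Server} = gen_tcp:accept(Listen, 5000),

    Term = {request, 357, lists:seq(1, 1000)},
    ok = gen_tcp:send(Client, term_to_binary(Term)),

    %% Length must be 0 in packet mode; the driver reassembles the message.
    {ok, Bin} = gen_tcp:recv(Server, 0, 5000),
    Received = binary_to_term(Bin),

    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(Server),
    ok = gen_tcp:close(Listen),
    Received =:= Term.
```

Note that this only helps when both ends speak the length-prefixed format; a browser sending HTTP does not, which is why the web server case still needs a re-entrant parser.
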
>
> > My main socket loop looks
> > like:
> >
> > -------------------------------------------
> > -define(TCP_OPTIONS, [binary, {packet, 0}, {active, false},
> >                       {reuseaddr, true}]).
> > ...
> > loop_recv(Socket) ->
> >     case gen_tcp:recv(Socket, 0) of
> >         {ok, BinData} ->
> >             %% here, I'm assuming that all the HTTP request (Headers +
> >             %% Body) is in "BinData". Hope I'm right.
>
> No No No - programs should not depend upon Hope. This assumption is wrong.
>
> Hint - print out the packet lengths, so you can see the packet lengths ..
>
> /Joe
>
>
> >             io:format("BinData: ~p~n", [BinData]),
> >             ok;
> >         NotOK ->
> >             error_logger:info_report([{"gen_tcp:recv/2", NotOK}]),
> >             error
> >     end.
> > -------------------------------------------
> >
> > For some requests, the "io:format" prints truncated data (in BinData):
> >
> > <<"http://www.foo.tv/images/v30/LoaderV3.swf?loop=false&quality=high&request=357&HTTP/1.0\r\nHost:
> > www.foo.tv\r\nUser-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5;
> > fr; rv:1.9.0.2) Gecko/2008090512 Firefox/3.0.2\r\nAccept: text/h">>
> >
> > As you can see, the request isn't complete "... Accept: text/h".
> >
> > Am I doing something wrong? How can I fix it please?
> >
> > Regards
> > Zabrane
> >
> > 2009/11/17 Tony Rogvall <tony@REDACTED>
> >
> >> Do not forget about {active, once} mode.
> >> {active, once} will deliver one message (its size depends on the buffer
> >> size etc.), then the socket switches to passive mode. To get the next
> >> message you use inet:setopts(Socket, [{active, once}]) to activate it
> >> again. This mode enables a selective receive at the same time as it
> >> enables flow control.
> >>
> >> /Tony
> >>
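
A minimal sketch of the {active, once} pattern described above, assuming a binary-mode socket owned by the calling process. The module name active_once_demo and the function collect/2 are invented for this example; it simply accumulates everything until the peer closes the connection, re-arming the socket before each receive.

```erlang
-module(active_once_demo).
-export([collect/2]).

%% Hypothetical demo module, for illustration only.
%% Collects all data sent by the peer until it closes the socket.
collect(Socket, Timeout) ->
    collect(Socket, Timeout, []).

collect(Socket, Timeout, Acc) ->
    %% Ask for exactly one message; the socket then falls back to
    %% passive mode until we re-arm it on the next iteration.
    ok = inet:setopts(Socket, [{active, once}]),
    receive
        {tcp, Socket, Data} ->
            collect(Socket, Timeout, [Data | Acc]);
        {tcp_closed, Socket} ->
            {ok, iolist_to_binary(lists:reverse(Acc))};
        {tcp_error, Socket, Reason} ->
            {error, Reason}
    after Timeout ->
        {error, timeout}
    end.
```

Because only one message is in flight at a time, the process mailbox cannot be flooded by a fast sender, and the receive can still match other messages (timers, control messages) alongside the socket data.
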
> >>
> >> On 17 nov 2009, at 01.51, Ngoc Dao wrote:
> >>
> >> > From inet's doc:
> >> > http://www1.erlang.org/documentation/doc-4.9.1/lib/kernel-2.4.1/doc/html/inet.html
> >> >
> >> > If the active option is true, which is the default, everything
> >> > received from the socket will be sent as messages to the receiving
> >> > process. If the active option is set to false (passive mode), the
> >> > process must explicitly receive incoming data by calling
> >> > gen_tcp:recv/N or gen_udp:recv/N (depending on the type of socket).
> >> > Note: Passive mode provides flow control; the other side will not be
> >> > able to send faster than the receiver can read. Active mode provides no
> >> > flow control; a fast sender could easily overflow the receiver with
> >> > incoming messages. Use active mode only if your high-level protocol
> >> > provides its own flow control (for instance, acknowledging received
> >> > messages) or the amount of data exchanged is small.
> >> >
> >> >
> >> > On Tue, Nov 17, 2009 at 2:59 AM, ERLANG <erlangy@REDACTED> wrote:
> >> >> Hi Chandru !
> >> >>
> >> >> That fixed my problem. Thanks.
> >> >> While googling a bit, I found two ways to read from the Socket:
> >> >>
> >> >> recv(Socket, Bin) ->
> >> >>     receive
> >> >>         {tcp, Socket, B} ->
> >> >>             io:format(".", []),
> >> >>             recv(Socket, concat_binary([Bin, B]));
> >> >>         {tcp_closed, Socket} ->
> >> >>             {ok, Bin};
> >> >>         Other ->
> >> >>             {error, {socket, Other}}
> >> >>     after
> >> >>         ?TIMEOUT ->
> >> >>             {error, {socket, timeout}}
> >> >>     end.
> >> >>
> >> >> % version 2 with "gen_tcp:recv"
> >> >> recv2(Socket, Bin) ->
> >> >>     case gen_tcp:recv(Socket, 0, ?TIMEOUT) of
> >> >>         {ok, B} ->
> >> >>             io:format(".", []),
> >> >>             recv2(Socket, concat_binary([Bin, B]));
> >> >>         {error, closed} ->
> >> >>             {ok, Bin};
> >> >>         {error, timeout} ->
> >> >>             {error, {socket, timeout}};
> >> >>         Other ->
> >> >>             {error, {socket, Other}}
> >> >>     end.
> >> >>
> >> >>
> >> >> Which one is the best in my case (see below: fetch.erl)?
> >> >>
> >> >> Regards
> >> >> Zabrane
> >> >>
> >> >> Le 16 nov. 09 à 18:53, Chandru a écrit :
> >> >>
> >> >>> You are expecting the server to indicate end of response by closing the
> >> >>> connection, but because you specify HTTP/1.1 in the request, the server
> >> >>> is holding up your connection, and you are timing out. Try replacing
> >> >>> HTTP/1.1 with HTTP/1.0 in your request, or parse the response to detect
> >> >>> end of response.
> >> >>>
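
To make the HTTP/1.0 suggestion above concrete, here is a small sketch of a request builder. The module name http10_request and the function build/2 are invented for this example, and sending a Host header is an assumption of the sketch (HTTP/1.0 does not require it, but many servers expect it).

```erlang
-module(http10_request).
-export([build/2]).

%% Hypothetical helper, for illustration only.
%% Builds a minimal HTTP/1.0 GET request. With HTTP/1.0 the server
%% normally closes the connection after the response, so a reader that
%% waits for {tcp_closed, Socket} sees the end of the response.
build(Host, Path) ->
    iolist_to_binary(["GET ", Path, " HTTP/1.0\r\n",
                      "Host: ", Host, "\r\n",
                      "\r\n"]).
```

For example, build("www.google.com", "/") yields a request whose header block ends with exactly one blank line.
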
> >> >>> cheers
> >> >>> Chandru
> >> >>>
> >> >>> 2009/11/16 zabrane Mikael <zabrane3@REDACTED>
> >> >>>
> >> >>>> Hi List !
> >> >>>>
> >> >>>> New to Erlang, I'm trying to implement a simple URL fetcher.
> >> >>>> Here's my code (please, feel free to correct it if you find any bug or
> >> >>>> know a better approach):
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> 8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8----
> >> >>>> -module(fetch).
> >> >>>>
> >> >>>> -export([url/1]).
> >> >>>>
> >> >>>> -define(TIMEOUT, 7000).
> >> >>>> -define(TCP_OPTS, [binary, {packet, raw}, {nodelay, true},
> >> >>>>                    {active, true}]).
> >> >>>>
> >> >>>> url(Url) ->
> >> >>>>     {ok, _Tag, Host, Port} = split_url(Url),
> >> >>>>
> >> >>>>     Hdrs = [],
> >> >>>>     Request = ["GET ", Url, " HTTP/1.1\r\n", Hdrs, "\r\n\r\n"],
> >> >>>>
> >> >>>>     case catch gen_tcp:connect(Host, Port, ?TCP_OPTS) of
> >> >>>>         {'EXIT', Why} ->
> >> >>>>             {error, {socket_exit, Why}};
> >> >>>>         {error, Why} ->
> >> >>>>             {error, {socket_error, Why}};
> >> >>>>         {ok, Socket} ->
> >> >>>>             gen_tcp:send(Socket, list_to_binary(Request)),
> >> >>>>             recv(Socket, list_to_binary([]))
> >> >>>>     end.
> >> >>>>
> >> >>>> recv(Socket, Bin) ->
> >> >>>>     receive
> >> >>>>         {tcp, Socket, B} ->
> >> >>>>             io:format(".", []),
> >> >>>>             recv(Socket, concat_binary([Bin, B]));
> >> >>>>         {tcp_closed, Socket} ->
> >> >>>>             {ok, Bin};
> >> >>>>         Other ->
> >> >>>>             {error, {socket, Other}}
> >> >>>>     after
> >> >>>>         ?TIMEOUT ->
> >> >>>>             {error, {socket, timeout}}
> >> >>>>     end.
> >> >>>>
> >> >>>>
> >> >>>> split_url([$h,$t,$t,$p,$:,$/,$/|T]) -> split_url(http, T);
> >> >>>> split_url(_X) -> {error, split_url}.
> >> >>>>
> >> >>>> split_url(Tag, X) ->
> >> >>>>     case string:chr(X, $:) of
> >> >>>>         0 ->
> >> >>>>             Port = 80,
> >> >>>>             case string:chr(X, $/) of
> >> >>>>                 0 ->
> >> >>>>                     {ok, Tag, X, Port};
> >> >>>>                 N ->
> >> >>>>                     Site = string:substr(X, 1, N-1),
> >> >>>>                     {ok, Tag, Site, Port}
> >> >>>>             end;
> >> >>>>         N1 ->
> >> >>>>             case string:chr(X, $/) of
> >> >>>>                 0 ->
> >> >>>>                     error;
> >> >>>>                 N2 ->
> >> >>>>                     PortStr = string:substr(X, N1+1, N2-N1-1),
> >> >>>>                     case catch list_to_integer(PortStr) of
> >> >>>>                         {'EXIT', _} ->
> >> >>>>                             {error, port_number};
> >> >>>>                         Port ->
> >> >>>>                             Site = string:substr(X, 1, N1-1),
> >> >>>>                             {ok, Tag, Site, Port}
> >> >>>>                     end
> >> >>>>             end
> >> >>>>     end.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> 8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8-----8------
> >> >>>>
> >> >>>> When testing it, the receiving socket gets very very slow:
> >> >>>> $ erl
> >> >>>> 1> c(fetch).
> >> >>>> 2> Bin = fetch:url("http://www.google.com").
> >> >>>> ......{error,{socket,timeout}}
> >> >>>>
> >> >>>> Am I missing something?
> >> >>>> What I like to get at the end is a very fast fetcher. Any hint?
> >> >>>>
> >> >>>> Regards
> >> >>>> Zabrane
> >> >>>>
> >> >>
> >> >>
> >> >> ________________________________________________________________
> >> >> erlang-questions mailing list. See http://www.erlang.org/faq.html
> >> >> erlang-questions (at) erlang.org
> >> >>
> >> >>
> >>
> >>
> >
>