[erlang-questions] Re: Frequent crashes in inets http client (R12B-5)

Sun Jun 14 08:49:50 CEST 2009

Hi Oscar,

Thanks for the feedback. I've hacked ibrowse to use binaries internally
instead of lists. I have an experimental version in case anyone is
interested. I'll announce an update as soon as I'm confident nothing is
broken.
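
To illustrate the difference, here is a minimal sketch (not the actual
ibrowse code; the module and function names are made up for this
example) of collecting a response as binary chunks and joining them once
at the end. For comparison, a string held as a list costs two machine
words per character, so 16 bytes per byte of payload on a 64-bit
emulator.

```erlang
-module(body_acc).
-export([new/0, add_chunk/2, finish/1]).

%% The accumulator is a list of binary chunks, newest first.
new() -> [].

%% Prepending is O(1) and does not copy the chunk itself.
add_chunk(Chunk, Acc) when is_binary(Chunk) -> [Chunk | Acc].

%% Reverse the (short) list of chunks, then copy once into one binary.
%% No per-character list is ever built.
finish(Acc) -> iolist_to_binary(lists:reverse(Acc)).
```

Each incoming packet would go through add_chunk/2, and finish/1 would be
called once the complete response has arrived.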

cheers
Chandru

2009/6/5 Oscar Hellström <oscar@REDACTED>

> Hi Steve,
>
> I have a bone to pick with ibrowse as well. I did discuss some
> shortcomings with Chandru, but I never posted anything to the list. I
> guess it's time for that now. I also think Chandru is working on
> correcting some of these.
>
> 1. Reading of responses
> ibrowse reads responses using active sockets. This means that any packet
> that comes in will be sent to the process managing the connection as a
> message. This message is then added to the front of a list, which is
> reversed when the complete response is received, and then flattened. Our
> response bodies were quite large, sometimes up to 2MB (or even larger,
> but the largest that I have actually measured was ~2MB), and at peak
> time we did have quite high latency over the Internet. I think latency
> makes this worse, since the data can be split into more "active
> packets", but I'm not sure. I've done some tests with ibrowse, receiving
> a 2MB, 4MB and 8MB file over a slow network. The Erlang OS process would
> use over 800MB of resident RAM when fetching the 8MB file. These tests
> were on a 64-bit machine, which makes matters worse though.
>
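
One alternative to active mode, sketched under the assumption that the
body length is already known (for example from a Content-Length header):
switch the socket to passive binary mode and ask for the whole body in
one call. The module name and read_body/3 are hypothetical, and this is
not how ibrowse does it.

```erlang
-module(passive_read).
-export([read_body/3]).

%% Read exactly Length bytes from a TCP socket as one binary.
%% Returns {ok, Binary} or {error, Reason} (including {error, timeout}).
%% Assumption: no data for this body has already been delivered to the
%% owning process as active-mode messages.
read_body(Socket, Length, Timeout) when is_integer(Length), Length > 0 ->
    Opts = [{mode, binary}, {active, false}, {packet, raw}],
    case inet:setopts(Socket, Opts) of
        ok ->
            %% In raw mode a positive Length means "exactly Length bytes".
            gen_tcp:recv(Socket, Length, Timeout);
        {error, _} = Error ->
            Error
    end.
```

The data arrives as a single binary instead of a series of list
messages, so nothing has to be reversed or flattened afterwards. Very
large bodies would probably still need to be read in several pieces.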
> 2. Copying of data between processes
> The request is first copied from the requesting process to the manager,
> which will then copy it again to the process handling the connection.
> When the process handling the connection is done, it will copy the
> response to the manager, which will in turn copy it again to the process
> doing the request. This copying is creating a lot of garbage and is
> probably part of the reason why we see so much memory being used in the
> previous test. It also makes the manager process' heap grow quite large
> if there is a lot of traffic. Another quite important issue here is the
> response being sent from the process handling the connection to the
> manager. This is done with a gen_server:call, which will time out after
> some time. During very high CPU load (most likely from flattening many
> very large lists) this call would time out and we would see lots of very
> big crash reports, which would also make the error_logger use *a lot*
> of memory.
>
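
As a hedged illustration of the gen_server:call timeout described above:
gen_server:call/3 exits the caller with {timeout, _} when no reply
arrives in time (gen_server:call/2 defaults to 5000 ms). The sketch
below uses a made-up module name and a toy server standing in for the
manager, and turns that exit into an error tuple instead of a crash.

```erlang
-module(safe_call).
-behaviour(gen_server).
-export([start_link/0, call/3]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() -> gen_server:start_link(?MODULE, [], []).

%% Convert a call timeout into {error, timeout}; other exits still
%% propagate. Depending on the OTP release, a late reply may still show
%% up in the caller's mailbox afterwards.
call(Server, Request, Timeout) ->
    try
        gen_server:call(Server, Request, Timeout)
    catch
        exit:{timeout, _} -> {error, timeout}
    end.

init([]) -> {ok, nostate}.

%% {sleep, Ms} simulates a server that is too busy to answer in time.
handle_call({sleep, Ms}, _From, State) ->
    timer:sleep(Ms),
    {reply, ok, State};
handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) -> {noreply, State}.
```

This only hides the symptom. The response is still copied through the
manager, so handing it straight to the requesting process would address
the cause.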
> 3. Timeouts
> The timeout handling is done in the process handling the connection, and
> is using some weird calculations to come up with reasonable timeouts for
> connect. Also note that (gen_tcp/ssl):send/2 can block for some time on
> a congested network. Anyway, from our point of view, ibrowse doesn't
> respect the actual timeout handed to the send_req call, and we had lots
> of internal timeouts instead of external ones.
>
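
One way to make the caller's timeout the real limit is to turn it into a
single deadline and give every blocking step only the time that is left.
This is a sketch, not ibrowse code: the module and function names are
invented, and erlang:monotonic_time/1 assumes a far newer OTP release
than R12B.

```erlang
-module(deadline).
-export([start/1, remaining/1]).

%% Absolute deadline, in monotonic milliseconds, TimeoutMs from now.
start(TimeoutMs) when is_integer(TimeoutMs), TimeoutMs >= 0 ->
    erlang:monotonic_time(millisecond) + TimeoutMs.

%% Milliseconds left until the deadline, never negative, so the result
%% can be passed directly as a timeout argument.
remaining(Deadline) ->
    max(0, Deadline - erlang:monotonic_time(millisecond)).
```

The connect and each gen_tcp:recv/3 would then be given
deadline:remaining(D) as their timeout. gen_tcp:send/2 takes no timeout
argument, so the send_timeout socket option would have to cover that
step.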
> 4. Memory usage / garbage collection
> I don't really think this is an ibrowse issue, but it's interesting
> anyhow. Since data was copied between lots of processes, and the
> processes that monitor sockets are recycled between requests (to enable
> pipelining, I guess), they would keep a lot of the memory they had
> allocated. We tried to get around this by adding calls to
> garbage_collect after each request in the client process, but I don't
> think this is a good way to do it. Another approach is to use one
> process per request and let it die when it's done, which would free the
> memory. This doesn't work very well if you want to support pipelining,
> but why would you want to do that, btw?
>
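
The one-process-per-request approach can be sketched roughly like this.
The module name is made up, and Fun stands in for whatever actually
performs the HTTP exchange. The worker's heap is released when it exits,
and the timeout is enforced by the caller itself.

```erlang
-module(per_request).
-export([request/2]).

%% Run Fun in a short-lived process and wait at most Timeout ms.
%% Returns {ok, Result}, {error, timeout} or {error, CrashReason}.
request(Fun, Timeout) when is_function(Fun, 0) ->
    Caller = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Caller ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            %% Drop the monitor and any 'DOWN' message already queued.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker died before it could answer.
            {error, Reason}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        %% A result sent just before the kill can still arrive late.
        exit(Pid, kill),
        {error, timeout}
    end.
```

The result goes straight from the worker to the caller, without passing
through a manager process. As noted above, this gives up pipelining.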
> Hope this helps
>
> Steve Davis wrote:
> > Hi Oscar,
> >
> > Do you happen to know whether ibrowse suffers the same limitations?
> >
> > regs, /s
> >
> > On Jun 5, 9:42 am, Oscar Hellström <os...@REDACTED>
> > wrote:
> >> Not to start a flame war, but I would stay away from the inets http
> >> client if I were trying to build something serious. You can find
> >> my reasons here:
> >> http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:43806:200905:gocblgddep...
> >>
> >> Best regards
> >>
> >> Chris Newcombe wrote:
> >>> Is there a patch for the following issue? It was reported a while
> >>> ago:
> >>> http://groups.google.com/group/erlang-programming/browse_thread/threa...
> >>> (I didn't see any replies.)
> >>>
> >>> Here's a bit more detail:
> >>>
> >>> httpc_handler is crashing with {badrecord,request}
> >>> (BTW it would be great if badrecord errors also contained the
> >>> incorrect term, not just the name of the expected record type)
> >>>
> >>> It’s crashing in httpc_handler,handle_info,2
> >>>
> >>> The last message received by the gen_server is
> >>> {timeout,#Ref<0.0.0.9038>}
> >>>
> >>> The gen_server #state is
> >>>
> >>> {state,undefined,
> >>>  {tcp_session,{{"my-test-url",8080},<0.709.0>},false,http,#Port<0.1351>,1},
> >>>  undefined,undefined,undefined,undefined,{[],[]},pipeline,
> >>>  [#Ref<0.0.0.5834>],nolimit,nolimit,
> >>>  {options,{undefined,[]},20000,1,100,disabled,enabled,false},
> >>>  {timers,[],#Ref<0.0.0.19293>}
> >>>
> >>> I think the relevant element is the first one (request),
> >>> i.e. request == undefined
> >>>
> >>> Given the message, it seems almost certain that the crash is in the
> >>> second timeout clause of handle_info (marked below with ***). This
> >>> clause will fire even if request == undefined, but will try to use
> >>> Request#request.from, which crashes with {badrecord,request}
> >>>
> >>> %%% Timeouts
> >>> %% Internaly, to a request handling process, a request time out is
> >>> %% seen as a canceld request.
> >>> handle_info({timeout, RequestId}, State =
> >>>             #state{request = Request = #request{id = RequestId}}) ->
> >>>     httpc_response:send(Request#request.from,
> >>>                         httpc_response:error(Request,timeout)),
> >>>     {stop, normal,
> >>>      State#state{canceled = [RequestId | State#state.canceled],
> >>>                  request = Request#request{from = answer_sent}}};
> >>> ***
> >>> handle_info({timeout, RequestId}, State = #state{request = Request}) ->
> >>>     httpc_response:send(Request#request.from,
> >>>                         httpc_response:error(Request,timeout)),
> >>>     {noreply, State#state{canceled = [RequestId | State#state.canceled]}};
> >>> handle_info(timeout_pipeline, State = #state{request = undefined}) ->
> >>>     {stop, normal, State};
> >>>
> >>> It looks like State#state.request is being set to undefined without
> >>> cancelling an in-progress request timer. I've only glanced at the
> >>> code, but both of the following clauses appear to do that. (But it
> >>> could easily be something else.)
> >>>
> >>> %% On a redirect or retry the current request becomes
> >>> %% obsolete and the manager will create a new request
> >>> %% with the same id as the current.
> >>> {redirect, NewRequest, Data}->
> >>>     ok = httpc_manager:redirect_request(NewRequest, ProfileName),
> >>>     handle_pipeline(State#state{request = undefined}, Data);
> >>> {retry, TimeNewRequest, Data}->
> >>>     ok = httpc_manager:retry_request(TimeNewRequest, ProfileName),
> >>>     handle_pipeline(State#state{request = undefined}, Data);
> >>>
> >>> thanks,
> >>> Chris
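
A possible guard against the {badrecord,request} crash described above,
as a sketch only: the records are cut-down stand-ins for the real
httpc_handler ones, the reply is a plain message rather than
httpc_response:send/2, and this is not an official inets patch. The idea
is to match request = undefined first, so a stale timer message can
never reach Request#request.from.

```erlang
-module(timeout_clause).
-export([handle_info/2]).

%% Simplified stand-ins; the real records have many more fields.
-record(request, {id, from}).
-record(state, {request, canceled = []}).

%% Stale timer: there is no request in progress, so only note the
%% cancellation and do not try to answer anyone.
handle_info({timeout, RequestId}, State = #state{request = undefined}) ->
    {noreply, State#state{canceled = [RequestId | State#state.canceled]}};
%% A request is in progress: tell its owner about the timeout.
handle_info({timeout, RequestId},
            State = #state{request = #request{from = From}}) ->
    From ! {error, timeout},
    {noreply, State#state{canceled = [RequestId | State#state.canceled]}}.
```

Cancelling the request timer wherever request is reset to undefined
would address the cause rather than the symptom.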
> >> --
> >> Oscar Hellström, os...@REDACTED
> >> Office: +44 20 7655 0337
> >> Mobile: +44 798 45 44 773
> >> Erlang Training and Consulting Ltd
> >> http://www.erlang-consulting.com/
> >>
> >> ________________________________________________________________
> >> erlang-questions mailing list. See http://www.erlang.org/faq.html
> >> erlang-questions (at) erlang.org
> >
>
>
> --
> Oscar Hellström, oscar@REDACTED
> Office: +44 20 7655 0337
> Mobile: +44 798 45 44 773
> Erlang Training and Consulting Ltd
> http://www.erlang-consulting.com/
>