[erlang-questions] Re: Frequent crashes in inets http client (R12B-5)

Fri Jun 5 19:27:26 CEST 2009

Hi Steve,

I have a bone to pick with ibrowse as well. I did discuss some
shortcomings with Chandru, but I never posted anything to the list. I
guess it's time for that now. I also think Chandru is working on
correcting some of these.

1. Reading of responses
ibrowse reads responses using active sockets. This means that any packet
that comes in will be sent to the process managing the connection as a
message. This message is then added to the front of a list, which is
reversed when the complete response is received, and then flattened. Our
response bodies were quite large, sometimes up to 2MB (or even larger,
but the largest that I have actually measured was ~2MB), and at peak
time we did have quite high latency over the Internet. I think latency
makes this worse, since the data can be split into more "active
packets", but I'm not sure. I've done some tests with ibrowse, receiving
2MB, 4MB and 8MB files over a slow network. The Erlang OS process would
use over 800MB of resident RAM when fetching the 8MB file. These tests
were on a 64-bit machine, which makes matters worse though.
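
For illustration, here is a minimal sketch of an alternative way to
collect a response body. It is not ibrowse code; the module and
function names are made up, and it assumes the socket was opened with
[binary, {active, false}]. Chunks are kept as binaries in an iolist and
converted once at the end, rather than being flattened into a list of
bytes (on a 64-bit VM each list element costs two 8-byte words, so a
flat 8MB list needs roughly 128MB on its own).

```erlang
-module(recv_sketch).
-export([collect/1, recv_body/2]).

%% Accumulate binary chunks by prepending (O(1) per chunk), then
%% reverse once and convert to a single binary. No flat list of
%% bytes is ever built.
collect(Chunks) ->
    Acc = lists:foldl(fun(Chunk, Acc0) -> [Chunk | Acc0] end, [], Chunks),
    iolist_to_binary(lists:reverse(Acc)).

%% Passive-mode read loop. ASSUMPTION: Socket was opened with
%% [binary, {active, false}] and the peer closes the connection when
%% the body is complete. Data is pulled on demand instead of arriving
%% as one message per packet.
recv_body(Socket, Timeout) ->
    recv_body(Socket, Timeout, []).

recv_body(Socket, Timeout, Acc) ->
    case gen_tcp:recv(Socket, 0, Timeout) of
        {ok, Data}      -> recv_body(Socket, Timeout, [Data | Acc]);
        {error, closed} -> {ok, iolist_to_binary(lists:reverse(Acc))};
        {error, Reason} -> {error, Reason}
    end.
```

A real client would stop at Content-Length or the final chunk instead
of waiting for the connection to close; that part is left out here.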

2. Copying of data between processes
The request is first copied from the requesting process to the manager,
which will then copy it again to the process handling the connection.
When the process handling the connection is done, it will copy the
response to the manager, which will in turn copy it again to the process
doing the request. This copying is creating a lot of garbage and is
probably part of the reason why we see so much memory being used in the
previous test. It also makes the manager process' heap grow quite large
if there is a lot of traffic. Another quite important issue here is the
response being sent from the process handling the connection to the
manager. This is done with a gen_server:call, which will time out after
some time. During very high CPU load (most likely from flattening many
very large lists) this call would time out and we would see lots of very
big crash reports, which would also make the error_logger use *a lot*
of memory.
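
One way to avoid the extra hop back through the manager can be sketched
as follows. This is only an illustration, not how ibrowse is written:
the module name, the request/2 function and the echo worker are all
hypothetical. The manager hands the caller reference (From) to the
worker and returns {noreply, ...}; the worker then answers the
requester directly with gen_server:reply/2, so the response is copied
once and no internal gen_server:call is waiting on it.

```erlang
-module(direct_reply_sketch).
-behaviour(gen_server).
-export([start_link/0, request/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% The requester's own call timeout is the only one that applies to
%% delivery of the response.
request(Manager, Req) ->
    gen_server:call(Manager, {request, Req}, 5000).

init([]) ->
    {ok, nostate}.

%% Pass From on to a worker and return without replying; the manager
%% never sees the response.
handle_call({request, Req}, From, State) ->
    spawn_link(fun() -> worker(From, Req) end),
    {noreply, State}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Msg, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.

%% Hypothetical stand-in for the connection process. A real one would
%% perform the HTTP exchange here before replying.
worker(From, Req) ->
    gen_server:reply(From, {ok, {echo, Req}}).
```

The request is still copied to the manager and then to the worker in
this sketch; only the response path is shortened.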

3. Timeouts
The timeout handling is done in the process handling the connection, and
is using some weird calculations to come up with reasonable timeouts for
connect. Also note that (gen_tcp/ssl):send/2 can block for some time on
a congested network. Anyway, from our point of view, ibrowse doesn't
respect the actual timeout handed to the send_req call, and we had lots
of internal timeouts instead of external ones.
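
A minimal sketch of enforcing the caller's timeout from the outside,
whatever the connection code does internally. The module and function
names are hypothetical and this is not part of ibrowse. The work runs
in a separate monitored process; if no result arrives within Timeout
milliseconds, that process is killed and the caller gets
{error, timeout}.

```erlang
-module(deadline_sketch).
-export([with_deadline/2]).

%% Run Fun in its own process and wait at most Timeout milliseconds.
with_deadline(Fun, Timeout) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker died before sending a result.
            {error, Reason}
    after Timeout ->
        exit(Pid, kill),
        %% Wait until the worker is really gone, then drop a result
        %% that may have been sent just before the kill.
        receive {'DOWN', MRef, process, Pid, _} -> ok end,
        receive {Ref, _} -> ok after 0 -> ok end,
        {error, timeout}
    end.
```

Because the worker is a short-lived process, a blocking send or a slow
connect inside Fun cannot stretch the deadline seen by the caller.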

4. Memory usage / garbage collection
I don't really think this is an ibrowse issue, but it's interesting
anyhow. Since data was copied between lots of processes, and processes
are recycled between requests to monitor sockets (and to enable
pipelining, I guess), they would keep a lot of the memory they had
allocated. We tried to get around this by adding calls to
garbage_collect after each request in the client process, but I don't
think this is a good way to do it. Another approach is to use one
process per request and let it die when it's done, which would free the
memory. This doesn't work very well if you want to support pipelining,
but why would you want to do that, by the way?

Hope this helps

Steve Davis wrote:
> Hi Oscar,
>
> Do you happen to know whether ibrowse suffers the same limitations?
>
> regs, /s
>
> On Jun 5, 9:42 am, Oscar Hellström <os...@REDACTED>
> wrote:
>> Not to start a flame war, but I would stay away from the inets http
>> client if I were trying to build something serious. You can find
>> my reasons here:
>> http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:43806:200905:gocblgddep...
>>
>> Best regards
>>
>> Chris Newcombe wrote:
>>> Is there a patch for the following issue? It was reported a while
>>> ago:
>>> http://groups.google.com/group/erlang-programming/browse_thread/threa...
>>> (I didn't see any replies.) Here's a bit more detail:
>>>
>>> httpc_handler is crashing with {badrecord,request}. (BTW it would
>>> be great if badrecord errors also contained the incorrect term,
>>> not just the name of the expected record type.)
>>>
>>> It's crashing in httpc_handler,handle_info,2
>>> The last message received by the gen_server is
>>> {timeout,#Ref<0.0.0.9038>}
>>> The gen_server #state is
>>>
>>> {state,undefined,{tcp_session,{{"my-test-url",8080},<0.709.0>},false,http,
>>>  #Port<0.1351>,1},undefined,undefined,undefined,undefined,{[],[]},pipeline,
>>>  [#Ref<0.0.0.5834>],nolimit,nolimit,{options,{undefined,[]},20000,1,100,
>>>  disabled,enabled,false},{timers,[],#Ref<0.0.0.19293>}
>>>
>>> I think the relevant element is the first one (request), i.e.
>>> request == undefined.
>>>
>>> Given the message, it seems almost certain that the crash is in the
>>> second timeout clause of handle_info (marked below with ***). This
>>> clause will fire even if request == undefined, but will try to use
>>> Request#request.from, which crashes with {badrecord,request}.
>>>
>>> %%% Timeouts
>>> %% Internaly, to a request handling process, a request time out is
>>> %% seen as a canceld request.
>>> handle_info({timeout, RequestId}, State =
>>>             #state{request = Request = #request{id = RequestId}}) ->
>>>     httpc_response:send(Request#request.from,
>>>                         httpc_response:error(Request,timeout)),
>>>     {stop, normal,
>>>      State#state{canceled = [RequestId | State#state.canceled],
>>>                  request = Request#request{from = answer_sent}}};
>>> *** handle_info({timeout, RequestId}, State = #state{request = Request}) ->
>>>     httpc_response:send(Request#request.from,
>>>                         httpc_response:error(Request,timeout)),
>>>     {noreply, State#state{canceled = [RequestId | State#state.canceled]}};
>>> handle_info(timeout_pipeline, State = #state{request = undefined}) ->
>>>     {stop, normal, State};
>>>
>>> It looks like State#state.request is being set to undefined without
>>> cancelling an in-progress request timer. I've only glanced at the
>>> code, but both of the following clauses appear to do that. (But it
>>> could easily be something else.)
>>>
>>> %% On a redirect or retry the current request becomes
>>> %% obsolete and the manager will create a new request
>>> %% with the same id as the current.
>>> {redirect, NewRequest, Data}->
>>>     ok = httpc_manager:redirect_request(NewRequest, ProfileName),
>>>     handle_pipeline(State#state{request = undefined}, Data);
>>> {retry, TimeNewRequest, Data}->
>>>     ok = httpc_manager:retry_request(TimeNewRequest, ProfileName),
>>>     handle_pipeline(State#state{request = undefined}, Data);
>>>
>>> thanks,
>>> Chris
>> --
>> Oscar Hellström, os...@REDACTED
>> Office: +44 20 7655 0337
>> Mobile: +44 798 45 44 773
>> Erlang Training and Consulting Ltd
>> http://www.erlang-consulting.com/
>>
>> ________________________________________________________________
>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>> erlang-questions (at) erlang.org

-- 
Oscar Hellström, oscar@REDACTED
Office: +44 20 7655 0337
Mobile: +44 798 45 44 773
Erlang Training and Consulting Ltd
http://www.erlang-consulting.com/