[erlang-questions] httpc considered harmful

Anton Lavrik alavrik@REDACTED
Fri Feb 14 09:33:27 CET 2014


On Thu, Feb 13, 2014 at 11:39 AM, Felix Gallo <felixgallo@REDACTED> wrote:

> I recently ran into a very scary issue that appears to be related to httpc.
>
> I was hitting a web API millions of times, with varying URLs; e.g.,
> /users/9000000, /users/9000001, etc., at a rate of around 100-400
> requests/sec, using httpc:request, each request spawned by a different
> worker:
>
> get_user(UserID) ->
>   get_user_r(UserID, 10).
> get_user_r(UserID, 0) ->
>   io:format("dying because ran out of retries on ~p~n",[UserID]);
> get_user_r(UserID, Retries) ->
>   Url = lists:concat(["http://example.com/users/", UserID]),
>   Filename = lists:concat(["users/-", UserID, ".json"]),
>   io:format("requesting user: ~p~n", [UserID]),
>   case httpc:request(Url) of
>     {ok, Result} ->
>       {_, _, Body} = Result,
>       file:write_file(Filename, Body),
>       %% report the user that was fetched back to the coordinator
>       userscrapemaster ! {ok, UserID};
>     {error, Reason} ->
>       io:format("error for user ~p: ~p~n",[UserID, Reason]),
>       %% retry the same user with one fewer attempt left
>       get_user_r(UserID, Retries - 1)
>   end.
>
> A small (< 0.1%) but significant percentage of the time, the httpc:request
> call for completely different workers MIXED UP THEIR RESPONSES with other
> concurrent requests.
>
> For example, sometimes /users/5000 returned success but provided the body
> that /users/5001 should have returned, and /users/5001 returned the body
> that /users/5002 should have returned, and /users/5002 returned the body
> that /users/5000 should have returned.  Or, /users/5009 returned the
> response for /users/5010, and vice versa.
>
> There appeared to be no obvious pattern except that all those calls were
> concurrent, and pragmatically I didn't have the time to go chasing into
> httpc to try to figure out where the state was getting scrambled, but as a
> test I moved the call over to lhttpc without changing the structure of the
> code otherwise, and the mixed responses went away.
>
> If I get some time I'll try to dig into httpc to understand what happened
> there, but as a warning to others: httpc looks like it has a hidden race
> condition or other bug, and lhttpc does not.
>
>
I've seen exactly this problem under load as well. That was with HTTP/1.1
on R14; the connection rate was higher than yours and the error rate was,
I believe, much lower. Switching to lhttpc helped.
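
For anyone who has to stay on httpc for now, one way to at least detect
swapped responses is to check that each body refers to the ID that was
requested. A minimal sketch, assuming the API returns JSON containing a
numeric "id" field; that field name and the module and function names are
hypothetical and not taken from the code above:

```erlang
-module(user_check).
-export([body_matches/2]).

%% Returns true when Body contains "id": <UserID> as a whole number.
%% ASSUMPTION: the response JSON carries the user's ID in an "id" field.
%% Body may be a string or a binary; httpc returns a string by default.
body_matches(UserID, Body) when is_integer(UserID) ->
    %% \b stops 5000 from matching inside 50001
    Pattern = ["\"id\"\\s*:\\s*", integer_to_list(UserID), "\\b"],
    re:run(Body, Pattern, [{capture, none}]) =:= match.
```

Calling body_matches(UserID, Body) before file:write_file/2 and treating a
false result like {error, Reason} would send a mismatched response through
the existing retry path instead of writing it to disk. This only detects
the symptom; it does not explain or fix whatever httpc is doing.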

By the way, I remember someone else mentioning this problem on the list a
few years ago.

Anton