<div dir="ltr"><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Feb 13, 2014 at 11:39 AM, Felix Gallo <span dir="ltr"><<a href="mailto:felixgallo@gmail.com" target="_blank">felixgallo@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">I recently ran into a very scary issue that appears to be related to httpc.<div>
<br></div><div>I was hitting a web API millions of times, with varying URLs; e.g., /users/9000000, /users/9000001, etc., at a rate of around 100-400 requests/sec, using httpc:request, each request spawned by a different worker:</div>
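<div><br></div><div><div>get_user(UserID) -></div><div>&nbsp;&nbsp;&nbsp;&nbsp;get_user_r(UserID, 10).</div><div>get_user_r(UserID, 0) -></div><div>&nbsp;&nbsp;&nbsp;&nbsp;io:format("dying because ran out of retries on ~p~n", [UserID]);</div><div>get_user_r(UserID, Retries) -></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;Url = lists:concat(["<a href="http://example.com/users/" target="_blank">http://example.com/users/</a>", UserID]),</div><div>&nbsp;&nbsp;&nbsp;&nbsp;Filename = lists:concat(["users/-", UserID, ".json"]),</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;io:format("requesting user: ~p~n", [UserID]),</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;case httpc:request(Url) of</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{ok, Result} -></div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{_, _, Body} = Result,</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;file:write_file(Filename, Body),</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;userscrapemaster ! {ok, UserID};</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{error, Reason} -></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;io:format("error for user ~p: ~p~n", [UserID, Reason]),</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;%% retry the same user with one fewer attempt left</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;get_user_r(UserID, Retries - 1)</div><div>&nbsp;&nbsp;&nbsp;&nbsp;end.</div></div><div><br></div><div>In a small (< 0.1%) but significant fraction of cases, httpc:request calls made by completely different workers MIXED UP THEIR RESPONSES with other concurrent requests.</div>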
<div><br></div><div>For example, sometimes /users/5000 returned success but provided the body that /users/5001 should have returned, and /users/5001 returned the body that /users/5002 should have returned, and /users/5002 returned the body that /users/5000 should have returned. Or, /users/5009 returned the response for /users/5010, and vice versa.</div>
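<div><br></div><div>One way to catch a crossed response before it is written to disk is to check that each body actually belongs to the requested ID. This is a hypothetical sketch: the module name <code>response_check</code> and the assumption that every JSON body contains a compact <code>"id":N</code> field are illustrations, not something the API above is known to guarantee. It accepts the body as a string (httpc's default) or as a binary.</div>

```erlang
-module(response_check).
-export([body_matches/2]).

%% Hypothetical guard against crossed responses. Assumes the JSON body
%% for /users/N contains the compact field "id":N (no space after the
%% colon). Body may be a string (httpc's default) or a binary.
body_matches(UserID, Body) when is_integer(UserID) ->
    Bin = iolist_to_binary(Body),
    Needle = iolist_to_binary(["\"id\":", integer_to_list(UserID)]),
    %% Accept if any occurrence is not immediately followed by another
    %% digit, so that ID 500 does not match "id":5001.
    lists:any(fun({Pos, Len}) -> not followed_by_digit(Bin, Pos + Len) end,
              binary:matches(Bin, Needle)).

%% True if the byte at offset End exists and is an ASCII digit.
followed_by_digit(Bin, End) when End >= byte_size(Bin) ->
    false;
followed_by_digit(Bin, End) ->
    C = binary:at(Bin, End),
    C >= $0 andalso C =< $9.
```

<div>In a worker like the one above, a <code>false</code> result could be handled the same way as <code>{error, Reason}</code>, i.e. logged and retried instead of saved.</div>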
<div><br></div><div>There was no obvious pattern except that all of those calls were concurrent. Pragmatically, I didn't have time to dig into httpc and find where the state was getting scrambled, but as a test I switched the call to lhttpc without otherwise changing the structure of the code, and the mixed responses went away.</div>
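<div><br></div><div>Because the retry loop only depends on the shape of the response, the HTTP client can be passed in as a fun, so swapping httpc for lhttpc touches a single line. The sketch below is simplified and hypothetical: the module name <code>user_fetch</code>, the 5000 ms timeout, and the omission of the file write and the message to <code>userscrapemaster</code> are all assumptions. It relies on <code>httpc:request/1</code> and <code>lhttpc:request/4</code> both returning a three-element <code>{StatusLine, Headers, Body}</code> tuple on success, which is why the original <code>{_, _, Body} = Result</code> match works with either client.</div>

```erlang
-module(user_fetch).
-export([get_user/2, lhttpc_fetch/1]).

%% Hypothetical sketch: the HTTP client is injected as Fetch, a fun of
%% one argument (the URL) returning {ok, {StatusLine, Headers, Body}}
%% or {error, Reason}. Both httpc:request/1 and lhttpc:request/4 fit.
get_user(Fetch, UserID) ->
    get_user_r(Fetch, UserID, 10).

%% Out of retries: report it to the caller instead of just printing.
get_user_r(_Fetch, UserID, 0) ->
    {error, {out_of_retries, UserID}};
get_user_r(Fetch, UserID, Retries) ->
    Url = lists:concat(["http://example.com/users/", UserID]),
    case Fetch(Url) of
        {ok, {_StatusLine, _Headers, Body}} ->
            {ok, Body};
        {error, _Reason} ->
            get_user_r(Fetch, UserID, Retries - 1)
    end.

%% lhttpc variant (assumes the lhttpc application is already started):
%% lhttpc:request(URL, Method, Headers, TimeoutMs).
lhttpc_fetch(Url) ->
    lhttpc:request(Url, "GET", [], 5000).
```

<div>With httpc you would call <code>user_fetch:get_user(fun httpc:request/1, UserID)</code>; with lhttpc, <code>user_fetch:get_user(fun user_fetch:lhttpc_fetch/1, UserID)</code>. Injecting the fun also makes the loop testable with a stub in place of a live server.</div>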
<div><br></div><div>If I get some time, I'll dig into httpc to understand what happened, but as a warning to others: httpc appears to have a hidden race condition or other bug, and lhttpc does not.</div>
<span class=""><font color="#888888">
<div><br></div></font></span></div></blockquote><div><br></div><div>I've seen exactly this problem under load as well. HTTP1.1, connection rate was higher, error rate was, I believe, much lower than yours. It was on R14. Switching to lhttpc helped.</div>
<div><br></div><div>By the way, I remember someone else mentioning this problem on the list a few years ago.</div><div><br></div><div>Anton</div></div></div></div>