[erlang-questions] Concurrent requests with ibrowse

steve ellis steve.e.123@REDACTED
Mon Feb 16 15:28:01 CET 2009


Thank you Edwin and Per for your suggestions.

So far no solution...

Lowering TIME_WAIT didn't have a noticeable effect. We're getting the best
results on our EC2 Fedora Core Release 4 test machine (1.7 GB of memory). On
this machine we're now able to push ~300 ibrowse HTTP
request/responses through before we start to get a large number of
conn_failed and req_timedout messages from ibrowse.
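
One way to check whether sockets in TIME_WAIT (or connection attempts stuck
in SYN_SENT) are actually piling up during a run is to count sockets by
state. The following is only a sketch, assuming a Linux /proc filesystem.
The name tcp_state_count is made up for illustration, and the state codes
are the hex values the kernel prints in the "st" column of /proc/net/tcp
(02 = SYN_SENT, 06 = TIME_WAIT).

```shell
# Sketch: count IPv4 TCP sockets in a given state (Linux only).
# tcp_state_count is a hypothetical helper name, not part of any tool.
# $1 = two-digit hex state code as shown in the "st" column,
#      e.g. 02 = SYN_SENT, 06 = TIME_WAIT
# $2 = table to read (optional, defaults to /proc/net/tcp)
tcp_state_count() {
    # NR > 1 skips the header line; field 4 is the socket state.
    awk -v st="$1" 'NR > 1 && $4 == st { n++ } END { print n + 0 }' \
        "${2:-/proc/net/tcp}"
}
```

Running `tcp_state_count 06` before and during a test loop should show
whether TIME_WAIT sockets are accumulating, and `tcp_state_count 02` counts
connects that are still waiting for an answer. IPv6 sockets are listed
separately in /proc/net/tcp6.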

Digging deeper into ibrowse... conn_failed is a timeout error from
gen_tcp:connect(). This would appear to mean that gen_tcp:connect() isn't
able to establish a connection at all.

req_timedout is triggered when the socket stays open(?) for longer than the
user-supplied timeout (10 seconds in our case). (BTW, all of the sites we're
hitting are reachable within a 10 second window, so we should only rarely
get this type of timeout. Our test run hits 500 URLs.)

It still seems like we don't have enough sockets available to us.

I dug deeper on this. How do you tell how many sockets a given process has
open? It seems like one way is to do an ls on /proc/[the beam process id]/fd.
This gives a list of numbers that presumably correspond to the file
descriptors (sockets) for a process. On the first pass of the test (right
after starting the erlang process), the number of fds shown is
approximately the number of successful requests (around 300). Yet, on repeat
runs the number of fds doesn't exceed approximately 1000 (they stay open for
a while), which would seem to mean that erlang still doesn't have more than
1024 sockets available to it, despite what ulimit says. This doesn't explain
why it doesn't work right on the first pass though (since we're only looping
through 500 urls).
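
The fd counting described above can be scripted. On kernels that expose
/proc/<pid>/limits, the limit the running beam process actually received
can also be read directly, rather than inferred from ulimit in some other
shell. A rough sketch, assuming Linux; the helper names are made up for
illustration, and /proc/<pid>/limits may not exist on older kernels.

```shell
# Sketch: inspect the descriptors of a running process (Linux /proc only).
# All three function names are hypothetical helpers.

# Count every open file descriptor of process $1.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Count only the descriptors of process $1 that point at sockets.
count_socket_fds() {
    ls -l "/proc/$1/fd" 2>/dev/null | grep -c 'socket:' || true
}

# Print the soft "Max open files" limit that process $1 is running with.
effective_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Example (assumes a single Erlang VM whose process name contains "beam"):
#   pid=$(pgrep -o beam)
#   count_fds "$pid"; count_socket_fds "$pid"; effective_nofile_limit "$pid"
```

If effective_nofile_limit prints 1024 while ulimit -n in the login shell
prints something larger, the VM was probably started from an environment
that never received the raised limit.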

I have tried setting ERL_MAX_PORTS to 50000 before starting erlang from the
command prompt. This doesn't appear to do anything.

What to try next? Approximately how many good request/response cycles should
we *expect* to get if everything is working right? (From what I've read, it
seems we should expect many, many more.) Do the ibrowse folks have
any insight on any of this? Is there anything we can do to get the system to
give us more information about what is going on? Is there an erlang error
log we can look at?

PS I tried looking at the tcpdump of one of our request loops but wasn't
able to see anything meaningful there. Any idea what I should be looking for
in the tcpdump output?
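
One thing that might make the tcpdump output readable is capturing only
connection setup and reset packets. A SYN that is retransmitted with no
SYN-ACK coming back would point at a connect timeout, while an RST in reply
would point at a refused connection. A sketch, assuming the requests go to
port 80; handshake_filter is a made-up helper that only builds the filter
string, and the expression uses the standard pcap-filter tcpflags syntax.

```shell
# Sketch: build a pcap filter that keeps only SYN, SYN-ACK and RST packets
# for one TCP port. handshake_filter is a hypothetical helper name.
# $1 = TCP port (optional, defaults to 80, which is an assumption here)
handshake_filter() {
    printf 'tcp port %s and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)\n' "${1:-80}"
}

# Example (needs root; -n skips DNS lookups, -i any listens on all interfaces):
#   tcpdump -n -i any "$(handshake_filter 80)"
```

Recent tcpdump versions print the flags as [S] for SYN, [S.] for SYN-ACK
and [R] or [R.] for a reset, so each failed request should be traceable to
one of those patterns.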

Thanks!

Steve

On Sun, Feb 15, 2009 at 7:15 AM, Per Hedeland <per@REDACTED> wrote:

> Edwin Fine <erlang-questions_efine@REDACTED> wrote:
> >
> >If that doesn't help, try decreasing TIME_WAIT (but first read
> >http://www.erlang.org/pipermail/erlang-questions/2008-September/038154.html and
> >http://www.developerweb.net/forum/showthread.php?t=2941).
> >
> ># Set TIME_WAIT timeout to 30 seconds instead of 120
> >sudo /sbin/sysctl -w net.ipv4.tcp_fin_timeout=30
>
> That may help, but change the TIME_WAIT time (which isn't really a
> "timeout", it's not waiting for anything to "happen") it does not, as
> one might guess from the name of the variable. It reduces the timeout
> waiting for a close from the peer in the FIN_WAIT_2 state (see the state
> diagram in RFC 793), by default 60 seconds on Linux I believe. Note that
> this state is generally of short duration, and you shouldn't hit the
> timeout unless connectivity with the peer is lost - but reducing it too
> aggressively might cause loss of data.
>
> As far as I know there is no way to reduce the TIME_WAIT time on Linux
> other than modifying the kernel - it's a #define (60*HZ) in a kernel
> header file. There are other ways to deal with the problem of having a
> lot of connections in TIME_WAIT on Linux though.
>
> >2009/2/13 steve ellis <steve.e.123@REDACTED>
> >
> >> We're trying to build an app that uses ibrowse to make concurrent
> >> requests. We are not able to get more than a few concurrent requests at
> >> a time to return successfully. We repeatedly get "conn_failed"
>
> If that's all you get, I'm afraid it's pretty useless. Did it time out,
> get "connection refused" i.e. RST, was the connection established but
> immediately closed, or did it run into the "lack of ports" problem? I
> wouldn't say that unhandled "let it crash" is appropriate for problems
> occurring way below the user interface of an application, but a badmatch
> would at least have told us what gen_tcp said.
>
> Anyway if you get problems with connections in the low hundreds, "lack
> of ports" is really unlikely. So dig deeper instead of blindly trying to
> fix a problem that you may not have - find the place(s) in the source
> where 'conn_failed' is generated and make it/them report what actually
> happened, and/or use tcpdump or similar to figure out what goes wrong
> with the connections.
>
> --Per
>