[erlang-questions] R13B02 on 8/16 core box: all TCP communication hangs/frozen
Wed Nov 18 18:18:38 CET 2009
On 11/18/09, Luke Gorrie <luke@REDACTED> wrote:
> Hi Scott!
> I suppose you knew the risk of idle speculation from the peanut
> gallery when you posted this to erlang-questions..
> But it seems like the most interesting tidbit is:
>> connections via Telnet or "nc" would open in 0-10 seconds, usually,
>> because most app listener sockets use a backlog size of 4096, but no
>> sign of system call activity by the VM
> Are you saying that it sometimes takes several seconds to establish a
> loopback socket connection from telnet or netcat? That sounds
> extremely fishy! The kernel (not BEAM) is the one responsible for
> getting the socket to ESTABLISHED state and if that doesn't happen
> within a few milliseconds then it sounds like your kernel is
> performing very badly.
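One way to separate kernel behaviour from BEAM behaviour is to time a loopback connect() with no Erlang involved at all. The sketch below is written in Python purely for illustration. The function name is hypothetical, and the 4096 backlog is borrowed from the description above. The kernel completes the three-way handshake and queues the connection before the server ever calls accept(), so on a healthy box this should be fast. Anything approaching the seconds described above would point at the kernel rather than the VM.

```python
import socket
import time


def time_loopback_connect(backlog=4096):
    """Return the seconds taken by connect() to a listener on 127.0.0.1.

    Hypothetical helper. The kernel finishes the TCP handshake and parks
    the connection in the listen queue, so connect() returns even though
    the server side never calls accept().
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        server.listen(backlog)         # the kernel may silently cap this value
        port = server.getsockname()[1]

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            start = time.perf_counter()
            client.connect(("127.0.0.1", port))  # no accept() has been called
            return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_loopback_connect()
    print(f"loopback connect took {elapsed * 1000:.3f} ms")
```

If this is quick while telnet/nc to the Erlang listener stalls, the listener's accept queue (or whatever owns it) becomes the suspect. If this also stalls, the VM is off the hook.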
My *first* question while reading the original description: "OS?
Kernel version? Patch levels?" (That's not obvious from the word
"Dell": I'd prefer to deploy Solaris x86, while you might prefer
OpenSolaris and the next guy Windows.) It matters all the more since the
later Solaris and OpenSolaris releases have been moving to newer TCP/IP
stacks for zone-related fun ;)
Also: uptime of the different systems?
Another way to do the "debugging": run truss -ffo /big/disk/tracefiles
erl (or the equivalent strace options), then wait for or trigger the
"bug". That way we get long-term information about the VM's interaction
with the kernel (perhaps sotruss as well?). Otherwise, attach to the
running processes when you manage to trigger the issue.
Also: You have checked that your kernel is capable of your number of
> Can the kernel possibly be very busy? (I watch 'vmstat 1' to check.)
> If it is busy you could run e.g. oprofile to find out where. One
> common cause is "too many open <something>" hitting a bad performance
> case in a kernel data structure. Can be files, sockets, routing
> entries, iptables conntrack entries, etc (oprofile should make it easy
> to see which).
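File descriptors are one of the "too many open <something>" resources mentioned above, and they are cheap to check. The following is a minimal sketch, assuming Linux procfs. /proc/<pid>/fd has one entry per open descriptor (files, sockets, pipes), and /proc/<pid>/limits holds the "Max open files" limit. The function name is hypothetical. This sketch does not replace oprofile. It only tells you whether the VM is close to its descriptor limit.

```python
import os


def open_fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for a process, assuming Linux procfs.

    Hypothetical helper. Reading another process's fd directory typically
    requires running as the same user or as root.
    """
    # One directory entry per open descriptor: files, sockets, pipes, ...
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units
                value = line.split()[3]
                soft_limit = None if value == "unlimited" else int(value)
                break
    return open_fds, soft_limit


if __name__ == "__main__":
    used, limit = open_fd_usage()  # pass the beam.smp PID as a string instead
    print(f"{used} open descriptors, soft limit {limit}")
```

Calling open_fd_usage("12345") with the PID of the wedged beam.smp process shows at a glance whether it is anywhere near the limit.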
> If you still have the machine wedged I'd be curious to see a page of
> 'vmstat 1' output and also the full set of open sockets ('netstat
> -tlnp') and anything else that might be overloaded (are you using e.g.
> advanced routing or firewalling features?)
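Note that the -l flag restricts netstat to listening sockets. For a quick per-state summary of *all* TCP sockets on Linux, the kernel's /proc/net/tcp and /proc/net/tcp6 tables can be tallied directly. Thousands of sockets in SYN_RECV, CLOSE_WAIT or TIME_WAIT would be a useful hint here. This is a sketch under the assumption of Linux procfs. The function names are hypothetical. The state codes follow the kernel's tcp_states.h, and unknown codes are reported raw.

```python
from collections import Counter

# Hex values of the "st" column in /proc/net/tcp (kernel tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(table_text):
    """Count sockets per TCP state in the text of /proc/net/tcp(6)."""
    counts = Counter()
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()  # fields: sl, local, remote, st, ...
        counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_tcp_states(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Sum the per-state counts over the IPv4 and IPv6 tables (Linux only)."""
    total = Counter()
    for path in paths:
        try:
            with open(path) as table:
                total += count_tcp_states(table.read())
        except FileNotFoundError:  # e.g. a kernel built without IPv6
            pass
    return total


if __name__ == "__main__":
    for state, n in read_tcp_states().most_common():
        print(f"{state:12} {n}")
```

Parsing the text separately from reading the files keeps the counting logic testable on a saved copy of the table taken while the box is wedged.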
> If you want a sounding board let me know and I'll tell you my mobile number.
> -Luke (damn I miss these problems!)
> P.S. If the list does NOT get two copies of this mail then it's
> annoyingly hard to post via Gmane.