[erlang-questions] Intermittent failures connecting C hidden nodes

Richard Andrews bbmaj7@REDACTED
Fri Jul 6 13:07:29 CEST 2007


--- Matthias Lang <matthias@REDACTED> wrote:

>  > * Double checking that the switches between the two machines are locked
>  > at the correct speed, e.g. 100Mbps full-duplex.
> 
> Verifying that the interfaces are running as expected, e.g. that both
> ends have the same idea about what they're doing, is good.

TCP should recover from any packets lost to collisions, and it is already
established that data is being exchanged between the two hosts: the TCP connect
succeeds, which requires SYN, SYN-ACK, ACK. Ethernet is obviously OK.

I would suggest:
 1) strace - attach it to the process; you get a mountain of data, but you can
see what your EIO is at the socket level.
 2) tcpdump - try to get a packet trace of the connection; you can use "tcpdump
-s0 -w <file> ..." to log everything and post-process it later.

It could be a firewall or NAT issue; I've seen stateful firewalls get confused
and block connections for a long time.

It is important to find out what is happening around the connection retries
(I'm assuming this means making new TCP connections). I think packet logs are
your most important tool. Maybe you have an intermittent IP address conflict.
It might be another host, other than the client and server, that is causing the
problem. The connection might not even be to an Erlang node; of course the
response won't make sense then.
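
To line the retries up against a packet log, it can help to record when each
connection attempt was made, which local port it used, and how it ended. What
follows is only a rough sketch in Python, not anything from erl_interface: the
names probe_once and probe_loop, the timeout, and the delay are all made up
for illustration, and it only exercises the plain TCP connect, not the Erlang
distribution handshake.

```python
import errno
import socket
import time


def probe_once(host, port, timeout=2.0):
    """Attempt one TCP connect and describe the outcome (hypothetical helper)."""
    started = time.time()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        # The local (ephemeral) port identifies this connection in a packet trace.
        local_port = sock.getsockname()[1]
        return {"time": started, "ok": True, "local_port": local_port, "error": None}
    except OSError as exc:
        # Timeouts may carry no errno; fall back to the exception class name.
        name = errno.errorcode.get(exc.errno, type(exc).__name__)
        return {"time": started, "ok": False, "local_port": None, "error": name}
    finally:
        sock.close()


def probe_loop(host, port, attempts=5, delay=1.0):
    """Make several fresh TCP connections, printing one line per attempt."""
    results = []
    for i in range(attempts):
        result = probe_once(host, port)
        results.append(result)
        stamp = time.strftime("%H:%M:%S", time.localtime(result["time"]))
        print(stamp, "ok" if result["ok"] else result["error"], result["local_port"])
        if i < attempts - 1:
            time.sleep(delay)
    return results
```

Running something like probe_loop("server-host", 4369) while tcpdump is
capturing gives timestamps and local port numbers to search for in the trace
(4369 is epmd's default port; the host name here is a placeholder).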

You'll slap yourself on the forehead when you see it.


