[erlang-bugs] Re: [erlang-questions] epmd leaving ports in TIME_WAIT?
Tue Mar 23 18:15:18 CET 2010
I did as you suggested and ran epmd -d.
It ends up outputting something like:
epmd: Tue Mar 23 09:26:39 2010: ** sent PORT2_RESP (error) for "rodc10"
epmd: Tue Mar 23 09:26:40 2010: ** got PORT2_REQ
Over and over.
This is because one of my nodes pings (net_adm:ping) a node that doesn't
exist from time to time (every couple of seconds or so).
Also, when epmd dies, the ports are closed properly. In any case, I find it
surprising that epmd has to open so many sockets to ask around if someone
has seen the missing node.
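For reference, here is a rough sketch of what one such lookup looks like on the wire, based on the EPMD section of the Erlang distribution protocol documentation: a PORT_PLEASE2_REQ is a 2-byte length, the tag byte 122 and the node name, and the PORT2_RESP reply is the tag byte 119 followed by a result byte (non-zero means error, which is what the "(error)" in the debug output above corresponds to). The function names and the `lookup` helper are made up for illustration, and 4369 is assumed to be the epmd port (the default). As far as I understand it, every lookup is its own short-lived TCP connection, so each failed ping would leave one closed socket behind.

```python
import socket
import struct

PORT_PLEASE2_REQ = 122  # request tag from the EPMD protocol description
PORT2_RESP = 119        # response tag from the EPMD protocol description


def build_port2_req(node_name: str) -> bytes:
    """Frame a PORT_PLEASE2_REQ: 2-byte big-endian length, tag byte, node name."""
    payload = bytes([PORT_PLEASE2_REQ]) + node_name.encode("latin-1")
    return struct.pack(">H", len(payload)) + payload


def parse_port2_resp(data: bytes):
    """Return the node's distribution port, or None if epmd answered with an error."""
    if len(data) < 2 or data[0] != PORT2_RESP:
        raise ValueError("not a PORT2_RESP")
    if data[1] != 0:
        # Non-zero result: the node is not registered with this epmd.
        return None
    if len(data) < 4:
        raise ValueError("truncated PORT2_RESP")
    (port,) = struct.unpack(">H", data[2:4])
    return port


def lookup(node_name: str, host: str = "localhost", epmd_port: int = 4369):
    """Hypothetical helper: one lookup = one TCP connection that is opened and closed."""
    with socket.create_connection((host, epmd_port), timeout=5) as sock:
        sock.sendall(build_port2_req(node_name))
        chunks = []
        while True:
            chunk = sock.recv(1024)
            if not chunk:  # epmd closes the connection after replying
                break
            chunks.append(chunk)
    return parse_port2_resp(b"".join(chunks))
```

Calling `lookup("rodc10")` against a running epmd that does not know that node should return `None`, and the connection it used is closed as soon as the reply has been read.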
On Mon, Mar 22, 2010 at 11:45 AM, Michael Santos wrote:
> On Mon, Mar 22, 2010 at 11:17:25AM -0400, Nicholas Frechette wrote:
> > Escalating to erlang-bugs.
> > I've restarted both my server and laptop over the weekend.
> > On both machines, I restarted my 2 erlang applications (4 nodes,
> > in pairs: A <-> B, C <-> D, with pairs on the same computer)
> > This was yesterday. This morning I did another netstat -t, and indeed, I
> > have >100 sockets stuck in TIME_WAIT on both computers.
> Sockets in TIME_WAIT state are normal. After the socket is closed,
> the OS puts the socket into TIME_WAIT to ensure any pending packets
> queued somewhere in the network for the socket pair have time to arrive.
> Usually TIME_WAIT is 2 or 4 minutes.
> It looks as if there are a number of TCP connections that are being
> established and closed to your epmd.
> > Both with outgoing
> > on localhost and the other pc, in about equal proportion.
> > No node has crashed/restarted. None of the nodes does anything fancy,
> > net_adm:ping to connect the nodes and then data is exchanged using
> > The problem seems somewhat related to the fact that epmd seems to restart
> > from time to time as the OS gets confused and cannot retrieve the PID
> > that originally opened the sockets (although port shows it is epmd)
> What is restarting epmd?
> See anything in your logs? Maybe try running epmd in debug mode. Kill
> epmd if it is running and run: epmd -d
> > I briefly looked at the epmd code and did see a few comments in there
> > // should probably always close and a few other potential places where it
> > might leak sockets. Unfortunately I ran out of time.
> Doesn't appear to be leaking fd's, but you can check with lsof.
> > Can anyone confirm if they see similar behavior? Note that on both
> > computers, both nodes are started manually (not automated yet) and as such
> > it isn't a race to see which node can start epmd first. Although, I wonder
> > if it might be related to the problem of the epmd 100% cpu use, I believe
> > another poster made the point that it would happen when epmd runs out of
> > file descriptors (which would happen if it leaks sockets in TIME_WAIT).
> That's just one error condition; for example, the connection could have
> been aborted or the socket could have been closed. Are you seeing a lot
> of CPU usage?
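To put rough numbers on the TIME_WAIT explanation quoted above, here is a small sketch. It assumes Linux-style `netstat -tn` output (numeric ports, six columns), that epmd listens on its default port 4369, and that each ping costs exactly one connection; the sample text is made up for illustration and is not output from either machine in this thread.

```python
def count_time_wait(netstat_output: str, port: int = 4369) -> int:
    """Count TCP sockets in TIME_WAIT whose local or remote port equals `port`."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "TIME_WAIT":
            continue
        ports = {local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]}
        if str(port) in ports:
            count += 1
    return count


def expected_time_wait(ping_interval_s: float, time_wait_s: float) -> float:
    """Steady-state number of lingering sockets if every ping closes one connection."""
    return time_wait_s / ping_interval_s


# Made-up sample input in the assumed `netstat -tn` layout.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:4369          127.0.0.1:51234         TIME_WAIT
tcp        0      0 127.0.0.1:51240         127.0.0.1:4369          TIME_WAIT
tcp        0      0 127.0.0.1:22            127.0.0.1:40000         ESTABLISHED
tcp        0      0 127.0.0.1:8080          127.0.0.1:40001         TIME_WAIT
"""

if __name__ == "__main__":
    print(count_time_wait(SAMPLE))
    print(expected_time_wait(2, 120), expected_time_wait(2, 240))
```

With one ping every 2 seconds and a TIME_WAIT of 2 to 4 minutes, the estimate comes out at roughly 60 to 120 lingering sockets at any moment, which is the same order of magnitude as the >100 reported earlier in the thread.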