[erlang-bugs] Re: [erlang-questions] epmd leaving ports in TIME_WAIT?

Nicholas Frechette <>
Tue Mar 23 23:43:37 CET 2010


Yes, I know why my node isn't 'up'. My question is why epmd is opening a
socket to query whether a node is up, i.e., what does it attempt to connect to?
Said node was never up in epmd's lifetime, meaning it shouldn't be a cached
value or the like. Supposing it connects to the other epmd process on my
other computer, why would it not keep that as a permanent TCP connection? It
also attempts to connect to ports on the same host, not just the other
computer, so I'm a bit curious.
Is epmd implemented as a SOAP-over-HTTP-like protocol where queries are
made over single-use socket connections?
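
For reference, here is a minimal sketch of what a single-use port query
looks like on the wire, assuming the documented EPMD protocol: the client
opens a TCP connection to port 4369, sends PORT_PLEASE2_REQ (a 2-byte
length, tag byte 122, then the node name), and reads back PORT2_RESP (tag
byte 119, then a result byte, where 0 means the node is registered). As far
as I understand the protocol, epmd closes the connection after replying, so
every lookup uses a fresh socket. The module name epmd_probe and its
function names are hypothetical, chosen only for illustration; this is not
how the runtime itself is implemented.

```erlang
-module(epmd_probe).
-export([query/2, encode_request/1, decode_response/1]).

%% Build a PORT_PLEASE2_REQ for the short node name (the part before '@').
%% Layout: 2-byte big-endian length, tag byte 122, then the name.
encode_request(Name) when is_list(Name) ->
    NameBin = list_to_binary(Name),
    Len = byte_size(NameBin) + 1,
    <<Len:16, 122, NameBin/binary>>.

%% Decode a PORT2_RESP. Only the distribution port is extracted here;
%% the remaining fields (node type, versions, name, extra) are ignored.
decode_response(<<119, 0, Port:16, _/binary>>) -> {port, Port};
decode_response(<<119, Err, _/binary>>) when Err > 0 -> noport;
decode_response(_) -> {error, bad_response}.

%% One lookup = one TCP connection, closed as soon as the reply is read.
%% Assumption: the whole (small) reply arrives in a single recv call.
query(Host, Name) ->
    Opts = [binary, {active, false}, {packet, raw}],
    case gen_tcp:connect(Host, 4369, Opts, 5000) of
        {ok, S} ->
            ok = gen_tcp:send(S, encode_request(Name)),
            Reply = gen_tcp:recv(S, 0, 5000),
            ok = gen_tcp:close(S),
            case Reply of
                {ok, Bin} -> decode_response(Bin);
                {error, _} = Err -> Err
            end;
        {error, _} = Err ->
            Err
    end.
```

With epmd listening locally, epmd_probe:query({127,0,0,1}, "rodc10") should
return noport while that node is unregistered. Each call opens and closes
one connection, which would leave one end of it in TIME_WAIT for a while.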

On Tue, Mar 23, 2010 at 5:04 PM, Michael Santos <> wrote:

> On Tue, Mar 23, 2010 at 01:15:18PM -0400, Nicholas Frechette wrote:
> > Hi,
> > I did as you suggested and ran epmd -d.
> > It ends up outputting something like:
> > epmd: Tue Mar 23 09:26:39 2010: ** sent PORT2_RESP (error) for "rodc10"
> > epmd: Tue Mar 23 09:26:40 2010: ** got PORT2_REQ
> >
> > Over and over.
> > This is because one of my nodes pings (net_adm:ping) a node that doesn't
> > exist from time to time. (Every couple seconds or so)
>
> Right, so every time the node connects and disconnects the TCP session
> will go into TIME_WAIT.
>
> > Also, when epmd dies, the ports are closed properly. In any case, I find
> > it surprising that epmd has to open so many sockets to ask around if
> > someone has seen the missing node.
>
> 1> [ begin
>        {ok,S} = gen_tcp:connect({127,0,0,1},4369,[]),
>        ok = gen_tcp:close(S)
>      end || _ <- lists:seq(1,10000) ].
>
> That will generate 10,000 sessions in TIME_WAIT :) I guess the question
> is why your nodes keep disappearing from the network.
>
> > On Mon, Mar 22, 2010 at 11:45 AM, Michael Santos <> wrote:
> >
> > > On Mon, Mar 22, 2010 at 11:17:25AM -0400, Nicholas Frechette wrote:
> > > > Escalating to erlang-bugs.
> > > > I've restarted both my server and laptop over the weekend.
> > > > On both machines, I restarted my 2 erlang applications (4 nodes,
> > > > connected in pairs: A <-> B, C <-> D, with pairs on the same computer)
> > > >
> > > > This was yesterday. This morning I did another netstat -t, and indeed,
> > > > I have >100 sockets stuck in TIME_WAIT on both computers.
> > >
> > > Sockets in TIME_WAIT state are normal. After the socket is closed,
> > > the OS puts the socket into TIME_WAIT to ensure any pending packets
> > > queued somewhere in the network for the socket pair have time to
> > > arrive. Usually TIME_WAIT is 2 or 4 minutes.
> > >
> > > It looks as if there are a number of TCP connections being
> > > established to your epmd and then closed.
> > >
> > > > Both with outgoing
> > > > on localhost and the other pc, in about equal proportion.
> > > > No node has crashed/restarted. None of the nodes does anything fancy,
> > > > simply net_adm:ping to connect the nodes and then data is exchanged
> > > > using messages.
> > > >
> > > > The problem seems somewhat related to the fact that epmd seems to
> > > > restart from time to time as the OS gets confused and cannot retrieve
> > > > the PID that originally opened the sockets (although port shows it is
> > > > epmd)
> > >
> > > What is restarting epmd?
> > >
> > > See anything in your logs? Maybe try running epmd in debug mode. Kill
> > > epmd if it is running and run: epmd -d
> > >
> > > > I briefly looked at the epmd code and did see a few comments in there
> > > > about // should probably always close and a few other potential places
> > > > where it might leak sockets. Unfortunately I ran out of time.
> > >
> > > Doesn't appear to be leaking fd's, but you can check with lsof.
> > >
> > > > Can anyone confirm if they see similar behavior? Note that on both
> > > > computers, both nodes are started manually (not automated yet) and as
> > > > such it isn't a race to see which node can start epmd first. Although,
> > > > I wonder if it might be related to the problem of the epmd 100% cpu
> > > > use. I believe another poster made the point that it would happen when
> > > > epmd runs out of file descriptors (which would happen if it leaks
> > > > sockets in TIME_WAIT).
> > >
> > > That's just one error condition; for example, the connection could have
> > > been aborted or the socket could have been closed. Are you seeing a lot
> > > of CPU usage?
> > >
> > >
> > >
>

