[erlang-questions] epmd leaving ports in TIME_WAIT?

Nicholas Frechette zeno490@REDACTED
Mon Mar 22 16:17:25 CET 2010


Escalating to erlang-bugs.
I restarted both my server and my laptop over the weekend.
On both machines, I restarted my 2 Erlang applications (4 nodes, connected
in pairs: A <-> B and C <-> D, with each pair on the same computer).

That was yesterday. This morning I ran netstat -t again and, indeed, I
have >100 sockets stuck in TIME_WAIT on both computers, split roughly
equally between connections on localhost and connections to the other PC.
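To put numbers on this, a small script can tally the TIME_WAIT sockets whose local port is epmd's 4369, grouped by foreign host. This is a hypothetical helper (`count_epmd_time_wait` is not part of any Erlang tool), and it assumes the Linux `netstat -tn` column layout seen in the output quoted below (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```python
import shutil
import subprocess
from collections import Counter

EPMD_PORT = "4369"  # default epmd port, as seen in the netstat output


def count_epmd_time_wait(netstat_text: str) -> Counter:
    """Count TIME_WAIT sockets with local port 4369, keyed by foreign host."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip banner, header and non-TCP rows
        local, foreign, state = fields[3], fields[4], fields[5]
        local_port = local.rpartition(":")[2]
        if state == "TIME_WAIT" and local_port == EPMD_PORT:
            foreign_host = foreign.rpartition(":")[0]
            counts[foreign_host] += 1
    return counts


def main() -> None:
    if shutil.which("netstat") is None:
        print("netstat not found")
        return
    # -t: TCP sockets only; -n: numeric addresses instead of host names
    result = subprocess.run(["netstat", "-tn"], capture_output=True, text=True)
    for host, n in count_epmd_time_wait(result.stdout).most_common():
        print(f"{host}: {n}")


if __name__ == "__main__":
    main()
```

Grouping by foreign host makes it easy to see whether the localhost vs. other-PC split stays roughly even as the count grows.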
No node has crashed or restarted. None of the nodes does anything fancy: they
simply use net_adm:ping to connect, and then exchange data using messages.
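For reference, when a node sets up a connection to another node (for example through net_adm:ping), it first asks epmd on the remote host which port that node listens on. The Erlang distribution protocol documentation describes this as a PORT_PLEASE2_REQ sent over a fresh TCP connection to port 4369, answered with a PORT2_RESP, after which epmd closes the connection. The Python sketch below performs one such lookup so it can be repeated while watching netstat; the function names are hypothetical and only the message layout follows the documented protocol. Note that in TCP the endpoint that closes first is the one that enters TIME_WAIT.

```python
import socket
import struct

EPMD_PORT = 4369
PORT_PLEASE2_REQ = 122  # 'z'
PORT2_RESP = 119        # 'w'


def build_port_please2_req(node_name: str) -> bytes:
    """Frame a PORT_PLEASE2_REQ: 2-byte big-endian length, then the request.

    node_name is the short name without '@host' (e.g. "a" for a@mercury).
    """
    body = bytes([PORT_PLEASE2_REQ]) + node_name.encode("utf-8")
    return struct.pack(">H", len(body)) + body


def parse_port2_resp(data: bytes):
    """Return the distribution port from a PORT2_RESP, or None on error."""
    if len(data) < 2 or data[0] != PORT2_RESP:
        raise ValueError("not a PORT2_RESP")
    if data[1] != 0:  # non-zero Result: node not registered with epmd
        return None
    (port,) = struct.unpack(">H", data[2:4])
    return port


def lookup_port(node_name: str, host: str = "localhost"):
    """One lookup = one short-lived TCP connection to epmd."""
    with socket.create_connection((host, EPMD_PORT), timeout=5) as sock:
        sock.sendall(build_port_please2_req(node_name))
        chunks = []
        while True:  # the response is not length-prefixed: read until EOF
            chunk = sock.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_port2_resp(b"".join(chunks))
```

For a node started as `erl -sname a` on the same host (a hypothetical name), `lookup_port("a")` returns its distribution port, or None if epmd does not know it. Running it in a loop and re-running netstat can show whether each lookup adds one TIME_WAIT entry on the 4369 side.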

The problem seems somewhat related to the fact that epmd appears to restart
from time to time: the OS gets confused and cannot retrieve the PID that
originally opened the sockets (although the port number shows it is epmd).

I briefly looked at the epmd code and saw a few comments in there such as
"// should probably always close", plus a few other places where it might
leak sockets. Unfortunately I ran out of time.

Can anyone confirm whether they see similar behavior? Note that on both
computers, both nodes are started manually (not automated yet), so it
isn't a race to see which node starts epmd first. I do wonder whether
this is related to the epmd 100% CPU use problem; I believe another poster
pointed out that it happens when epmd runs out of file descriptors (which
would happen if it leaks sockets in TIME_WAIT).
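One way to test the file-descriptor theory is to compare how many descriptors epmd actually holds with the number of TIME_WAIT entries netstat reports. This is a sketch assuming Linux's /proc filesystem, that you already know epmd's PID (for example from `pidof epmd`), and that it runs as the same user as epmd or as root; the helper names are hypothetical.

```python
import os
import sys


def count_open_fds(pid: int) -> int:
    """Number of open file descriptors of `pid` (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def count_socket_fds(pid: int) -> int:
    """How many of those descriptors point at sockets."""
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:  # descriptor was closed while we were iterating
            continue
        if target.startswith("socket:"):
            total += 1
    return total


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    epmd_pid = int(sys.argv[1])
    print(f"open fds: {count_open_fds(epmd_pid)}, "
          f"sockets: {count_socket_fds(epmd_pid)}")
```

If the socket count stays small and flat while the TIME_WAIT count keeps climbing, those TIME_WAIT entries are not consuming epmd's descriptors; a steadily growing socket count would support the leak theory instead.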


On Mon, Mar 15, 2010 at 2:53 PM, Nicholas Frechette <zeno490@REDACTED> wrote:

> Hi,
> I recently started running 2 Erlang applications in distributed mode (with
> -sname) on the same box.
> I am now noticing (running netstat -t) that a _LOT_ of connections on port
> 4369 (the port used by epmd) are left open on my Ubuntu 9.10 box.
>
> In fact, 90%+ of all my active connections are
> localhost:4369 -> culpritbox:randomport, and all of them are stuck in
> TIME_WAIT.
>
> Any idea what could be causing this? I use a different computer for my
> development and I see a similar pattern emerging there (again Ubuntu 9.10).
>
> My erlang version is R13B01.
>
> Here is an example output of `netstat -t` (note that even with -p, netstat
> doesn't display a program name for those ports). Any ideas?
>
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> tcp        0      0 mercury:4369            mercury:49448           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:45420           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:41234           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:35179           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:44567           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:45846           TIME_WAIT
> tcp        0      0 localhost:4369          localhost:33424         ESTABLISHED
> tcp        0      0 mercury:4369            mercury:38486           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:38624           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:44724           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:44398           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:47189           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:45306           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:36997           TIME_WAIT
> tcp        0      0 localhost:48762         localhost:4369          ESTABLISHED
> tcp        0      0 mercury:4369            mercury:38627           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:37665           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:48427           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:57916           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:51098           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:55867           TIME_WAIT
> * something else
> * something else
> tcp        0      0 mercury:4369            mercury:36005           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:46053           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:35974           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:42211           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:33363           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:53662           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:37094           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:43824           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:51092           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:43258           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:43064           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:37111           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:54677           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:44286           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:49718           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:46809           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:46112           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:48825           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:44124           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:45203           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:51149           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:46636           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:48254           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:49424           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:59976           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:46730           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:44890           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:39385           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:57297           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:37066           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:50186           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:45703           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:42943           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:55328           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:44401           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:45791           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:56537           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:42194           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:33216           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:46544           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:47610           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:52892           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:38877           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:50983           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:45376           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:54394           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:45412           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:36546           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:32776           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:38289           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:35126           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:50964           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:47857           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:55772           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:41209           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:41426           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:52887           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:33961           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:58946           TIME_WAIT
> tcp        0      0 localhost:33424         localhost:4369          ESTABLISHED
> tcp        0      0 mercury:4369            mercury:46272           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:58219           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:60676           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:37091           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:34972           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:53706           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:52788           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:53221           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:57241           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:56398           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:40434           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:43636           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:41792           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:53162           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:41266           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:36990           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:37871           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:40089           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:58028           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:40347           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:55445           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:56130           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:37858           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:53709           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:45924           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:56969           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:33933           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:51305           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:53452           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:35840           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:49678           TIME_WAIT
> * something else
> tcp        0      0 mercury:4369            mercury:57573           TIME_WAIT
> tcp        0      0 localhost:4369          localhost:48762         ESTABLISHED
> tcp        0      0 mercury:4369            mercury:46680           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:41095           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:44073           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:43461           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:39410           TIME_WAIT
> tcp        0      0 mercury:4369            mercury:38881           TIME_WAIT


More information about the erlang-bugs mailing list