[erlang-questions] Unable to restart epmd, sockets stuck in close_wait
Billee Kelder
billeejo@REDACTED
Fri Aug 26 17:46:45 CEST 2016
Hi Folks,
I've got a system using Erlang/OTP 18.3.4.1 and RabbitMQ 3.6.3. Everything
is local to the system and there is no clustering.
We are seeing intermittent failures when stopping, uninstalling,
reinstalling, and starting epmd.
When this happens we also see many sockets stuck in CLOSE_WAIT, like so:
tcp  48  0  0.0.0.0:4369    0.0.0.0:*        LISTEN      0  570937  1/systemd
tcp   5  0  127.0.0.1:4369  127.0.0.1:37560  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:42564  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:53126  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:40222  CLOSE_WAIT  0  0       -
tcp  38  0  127.0.0.1:4369  127.0.0.1:33506  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:56332  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:50511  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:45528  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:59487  CLOSE_WAIT  0  0       -
tcp   4  0  127.0.0.1:4369  127.0.0.1:37506  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:41554  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:40080  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:32903  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:48851  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:35177  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:44931  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:54730  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:48311  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:39159  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:47166  CLOSE_WAIT  0  0       -
tcp   2  0  127.0.0.1:4369  127.0.0.1:37541  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:38290  CLOSE_WAIT  0  0       -
tcp  31  0  127.0.0.1:4369  127.0.0.1:43044  CLOSE_WAIT  0  0       -
tcp   2  0  127.0.0.1:4369  127.0.0.1:37540  CLOSE_WAIT  0  0       -
tcp   2  0  127.0.0.1:4369  127.0.0.1:37544  CLOSE_WAIT  0  0       -
On an identical working system the output looks like this:
tcp  0  0  0.0.0.0:4369       0.0.0.0:*          LISTEN       1/systemd
tcp  0  0  <ip address>:4369  9.47.80.245:36368  TIME_WAIT    -
tcp  0  0  127.0.0.1:34836    127.0.0.1:4369     ESTABLISHED  22713/beam.smp
tcp  0  0  127.0.0.1:4369     127.0.0.1:34836    ESTABLISHED  21186/epmd
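To compare listings like the two above without counting rows by hand, here is a minimal Python sketch that tallies socket states for one local port. It is an illustration only: the function name summarize_tcp_states is made up for this example, and it assumes the columns after "tcp" are Recv-Q, Send-Q, local address, remote address, and state, as in typical Linux netstat output. It works on whitespace-separated tokens, so it does not matter whether a row has been wrapped across two lines.

```python
from collections import Counter


def summarize_tcp_states(netstat_text, port=4369):
    """Tally TCP states and sum Recv-Q for sockets bound to the given local port.

    Assumes each row reads: tcp <Recv-Q> <Send-Q> <local> <remote> <state> ...
    (an assumption about the netstat layout, not something the tool guarantees).
    """
    tokens = netstat_text.split()  # newlines are just whitespace here
    states = Counter()
    recv_q_total = 0
    i = 0
    while i + 5 < len(tokens):
        if tokens[i] == "tcp":
            recv_q, _send_q, local, _remote, state = tokens[i + 1:i + 6]
            local_port = local.rsplit(":", 1)[-1]
            # Skip rows for other ports and rows that do not look numeric.
            if local_port == str(port) and recv_q.isdigit():
                states[state] += 1
                recv_q_total += int(recv_q)
            i += 6
        else:
            i += 1
    return states, recv_q_total
```

Passing the saved netstat text to summarize_tcp_states returns a Counter keyed by state (LISTEN, CLOSE_WAIT, and so on) together with the total of the Recv-Q column for the matching sockets. Rows where 4369 is only the remote port, such as the beam.smp client connection, are not counted.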
On the hung system, epmd -names and epmd -kill both hang indefinitely.
Attempting to restart epmd.socket or epmd.service gives the error:
epmd.socket failed to listen on sockets: Address already in use
Is there any way to
a) Get more information about what is causing the state to occur (so I can
hopefully prevent it in the future)
or
b) Recover from this state (without rebooting the system)?
Thanks!