epmd

Per Hedeland per@REDACTED
Fri May 26 17:27:32 CEST 2000


not.for.email@REDACTED (Gordon Beaton) wrote:
>On 26 May 2000 13:23:52 GMT, Richard Barrett wrote:
>> Whenever I start an erlang process using erl -name ..., another copy 
>> of epmd is also created. This is regardless of whether an instance of 
>> epmd is already running as a result of my having run it explicitly 
>> from the command line or previously started another instance of erl. 
>> I've checked using netstat and lsof and all these instances of epmd 
>> are trying to listen on port 4369.
>
>This should not be possible!

Good point. :-)

> Epmd will exit at start if its attempt to
>bind the socket to port 4369 fails. And if another process (such as an
>earlier epmd) has already bound to the same port, then the attempt
>*will* fail, unless something is seriously wrong with your kernel.

Indeed, there seems to be something seriously wrong with Richard's
(Unix/Linux) kernel. I've actually seen this bug, i.e. the kernel
letting a new process bind to a port that another was already bound to,
a couple of times - it has exactly the weird symptoms Richard describes,
which are of course a disaster for a process that is supposed to manage
the Erlang-node-namespace on a host.
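
For anyone who wants to check their own kernel, here is a minimal sketch
in Python (not anything epmd itself does) of the behaviour described
above: a second bind to a TCP port that is already bound should be
refused with EADDRINUSE. The function name second_bind_fails is made up
for illustration, and the test uses a kernel-chosen free port on
127.0.0.1 rather than 4369, so it does not interfere with a running epmd.

```python
import errno
import socket


def second_bind_fails(host="127.0.0.1"):
    """Return True if the kernel refuses a second bind to a bound TCP port.

    Hypothetical helper for illustration only. Neither socket sets
    SO_REUSEADDR or SO_REUSEPORT, so a healthy kernel should reject
    the second bind with EADDRINUSE.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the kernel to pick any free port for the first socket.
        first.bind((host, 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # This is the step that a correct kernel must refuse.
            second.bind((host, port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        # The second bind succeeded: the symptom described in this thread.
        return False
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    if second_bind_fails():
        print("second bind refused, as expected")
    else:
        print("second bind was NOT refused - kernel looks suspect")
```

If this reports that the second bind was not refused, the problem is in
the kernel (or its patches) rather than in epmd.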

At one point the Erlang startup code had a workaround for this bug,
checking by means of trying to communicate with an existing epmd before
attempting to start a new one - but it has its own problems and isn't
fool-proof anyway (race conditions etc), and removing it was probably
the correct decision.
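
The idea behind that workaround can be sketched roughly as below. This
is an illustration in Python, not the code that was in the Erlang
startup: the function name epmd_seems_to_be_running and the one-second
timeout are assumptions, and the check only tests whether something
accepts a TCP connection on port 4369, not whether it actually speaks
the epmd protocol.

```python
import socket


def epmd_seems_to_be_running(host="127.0.0.1", port=4369, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration only. A successful connect
    does not prove the listener is epmd, and another process may start
    (or exit) between this check and any later attempt to start epmd.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable, etc.
        return False


if __name__ == "__main__":
    if epmd_seems_to_be_running():
        print("something is listening on 4369; not starting another epmd")
    else:
        print("nothing answered on 4369; a new epmd would be started here")
```

The gap between the check and the start is exactly the kind of race
mentioned above, which is why this can only ever be a partial
workaround.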

>> I am running erlang installed from erlang_otp-R6B-4.i386.rpm under 
>> Suse 6.2 Linux.

I'm afraid you'll have to contact Suse about this bug; if no-one else
has come across it already, they're bound to soon - most standard Unix
daemons will be affected by it to some degree, though in most cases not
as severely as epmd. You can try starting some number of those (e.g.
sendmail, as an example of a TCP-based server that normally runs
standalone rather than from inetd), and see if you don't get the same
effect (don't use the startup scripts though, run the binary directly).

Alternatively you could of course work around the bug by e.g. removing
the automatic startup of epmd (in erts/etc/common/erlexec.c I believe),
and making sure you manually start only one instance on each host -
probably not a very good idea, though.

--Per Hedeland
per@REDACTED

PS In case anyone's interested, the cases where I saw the bug were a)
when using some third-party SunOS 4 kernel patches for multicast, and b)
in an early Solaris version - maybe 2.2.


