epmd

Wed May 31 00:51:17 CEST 2000

Richard Barrett <R.Barrett@REDACTED> wrote:
>I raised the problem on this mailing list of multiple copies of epmd 
>being spawned when multiple copies of erl were run, each of which only 
>knew about the instance of beam that spawned it. Further, all epmd 
>instances were successfully listening on port 4369.

Um, a new connection can only be delivered to one of them, and arguably
that's the only one that is "successfully" listening at that point in
time...

>I have investigated further and believe that all is working as 
>advertised with Suse Linux

It is not. I don't have a Suse Linux available, but on a RedHat 6.1
system (kernel 2.2.12-20smp), where I incidentally tried to reproduce
your problem but failed (i.e. SO_REUSEADDR worked correctly), the
socket(7) man page says:

       SO_REUSEADDR
              Indicates  that  the  rules  used   in   validating
              addresses  supplied  in a bind(2) call should allow
              reuse of local addresses. For PF_INET sockets  this
              means  that a socket may bind, except when there is
                                             ^^^^^^^^^^^^^^^^^^^^
              an active listening socket bound  to  the  address.
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
              When  the  listening  socket is bound to INADDR_ANY
              with a specific port then it  is  not  possible  to
              bind to this port for any local address.

- which is the standard semantics of SO_REUSEADDR. What the "reuse of
local addresses" means is that you're allowed to bind a socket to IP
address X, port Y, even though there is another socket already bound to
that address/port, *provided the remote address differs*. I.e. an
established connection {X,Y} <-> {P,Q}, where {P,Q} is a remote
address/port, doesn't prevent a starting daemon from binding to {X,Y},
as the daemon's socket has the "wildcard" remote address/port {*,*},
i.e. a different one.

Without SO_REUSEADDR, such an established (or terminated but
"lingering") connection prevents the daemon from starting, thus setting
it is basically "standard operating procedure" for TCP-based daemons. It
should *not* allow two processes to have a socket with local
address/port {X,Y} and remote address/port {*,*}, any more than it should
allow two identical established connections {X,Y} <-> {P,Q}.
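
The expected behaviour can be checked with a small script. What follows
is only a sketch in Python, not anything taken from the epmd source: the
helper names try_bind and second_listener_allowed are made up for
illustration, and it uses the loopback address with a kernel-assigned
port instead of INADDR_ANY and 4369. With the semantics quoted from
socket(7) above, the second bind should be refused even though both
sockets set SO_REUSEADDR.

```python
import socket


def try_bind(port, reuse):
    """Try to bind a fresh TCP socket to 127.0.0.1:port.

    Returns True if bind() succeeds, False if the OS refuses it.
    (Hypothetical helper, for illustration only.)
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", port))
        return True
    except OSError:
        # Typically EADDRINUSE when the address/port is already taken.
        return False
    finally:
        s.close()


def second_listener_allowed():
    """Start one listener, then report whether a second bind gets through."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    first.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    first.listen(5)
    port = first.getsockname()[1]
    try:
        # Same local address/port, same wildcard remote address {*,*}.
        return try_bind(port, reuse=True)
    finally:
        first.close()


if __name__ == "__main__":
    print("second listener allowed:", second_listener_allowed())
```

On a system that follows the semantics described above,
second_listener_allowed() should return False. A True result would
match the symptom in the original report, i.e. several processes bound
to the same listening address/port at once.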

>Around lines 135 to 146 of this source file, conditional compilation 
>of a setsockopt call to enable the SO_REUSEADDR option occurs if 
>the OS being compiled under is not _WIN32_.

I'm afraid I can't comment on the WIN32 stuff; my guess would be that it
simply lacks the SO_REUSEADDR functionality, but of course it could be
broken there too (or done some other way).

>For reasons I have yet to determine, under Sun Solaris, an attempt to 
>bind to a given port by a second instance of my test scripts (see 
>below) does fail, even though the SO_REUSEADDR socket option has been 
>enabled.

Yes, modern versions of Solaris, like almost all other Unix versions,
implement SO_REUSEADDR correctly.

All this being said, it might still be the case that Erlang should try
to avoid being bitten by a broken SO_REUSEADDR implementation. Klacke
wrote:

}1. The only reason (as far as I can see) to have SO_REUSEADDR on the
}epmd socket is while debugging epmd. That is while starting and stopping
}epmd several times. So the SO_REUSEADDR option should be removed on 
}the epmd daemon.

This might be reasonable - epmd doesn't fork on new connections (of
course), thus if one instance is already running, it will also take care
of any new connections, and there is no need to start a new daemon.
However if you've forcibly killed the running one for whatever reason,
and then immediately start up a new distributed Erlang system, you may
then end up with no epmd at all (i.e. distribution startup will fail),
due to a "lingering" connection. I don't know if this is acceptable - I
guess this needs some thinking about...
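
The restart scenario can be sketched in the same way. The Python script
below is an illustration under stated assumptions, not epmd code: the
function name restart_after_kill is made up, the loopback address and a
kernel-assigned port stand in for epmd's real address and port, and an
accepted connection that is simply kept open stands in for the
"lingering" one left behind by a killed daemon. The old listener sets
SO_REUSEADDR, as a daemon normally would.

```python
import socket


def restart_after_kill(reuse):
    """Simulate a daemon restart while an old connection holds the port.

    Returns True if the replacement daemon manages to bind and listen,
    False if the OS refuses. (Hypothetical function, for illustration.)
    """
    old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    old.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    old.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    old.listen(1)
    port = old.getsockname()[1]

    # One client connects; the accepted socket is {X,Y} <-> {P,Q}.
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = old.accept()

    # The old listening socket goes away, the connection stays behind.
    old.close()

    new = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            new.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        new.bind(("127.0.0.1", port))
        new.listen(1)
        return True
    except OSError:
        # Typically EADDRINUSE: the leftover connection blocks the bind.
        return False
    finally:
        new.close()
        conn.close()
        client.close()


if __name__ == "__main__":
    print("restart without SO_REUSEADDR:", restart_after_kill(reuse=False))
    print("restart with SO_REUSEADDR:   ", restart_after_kill(reuse=True))
```

On a system with the standard semantics, the restart without
SO_REUSEADDR should fail and the one with it should succeed, which is
the trade-off to weigh before removing the option from epmd.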

--Per Hedeland
per@REDACTED