SO_REUSEPORT and gen_tcp

Per Hedeland per@REDACTED
Mon May 22 15:31:16 CEST 2006


Andrew Lentvorski <bsder@REDACTED> wrote:
>
>Well, I will take your answer as simply that nobody actually has asked 
>for it.

I think another reason may be that SO_REUSEPORT is an OS-specific
option, i.e. it's not generally available even across Unix variants, and
the Erlang interfaces try to stay away from such things.

>I was more interested if there was some sort of problem with this kind 
>of overloading of multiply bound sockets to the same address.  I see no 
>reason for there to be (everything looks normal since the OS handles 
>dispatching everything and it all looks like normal sockets at the API 
>level), but I wanted to check before just blindly adding the option to 
>the Erlang source code.

Having the option available surely can't be a problem; using it is
another thing... You generally *want* the restriction that is present
even with SO_REUSEADDR, i.e. that you can't bind two "listen" sockets to
the same local address and port. Without that restriction, you can
inadvertently end up with, say, two different HTTP server processes
both accepting connections on port 80, with undefined semantics for how
the TCP/IP stack selects one or the other for incoming connections
(might be "last one to bind() wins", or "connections alternate
between the two", neither of which is likely to be useful).
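
The restriction described above can be checked with a short Python
sketch. The function name second_listener_rejected is made up for
illustration, and the result assumes a Linux- or BSD-style stack (other
systems, Windows in particular, treat SO_REUSEADDR differently):

```python
import errno
import socket


def second_listener_rejected(host="127.0.0.1"):
    """Return True if a second bind() to an address+port that already has
    a listening socket fails with EADDRINUSE, even though both sockets
    have SO_REUSEADDR set. (Hypothetical helper, for illustration only.)
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        for s in (first, second):
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind((host, 0))           # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))   # same local address and port
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False                    # the stack allowed the duplicate bind
    finally:
        first.close()
        second.close()
```

On the systems I have in mind this should return True, i.e. SO_REUSEADDR
alone does not let two listeners share a local address and port.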

>SO_REUSEADDR primarily allows servers to be restarted even before the 
>TCP timeouts close a previous socket.  This is primarily useful when 
>servers crash.
>
>SO_REUSEPORT actually allows completely duplicate socket bindings for 
>the same local IP address/port combination.  This is not the case for 
>SO_REUSEADDR.

Right - which means that the SO_REUSEPORT name is either a mistake or a
joke:-) - the difference isn't in address vs port (the "local address"
in the "standard" documentation of SO_REUSEADDR implies both local IP
address and local port), but in "local" (for SO_REUSEADDR) vs "complete"
or "both local and remote" (for SO_REUSEPORT).

>  It was supposed to be used for multicast, but has turned 
>out to be useful for getting past NATs.
>
>The advantage here is that the same local IP address/port can be used for:
>
>1) a connection to a rendezvous server to discover the external mapping
>2) an initiated connection to the other peer to open a hole in the NAT
>3) the bind() so that an incoming connection on that port can connect.

Hm, well, yes, however I think the authors of the paper you referred to
earlier are a bit confused. They write: "BSD systems have introduced a
SO_REUSEPORT option that controls port reuse separately from address
reuse", as if they think that on other systems, SO_REUSEADDR includes
what SO_REUSEPORT does on BSD. This isn't the case, of course: an
SO_REUSEADDR that did that would simply be broken, as explained above.

So unless they want to suggest that "TCP hole punching" can only be done
on systems that implement SO_REUSEPORT or something equivalent - and it
sure doesn't seem like they want to suggest that - you shouldn't
actually need SO_REUSEPORT.

I haven't tried any of this, but I suspect that the trick is in the
details of how you do 2) and 3) above. With only SO_REUSEADDR, you can't
just bind() both to the same local address+port since at that point the
sockets would be completely equivalent, and the second bind() would be
rejected. But if you not only do the bind() but also initiate the
connect() in 2) before you do the bind() in 3), the remote address/port
would be different, and there would be no conflict beyond what
SO_REUSEADDR allows.
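
The ordering suggested above (bind() and connect() first, then the
second bind()) could be tried with something like the following Python
sketch. It is untested guesswork in the same spirit as the paragraph
above: the function name punch_order_demo is hypothetical, a local
listening socket stands in for the remote peer, and whether the second
bind() and the listen() are accepted depends on the OS, so the function
just reports what happened rather than assuming an outcome:

```python
import errno
import socket


def punch_order_demo(host="127.0.0.1"):
    """Bind and connect one socket, then try to bind and listen with a
    second socket on the same local address+port, using only
    SO_REUSEADDR. Returns "ok" or the symbolic errno name.
    (Hypothetical helper, for illustration only.)
    """
    def reusable_socket():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        return s

    peer = reusable_socket()       # stand-in for the remote peer
    outgoing = reusable_socket()   # step 2): the initiated connection
    listener = reusable_socket()   # step 3): the socket for incoming ones
    try:
        peer.bind((host, 0))
        peer.listen(1)

        # Step 2): bind() *and* connect() before touching the listener.
        outgoing.bind((host, 0))
        outgoing.connect(peer.getsockname())
        local = outgoing.getsockname()

        # Step 3): only now bind() the second socket to the same local
        # address+port; the first socket already has a remote endpoint.
        try:
            listener.bind(local)
            listener.listen(1)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return "ok"
    finally:
        for s in (peer, outgoing, listener):
            s.close()
```

Swapping the order, so that the listener binds and listens before the
outgoing socket binds, should give the rejection described above.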

My guess is that someone implementing this scheme didn't take this
ordering requirement into consideration (the paper certainly doesn't
spell it out) and failed, and then found out about SO_REUSEPORT, which
allowed it to work regardless of the order in which things were done.
And after that someone jumped to the wrong conclusions about
SO_REUSEPORT, both about its semantics vs those of SO_REUSEADDR and
about the need to use it.

I could be wrong though...:-)

--Per Hedeland


