[erlang-questions] unix domain sockets with abstract namespace: can't use all 108 bytes

Thu Apr 27 15:53:54 CEST 2017

On Wed, Apr 26, 2017 at 10:51:38PM -0700, Kenneth Lakin wrote:
> NB: The observations of a Linux weenie (that is, me) follow:
> 
> On 04/26/2017 05:46 PM, Richard A. O'Keefe wrote:
> > Solaris documentation is clear that you should use
> > bind(s, (struct sockaddr *) &addr,
> > 		strlen(addr.sun_path) + sizeof (addr.sun_family))
> 
> Which is _really_ confusing to me.
> 
> Even on Solaris, sockaddr_un.sun_path is an array of fixed size, right?
> And for INET and INET6 sockets, it seems that the Solaris documentation
> says that argument three to bind is the length of the _entire struct_.

Well, maybe.
But AF_INET and AF_INET6 socket addresses are not variable length.

That it is an array of fixed size only means that you have a pre-allocated
area within which the address shall be stored.  The alternative would be to
have a pointer but then we get the problem of having to allocate and free
the memory for the address correctly.

This is an example of a variable-length address.  In fact Linux does not
always use all 108 bytes.  You can create a socket with an empty address
and then you get an address with a leading '\0' and some 6 characters not
corresponding to a filesystem name.
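
To make the variable length concrete, here is a minimal sketch of how a
Linux abstract address could be built so that only the bytes actually used
count as the address.  The helper name make_abstract_addr is made up for
illustration.  As I read unix(7), passing just sizeof(sa_family_t) as the
length instead is what requests the auto-generated address mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fill *su with a Linux abstract address and return
 * the length to pass to bind(2).  name need not be '\0'-terminated. */
socklen_t make_abstract_addr(struct sockaddr_un *su,
                             const void *name, size_t name_len)
{
    /* sun_path[0] is the '\0' marker, so the name gets the remaining bytes. */
    assert(name_len <= sizeof(su->sun_path) - 1);

    memset(su, 0, sizeof(*su));
    su->sun_family = AF_UNIX;
    memcpy(su->sun_path + 1, name, name_len);

    /* Family header + marker byte + name: only these bytes form the address. */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + name_len);
}
```

With this, a three-byte name yields a length well below sizeof(*su), and
passing sizeof(*su) instead would make the trailing zero bytes part of the
name on Linux.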

> 
> Why does it make sense to force client code to special-case UNIX domain
> sockets and get strlen involved in calculating bind's third argument?
> Why not have the kernel switch on sa_family_t and have any required
> special-casing neatly hidden from the client? [0]

For AF_UNIX you already have to know that the address corresponds to a
filesystem path name, and that you have to explicitly use unlink(2) to
remove the file that bind(3) creates, so you are already forced to
specialize client code.
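
A rough sketch of that AF_UNIX-specific housekeeping, assuming a zero-filled
struct and sizeof(su) as the length (which is exactly the choice debated in
this thread).  The helper names bind_path_socket and close_path_socket are
invented for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: bind a stream socket to a filesystem path.
 * Returns the socket fd, or -1 on error. */
int bind_path_socket(const char *path)
{
    struct sockaddr_un su;
    int fd;

    assert(path != NULL);
    if (strlen(path) >= sizeof(su.sun_path))
        return -1;                  /* would not fit together with its '\0' */

    memset(&su, 0, sizeof(su));
    su.sun_family = AF_UNIX;
    strcpy(su.sun_path, path);      /* length checked above */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&su, sizeof(su)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* close(2) alone leaves the socket file behind; unlink(2) removes it. */
void close_path_socket(int fd, const char *path)
{
    close(fd);
    unlink(path);
}
```

Nothing comparable is needed for AF_INET, which is the point: the caller
already has to treat this address family specially.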

> 
> >> So it seems it is not a violation of Posix to use a su->sun_path that starts
> >> with a 0 combined with an address_len of sizeof(*su).
> > 
> > It may not be, but the path is still "" and thus a run-time error.
> > ...
> > POSIX systems *shouldn't* do this, but Linux *does*.
> 
> My search skills might be weak, but I'm unable to find the section in
> the POSIX documentation that fully addresses UNIX domain sockets. I
> think the most it says on the topic is in the documentation for
> sys/un.h, which pretty much says "sockaddr_un must have sun_family and
> sun_path members. sun_path might be any size at all. Cast this struct to
> struct_sockaddr when you use it.". [1] What did I miss?

There is an example in the Posix documentation of bind() that says sun_path
is a path name and that the empty string should result in ENOENT.

> 
> If we move from thinking of sockaddr_un.sun_path as a null-terminated
> string, and move to thinking of it as a fixed-length sequence of bytes
> that might happen to be a null-terminated string, but might also be
> interpreted in any other way by the underlying OS, does Linux's behavior
> seem more sensible? I mean, if we're no longer constrained by the
> restriction that sun_path _must_ be something that's a valid filename,
> [2] we can get very creative with the interpretation of sun_path, no?

Yes.  Creative.  But it would have been more kosher if Linux had defined a
new address family.  They have violated old standards in a creative way.

> 
> From a practical perspective, it seems to me that as long as you zero
> out the entirety of the sockaddr_un before you start filling it and
> ensure that whatever you copy into sun_path is not longer than sun_path,
> then your code _should_ just work.

Yes.  But what happens if you do that on FreeBSD?  Probably nothing
since sun_path is zero terminated within the size of the structure.
But unix(4) tells you to use SUN_LEN(su).  So there is no way to fulfill
both Linux's and FreeBSD's API definitions.  And so far nobody has
provided a way to configure test for it.  Thank you Linus!

I am inclined to just ignore the FreeBSD man page advice since it _may_
allow SUN_LEN(su) =< su->sun_len =< sizeof(*su)...
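
To illustrate the two length conventions side by side, here is a small
sketch.  The helpers fill_sockaddr_un and sun_len_style are hypothetical,
and sun_len_style only mimics the idea behind SUN_LEN (header plus
strlen() of the path); it is not the macro from any particular system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: zero *su and copy in a '\0'-terminated path.
 * Returns 0 on success, -1 if the path does not fit. */
int fill_sockaddr_un(struct sockaddr_un *su, const char *path)
{
    size_t n = strlen(path);

    if (n >= sizeof(su->sun_path))
        return -1;
    memset(su, 0, sizeof(*su));
    su->sun_family = AF_UNIX;
    memcpy(su->sun_path, path, n);
    return 0;
}

/* SUN_LEN-style length: header plus strlen() of the path, no '\0'. */
socklen_t sun_len_style(const struct sockaddr_un *su)
{
    /* strlen() below is only safe if a terminator exists inside sun_path. */
    assert(memchr(su->sun_path, '\0', sizeof(su->sun_path)) != NULL);

    return (socklen_t)(offsetof(struct sockaddr_un, sun_path)
                       + strlen(su->sun_path));
}
```

For a zero-filled struct holding a path name, the SUN_LEN-style length
never exceeds sizeof(*su).  For a name starting with '\0', strlen() sees
an empty string and the result covers only the header, so this convention
cannot describe a Linux abstract name at all.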

> 
> And, mostly unrelated to all that:
> 
> > It does not say the *size* of the structure but the
> > *length* of the structure.
> 
> To be fair, the documentation seems to consistently use "length of
> [the] structure" where one might expect to see "size". See the
> documentation for connect, for example. ;)
> 
> [0] And while we're talking about inconsistent interfaces, why do the
> BSDs have a one-byte length as the first member of all(?) of their socket
> structs? This wart appears to be absent in both Solaris and Linux.

This simplifies handling variable-length addresses, for example for routing
sockets. [3]  It was added with 4.3BSD-Reno (1990) and probably existed in
SunOS 4 but did not propagate to Solaris.

> [1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html
> [2] Because Linux's abstract namespace UNIX sockets are (just like IP
> sockets) AFAICT _not_ represented as files in the filesystem.

In blatant contradiction to Posix.

> 
> 

[3]: W. Richard Stevens: Unix Network Programming, Volume 1, 3rd Edition,
     Section 3.2, Socket Address Structures.

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB