[erlang-questions] unix domain sockets with abstract namespace: can't use all 108 bytes

Fri Apr 28 06:46:17 CEST 2017

> On 27/04/2017, at 5:51 PM, Kenneth Lakin <kennethlakin@REDACTED> wrote:
> 
> NB: The observations of a Linux weenie (that is, me) follow:
> 
> On 04/26/2017 05:46 PM, Richard A. O'Keefe wrote:
>> Solaris documentation is clear that you should use
>> bind(s, (struct sockaddr *) &addr,
>> 		strlen(addr.sun_path) + sizeof (addr.sun_family))
> 
> Which is _really_ confusing to me.
> 
> Even on Solaris, sockaddr_un.sun_path is an array of fixed size, right?

So?

It shouldn't be that confusing.
The third parameter of bind() is the length of the *address*,
not the size of the *struct*.
> And for INET and INET6 sockets,

For IPv4 and IPv6 addresses, the length of the address and the
size of the struct happen to be the same number.
For Unix domain address, the length of the address (that is,
the number of bytes in the struct that are actual information-
bearing payload) is LESS than the size of the struct.
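
To make the distinction concrete, here is a minimal C sketch of the two cases. The helper names inet_addr_length and unix_addr_length are mine, not from any system header, and the second one assumes sun_path holds a NUL-terminated path name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/* Hypothetical helper: for an IPv4 address every byte of the struct
   is payload, so the address length equals the struct size. */
static size_t inet_addr_length(const struct sockaddr_in *sin)
{
    assert(sin != NULL);
    return sizeof *sin;
}

/* Hypothetical helper: for a UNIX-domain path address only the bytes
   before sun_path plus the path itself (without its NUL) count.
   Assumes sun_path is NUL-terminated. */
static size_t unix_addr_length(const struct sockaddr_un *su)
{
    assert(su != NULL);
    return offsetof(struct sockaddr_un, sun_path) + strlen(su->sun_path);
}
```

For a sockaddr_un holding "/tmp/foo", unix_addr_length() gives the offset of sun_path plus 8, which is well short of sizeof (struct sockaddr_un).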

> it seem that the Solaris documentation
> says that argument three to bind is the length of the _entire struct_.

It really doesn't help that there are two socket interfaces
in Solaris.  bind(3SOCKET) calls the 3rd argument "namelen"
(note "len", not "sz"), doesn't really say what the arguments
are, but notes
    Binding a name in the UNIX domain creates a socket in the
    file system that must be deleted by the caller when it is
    no longer needed by using unlink(2).

    The rules used in name binding vary between communication domains.

bind(3XNET) calls the third argument "address_len" (which is
more accurate, because the length includes some bytes before
the name starts; note that it is still "len" not "sz").

So no, that's not what it says.  Both the manpage I'm looking at
and the Programming Interfaces Guide are careful to say
"length" (not "size"), and say that it "Specifies the length of"
the address, not the size.

Neither manual page includes an example.
https://docs.oracle.com/cd/E26502_01/html/E35299/sockets-18552.html#sockets-6
has an example, and the example uses a sockaddr_in6, and the
example uses sizeof because for a sockaddr_in6 the size and length
coincide (there is nothing NUL-terminated).

It's
https://docs.oracle.com/cd/E26502_01/html/E35299/portmapper-51908.html#scrolltoc
that discusses UNIX-domain sockets (appendix A of the Sockets guide).
And it is really pretty explicit:
<quote>
bind (s, name, namelen);
The socket handle is s.
The bound name is a byte string that is interpreted by the supporting protocols.
UNIX family names contain a path name and a family.
The example shows binding the name /tmp/foo to a UNIX family socket. 

#include <sys/un.h>
 ...
struct sockaddr_un addr;
 ...
strlcpy(addr.sun_path, "/tmp/foo", sizeof(addr.sun_path));
addr.sun_family = AF_UNIX;
bind(s, (struct sockaddr *) &addr,
        strlen(addr.sun_path) + sizeof (addr.sun_family));

When determining the size of an AF_UNIX socket address,
null bytes are not counted, which is why you can use strlen(3C).
</quote>

For what it is worth, I have /usr/include/sys/un.h open in an
OpenBSD window, and I read therein

    /* actual length of an initialized sockaddr_un */
    #define SUN_LEN(su) \
            (sizeof(*(su)) - sizeof((su)->sun_path) + strlen((su)->sun_path))

and in Mac OS X 10.11.6, we find

    #if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)
    /* actual length of an initialized sockaddr_un */
    #define SUN_LEN(su) \
            (sizeof(*(su)) - sizeof((su)->sun_path) + strlen((su)->sun_path))
    #endif  /* (!_POSIX_C_SOURCE || _DARWIN_C_SOURCE) */

So it seems pretty clear that the right way to determine the
*LENGTH* of a socket address
 (a) may and does differ between address families
 (b) is to use the sizeof the whole record for IPv4 and IPv6
     -- I'm actually wondering whether any attention is paid
     -- to it at all in those cases, since the sa_family field
     -- determines everything else
 (c) involves computing the length of the string for UNIX domain sockets
 (d) looks different between Solaris and *BSD because Solaris doesn't
     *have* a sun_len field, so

#define AF_UNIX_LENGTH(su) \
    (sizeof (su) - sizeof (su).sun_path + strlen((su).sun_path))

or perhaps
    offsetof(struct sockaddr_un, sun_path) + strlen(su.sun_path)
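
Put together, a bind call along these lines might look like the sketch below. The function name bind_unix_path and the choice to reject over-long paths with ENAMETOOLONG are my assumptions for illustration; only the length calculation comes from the documentation quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper, not part of any system API: bind fd to a
   file system path, passing the address *length* (prefix + strlen)
   rather than the struct *size* as bind()'s third argument. */
static int bind_unix_path(int fd, const char *path)
{
    struct sockaddr_un su;
    size_t path_len;

    assert(path != NULL);
    path_len = strlen(path);
    if (path_len >= sizeof su.sun_path) {   /* keep room for the NUL */
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&su, 0, sizeof su);
    su.sun_family = AF_UNIX;
    memcpy(su.sun_path, path, path_len + 1); /* copies the NUL too */
    return bind(fd, (struct sockaddr *)&su,
                (socklen_t)(offsetof(struct sockaddr_un, sun_path) + path_len));
}
```

A caller would create the socket with socket(AF_UNIX, SOCK_STREAM, 0), call bind_unix_path(fd, "/tmp/foo"), and unlink("/tmp/foo") when finished with it.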

> Why does it make sense to force client code to special-case UNIX domain
> sockets and get strlen involved in calculating bind's third argument?

The client *already* has to special-case UNIX domain sockets:
you don't have to unlink() an IPv4 or IPv6 address when you've finished
with it.

> Why not have the kernel switch on sa_family_t and have any required
> special-casing neatly hidden from the client? [0]

You are asking me to read the minds of people who decided that
Unix domain socket names should be UNIX-style path names that actually
identified objects in the file system *BUT* the array that holds the
name should not be big enough for even the longest legal *component*
of a file name?

Perhaps they intended that
(a) the size of the struct should be a tolerable default but
(b) people should be able to get away with using something like
    the struct hack to make *bigger* records if they really needed
    them, in which case
(c) the 3rd parameter might be *bigger* than the sizeof the official
    struct and still work.

From long experience with other operating systems before BSD got a
socket interface, I find the entire idea of *binary* IPv4 and IPv6
addresses rather disgusting and above all things like storing the
binary data in *network* order rather than native order?  FEH!
The interface should in my view have been
    bind(socket_fd, AF_WHATEVER, "<appropriate string goes here>/<port>")
and the Linux extension should have been announced by a new AF_LINUX.

In short, I'm not saying the interface is REASONABLE,
I'm saying it is what it is.

> 
> My search skills might be weak, but I'm unable to find the section in
> the POSIX documentation that fully addresses UNIX domain sockets. I
> think the most it says on the topic is in the documentation for
> sys/un.h, which pretty much says "sockaddr_un must have sun_family and
> sun_path members. sun_path might be any size at all. Cast this struct to
> struct_sockaddr when you use it.". [1] What did I miss?

How much are you paying me for this?
> 
> If we move from thinking of sockaddr_un.sun_path as a null-terminated
> string

Then we move away from what *IS* documented into the realm of wishful thinking.

To repeat:
   what Solaris, Darwin, and OpenBSD all actually say boils down to
   <size of prefix> + <strlen of path>
and the AIX documentation also says to use SUN_LEN.

I don't have an HP-UX system to try things on, but chapter 6 of
http://www.cs.put.poznan.pl/wswitala/download/pdf/B2355-90136.pdf
- says that sun_path is 92 bytes long
- says to use "size of struct sockaddr_un".
which disagrees with the others.  Disagreement between existing systems
is probably the best explanation of vagueness in POSIX.  Having looked
at the OpenSolaris kernel, I reported earlier that it actually checks
within the size you give it and stops early if it finds a NUL, so I
*suspect* that the shorter size will actually work in HP-UX.
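
As a user-space illustration of the behaviour I described (this is NOT the kernel's code, and effective_path_length is a name I made up), the rule "look only within the length given, and stop early at a NUL" amounts to something like:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical illustration: how many bytes of sun_path are treated
   as the name when the caller passes addr_len to bind().  Scans at
   most the bytes covered by addr_len and stops at the first NUL. */
static size_t effective_path_length(const struct sockaddr_un *su,
                                    size_t addr_len)
{
    size_t prefix = offsetof(struct sockaddr_un, sun_path);
    size_t avail;
    const char *nul;

    assert(su != NULL);
    if (addr_len <= prefix)
        return 0;                       /* no path bytes covered */
    avail = addr_len - prefix;
    if (avail > sizeof su->sun_path)
        avail = sizeof su->sun_path;    /* never read past the array */
    nul = memchr(su->sun_path, '\0', avail);
    return nul ? (size_t)(nul - su->sun_path) : avail;
}
```

Under that rule, a zero-filled struct holding "/tmp/foo" yields the same 8-byte name whether the caller passes the exact length or the full struct size, which is why I suspect both conventions work on such a system.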

Ad fontes!

An Advanced 4.3BSD Interprocess Communication Tutorial
by Samuel J. Leffler, Robert S. Fabry, William N. Joy, Phil Lapsley (UCB)
Steve Miller, and Chris Torek (UM)
<quote>
If one wanted to bind the name ‘‘/tmp/foo’’ to a UNIX domain socket,
the following code would be used:

#include <sys/un.h>
...
struct sockaddr_un addr;
...
strcpy(addr.sun_path, "/tmp/foo");
addr.sun_family = AF_UNIX;
bind(s, (struct sockaddr *) &addr, strlen(addr.sun_path) +
        sizeof (addr.sun_family));
</quote>

(I had a 4.2 BSD documentation set as recently as 2 years ago but had
to discard it when the University took back a wing of our building,
so this is as early as I can readily go.)

> , and move to thinking of it as a fixed-length sequence of bytes
> that might happen to be a null-terminated string,

The designers have always said
"The bound name is a VARIABLE LENGTH BYTE STRING
 which is interpreted by the supporting protocol(s)."

> But might also be
> interpreted in any other way by the underlying OS, does Linux's behavior
> seem more sensible? I mean, if we're no longer constrained by the
> restriction that sun_path _must_ be something that's a valid filename,
> [2] we can get very creative with the interpretation of sun_path, no?

I do not understand why we are no longer constrained by that
restriction.  An *operating system* (such as Linux) may turn an
otherwise illegal argument (an empty path) into something else.
But if we want code to be portable, *we* can't unilaterally abandon
all compatibility restraints.
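
For reference, the Linux extension in question works like this, as I read unix(7): an address whose sun_path starts with a NUL byte names a socket in the abstract namespace, and the name is whatever bytes follow, as delimited by the length passed to bind(). A sketch of building such an address (make_abstract_addr is a hypothetical helper, and returning 0 for an over-long name is my own convention):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fill *su with a Linux abstract-namespace
   address and return the length to pass to bind(), or 0 if the name
   does not fit after the leading NUL byte. */
static socklen_t make_abstract_addr(struct sockaddr_un *su,
                                    const void *name, size_t name_len)
{
    assert(su != NULL && name != NULL);
    if (name_len > sizeof su->sun_path - 1)
        return 0;
    memset(su, 0, sizeof *su);
    su->sun_family = AF_UNIX;
    su->sun_path[0] = '\0';                  /* marks the abstract namespace */
    memcpy(su->sun_path + 1, name, name_len); /* name need not be a string */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + name_len);
}
```

Here the length is the only thing that says where the name ends, so passing the size of the struct instead would make the trailing zero bytes part of the name. None of this is portable beyond Linux.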
> 
> From a practical perspective, it seems to me that as long as you zero
> out the entirety of the sockaddr_un before you start filling it and
> ensure that whatever you copy into sun_path is not longer than sun_path,
> then your code _should_ just work.

I believe I said in a previous message that I *believed* any length
between the actual length and the size of the struct would work in
Solaris (in UNIX systems where sockaddr_un has an sun_len field,
that field must be omitted from the size calculation) and
*suspected* that it would work in other Unices.  But I really have other
things to do with my time (like writing exam questions, it's that
time of year here) than fossick around in kernels.  One is enough.

Your code "_should_ just work" if you do what the available documentation
says to do.  In this case, the HP-UX documentation says that would be OK;
Solaris, BSD, Darwin, AIX, and POSIX do not.  Maybe it will, maybe it won't,
maybe it will today but not tomorrow.
> 
> And, mostly unrelated to all that:
> 
>> It does not say the *size* of the structure but the
>> *length* of the structure.
> 
> To be fair, the documentation seems to consistently uses "length of
> [the] structure" where one might expect to see "size". See the
> documentation for connect, for example. ;)

That's not a different issue, that's the SAME issue.  It is talking
about the *length* (not size) of a socket address, and in Solaris,
*BSD, Darwin, &c it really should be the length, not the size.

> 
> [0] And while we're talking about inconsistent interfaces, why do the
> BSDs have a one-byte length as the first member of all(?) of its socket
> structs? This wart appears to be absent in both Solaris and Linux.

It was a change between 4.3BSD and 4.4BSD.  Presumably the answer can
still be found in a revision history in a dusty archive, possibly
orbiting Barnard's Star.

> [1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_un.h.html
> [2] Because Linux's abstract namespace UNIX sockets are (just like IP
> sockets) AFAICT _not_ represented as files in the filesystem.

To the best of my knowledge, that's the *point* of them.
If you want to use a path name as a UNIX domain address, you have
to ensure that it is a valid path name that you have permission to
create (but does not exist before you create the first socket with
it) and that you must unlink when you have finished with it.  Given
that, the "special casing" required to get the length right is a
minor issue, don't you think?

My argument goes like this:
 - Linux is too important to ignore
 - The Linux extension is a genuinely useful feature
   done in a rather unpleasant way
 - If you are using AF_UNIX in the portable way, you should
   provide the *length* of the address, not the *size* of the
   struct.  I remember it being that way in 4.2 BSD and have
   cited material from some of the key BSD people that that was
   certainly the intent in 4.3BSD.
 - But if you care about HP-UX, you had better do some experiments.

Honestly, just doing the right thing is less effort than arguing
about it.