[erlang-questions] Beware of wildcards on Mac OS + NFS

Richard A. O'Keefe ok@REDACTED
Tue Jun 17 03:26:07 CEST 2014


While installing rebar today, I ran into a problem.
The problem *happened* to bite me in rebar, but it
is in no way a rebar-specific problem.  It's quite
easy to patch around the problem in rebar, but it's
worth discussing whether a more general fix is
appropriate.

Here's the symptom:

m% ./bootstrap
Recompile: src/._rebar
src/._rebar.erl:1: unterminated string starting with "l\000\000\016â\000\000\000\230\000\000\000N\000\000\000"
src/._rebar.erl:1: no module definition
Failed to compile rebar files!

Here's one of the two causes:

    %% Compile all src/*.erl to ebin
    case make:files(filelib:wildcard("src/*.erl"),

The whole setup has
 - no problems in Linux
 - no problems in Solaris
 - no problems in OpenBSD
 - no problems in Mac OS X with its native file system
BUT
 - is broken in Mac OS X with files accessed over NFS.

The basic issue is that Mac OS X still wants files to
have "data forks" and "resource forks", which it uses
for "extended attributes".  Like these:

m% ls -l@ rebar-master.zip
-rw-r--r--@ 1 ok  csstaff  247237 17 Jun 11:37 rebar-master.zip
        com.apple.quarantine        78 
        com.apple.metadata:kMDItemWhereFroms       132 
        com.apple.metadata:kMDItemDownloadedDate            53 

This is actually quite handy; using the 'xattr' command I can
discover not just that it was Safari that downloaded the file,
but where it was downloaded from.

It's also the case that when you unpack downloaded files,
the extracted files get the same 'quarantine' information.

It's not just downloading.  

This all works smoothly in the Mac's own file system, but when
your home directory is held on a departmental file server and
accessed through NFS, it's handled by giving each file xxx
a ._xxx shadow.  See for example http://support.grouplogic.com/?p=1496
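
To make the naming convention concrete, here is a minimal sketch of how
an Erlang program might recognise such a shadow by name alone.  The
module name and both function names are hypothetical (nothing like this
exists in OTP); only lists:prefix/2 and the filename functions are real.

```erlang
-module(appledouble).
-export([is_shadow_name/1, shadowed_name/1]).

%% True when the last path component starts with "._".
%% This looks only at the name; it does not inspect the file contents.
is_shadow_name(Path) ->
    lists:prefix("._", filename:basename(Path)).

%% Given a shadow name such as "src/._rebar.erl", return the name of the
%% file it would belong to ("src/rebar.erl").  Crashes with badmatch if
%% Path is not a shadow name.
shadowed_name(Path) ->
    "._" ++ Name = filename:basename(Path),
    filename:join(filename:dirname(Path), Name).
```

A cautious caller could combine is_shadow_name(F) with
filelib:is_regular(shadowed_name(F)) before deciding that F really is
a shadow rather than a file somebody deliberately named that way.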

This is not a problem in shell scripts.
m% cd rebar-master/src; ls *.erl
will not show the AppleDouble dot-underscore files,
because shell wild cards don't match leading dots.

However, it appears that filelib:wildcard(...) wildcards
DO match leading dots.

m% mkdir FOO
m% cd FOO
m% touch .foo .barfood .foogol
m% erl
1> filelib:wildcard("*").
[".barfood",".foo",".foogol"]
2> filelib:wildcard("*.foo*").
[".foo",".foogol"]

This discrepancy between Erlang wildcards and UNIX wildcards
is not a bug waiting to happen.  It is a bug that *has* happened.
The documentation for filelib:wildcard/1
http://www.erlang.org/doc/man/filelib.html#wildcard-1
not only does not mention the problem, it goes out of its way
to present examples that *WILL* go wrong in Mac OS X.

From a user perspective, the simplest and by far the best change
would be to make "*" wildcards act like their Unix analogues and
NOT match leading dots, and the same with "?".

This would immediately fix quite a few programs that are now
broken (like all of the wildcard examples in the documentation).

The least effort change would alter the documentation:

wildcard(Wildcard) -> [file:filename()]

  Types:

    Wildcard = filename() | dirname()

    The wildcard/1 function returns a list of all files that match
    Unix-style wildcard-string Wildcard.

    The wildcard string looks like an ordinary filename, except that
    certain "wildcard characters" are interpreted in a special way.
    The following characters are special:

    ?
      Matches one character.

    *
      Matches any number of characters up to the end of the
      filename, the next dot, or the next slash.
+++   BEWARE: in the Unix shells, a * wildcard will never
+++   match a leading dot.  Thus the file name "._stuff.erl"
+++   won't match "*.erl" in a shell.  But in this function
+++   it WILL match. 

    ....

    [Character1,Character2,...]
               ^          ^

*** compile_charset/2 does not in fact require commas or process
*** them specially; remove the commas from the documentation.

+++   Character classes in ``re'' regular expressions and csh(1)
+++   command lines can be complemented using "^".  ksh(1) uses
+++   "!" for complementing.  bash(1) allows both "^" and "!".
+++   This function allows neither.  You cannot complement a
+++   character class at all.

    To find all .beam files in all applications, the
    following line can be used:

      filelib:wildcard("lib/*/ebin/*.beam").        

+++ BEWARE: if you are using Mac OS X and accessing files through
+++ NFS the extended attributes of a snark.beam file will be held
+++ in a ._snark.beam file, *which this pattern will match*.  

Then all the examples need fixing, but unfortunately it's rather
hard to fix them.  The simplest patch to the examples would be
to change e.g. "lib/*/ebin/*.beam" to "lib/*/ebin/[^.]*.beam".

Unfortunately, filelib:wildcard/1 not only does not support
complement character set patterns, it doesn't know (and the
documentation doesn't say) that there is an issue.

m% touch '^x.foo' '.y.foo'
4> filelib:wildcard("[^.]*.foo").
[".y.foo","^x.foo"]
5> filelib:wildcard("[!.]*.foo"). 
[".y.foo"]

The absence of character class complementation means that there
is currently NO easy way for Erlang programmers to write wild-card
patterns that do the right thing, which is why I think that
fixing the semantics is the best thing to do.
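
Until the semantics change, the only approach I can see is to filter
the result list after the fact instead of trying to express it in the
pattern.  The following is a rough sketch under that assumption, not a
proposed patch: the module name nodot_wildcard and its functions are
made up for illustration, and it only looks at the last path component.

```erlang
-module(nodot_wildcard).
-export([wildcard/1, drop_hidden/1]).

%% Hypothetical wrapper: expand Pattern with filelib:wildcard/1, then
%% discard every match whose last path component begins with a dot.
wildcard(Pattern) ->
    drop_hidden(filelib:wildcard(Pattern)).

%% Pure filtering step, kept separate so it can be used on any list
%% of file names.
drop_hidden(Paths) ->
    [P || P <- Paths, not is_hidden(filename:basename(P))].

%% A name counts as hidden when its first character is a dot.
is_hidden([$. | _]) -> true;
is_hidden(_) -> false.
```

With this, nodot_wildcard:wildcard("src/*.erl") would skip
src/._rebar.erl.  It is cruder than what the shells do: a dot-leading
directory matched by "*" earlier in the path is not filtered, and a
pattern that deliberately asks for dot files (".*") gets nothing back.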





