[erlang-questions] Beware of wildcards on Mac OS + NFS

José Valim jose.valim@REDACTED
Tue Jun 17 11:50:11 CEST 2014


There is a workaround that can be used in the meantime, and it seems to
work at least as far back as R15.

filelib:wildcard/2 allows a module that gets information from the
filesystem to be passed as the second argument. The module only needs to
export read_file_info/1 and list_dir/1, as defined here:

https://github.com/erlang/otp/blob/maint/lib/stdlib/src/filelib.erl#L490-L508

Given the module defined below:

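-module(file_no_dot_match).
-export([read_file_info/1, list_dir/1]).

%% Called by filelib:wildcard/2 to get information about a path.
%% file:read_link_info/1 reports on a symlink itself, not its target.
read_file_info(File) ->
  file:read_link_info(File).

%% Called by filelib:wildcard/2 to list a directory.
%% Drops every entry whose name starts with a dot.
list_dir(Dir) ->
  case file:list_dir(Dir) of
    {ok, Files} ->
      {ok, [File || File <- Files, hd(File) /= $.]};
    Other ->
      Other
  end.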
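
To make the filter in list_dir/1 concrete, here is a minimal sketch of the
same comprehension applied to a plain list of names. The module name
dot_filter_demo, the function visible/1 and the sample file names are made
up for illustration; they are not part of filelib or of the module above.

```erlang
-module(dot_filter_demo).
-export([visible/1]).

%% Hypothetical helper, for illustration only.
%% Keeps the names that do not start with a dot, using the same test as
%% list_dir/1 above. Assumes every name is a non-empty string, which
%% holds for the entries returned by file:list_dir/1.
visible(Names) ->
  [Name || Name <- Names, hd(Name) /= $.].
```

For example, dot_filter_demo:visible(["._rebar.erl", "rebar.erl"]) returns
["rebar.erl"], which is why an AppleDouble shadow file such as ._rebar.erl
is never offered to the wildcard matcher.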


You can now get filelib:wildcard/2 to ignore files starting with a dot by
passing the module as the second argument:

filelib:wildcard(Pattern, file_no_dot_match).


I have run some tests locally and this approach seems to work fine so
far. *, ** and ? now all ignore dots at the beginning of the file name.

*José Valim*
www.plataformatec.com.br
Skype: jv.ptec
Founder and Lead Developer


On Tue, Jun 17, 2014 at 3:26 AM, Richard A. O'Keefe <ok@REDACTED>
wrote:

> While installing rebar today, I ran into a problem.
> The problem *happened* to bite me in rebar, but it
> is in no way a rebar-specific problem.  It's quite
> easy to patch around the problem in rebar, but it's
> worth discussing whether a more general fix is
> appropriate.
>
> Here's the symptom:
>
> m% ./bootstrap
> Recompile: src/._rebar
> src/._rebar.erl:1: unterminated string starting with
> "l\000\000\016â\000\000\000\230\000\000\000N\000\000\000"
> src/._rebar.erl:1: no module definition
> Failed to compile rebar files!
>
> Here's one of the two causes:
>
>     %% Compile all src/*.erl to ebin
>     case make:files(filelib:wildcard("src/*.erl"),
>
> The whole setup has
>  - no problems in Linux
>  - no problems in Solaris
>  - no problems in OpenBSD
>  - no problems in Mac OS X with its native file system
> BUT
>  - is broken in Mac OS X with files accessed over NFS.
>
> The basic issue is that Mac OS X still wants files to
> have "data forks" and "resource forks", which it uses
> for "extended attributes".  Like these:
>
> m% ls -l@ rebar-master.zip
> -rw-r--r--@ 1 ok  csstaff  247237 17 Jun 11:37 rebar-master.zip
>         com.apple.quarantine        78
>         com.apple.metadata:kMDItemWhereFroms       132
>         com.apple.metadata:kMDItemDownloadedDate            53
>
> This is actually quite handy; using the 'xattr' command I can
> discover not just that it was Safari that downloaded the file,
> but where it was downloaded from.
>
> It's also the case that when you unpack downloaded files,
> the extracted files get the same 'quarantine' information.
>
> It's not just downloading.
>
> This all works smoothly in the Mac's own file system, but when
> your home directory is held on a departmental file server and
> accessed through NFS, it's handled by giving each file xxx a
> ._xxx shadow.  See for example http://support.grouplogic.com/?p=1496
>
> This is not a problem in shell scripts.
> m% cd rebar-master/src; ls *.erl
> will not show the AppleDouble dot-underscore files,
> because shell wild cards don't match leading dots.
>
> However, it appears that filelib:wildcard(...) wildcards
> DO match leading dots.
>
> m% mkdir FOO
> m% cd FOO
> m% touch .foo .barfood .foogol
> m% erl
> 1> filelib:wildcard("*").
> [".barfood",".foo",".foogol"]
> 2> filelib:wildcard("*.foo*").
> [".foo",".foogol"]
>
> This discrepancy between Erlang wildcards and UNIX wildcards
> is not a bug waiting to happen.  It is a bug that *has* happened.
> The documentation for filelib:wildcard/1
> http://www.erlang.org/doc/man/filelib.html#wildcard-1
> not only does not mention the problem, it goes out of its way
> to present examples that *WILL* go wrong in Mac OS X.
>
> From a user perspective, the simplest and by far the best change
> would be to make "*" wildcards act like their Unix analogues and
> NOT match leading dots, and the same with "?".
>
> This would immediately fix quite a few programs that are now
> broken (like all of the wildcard examples in the documentation).
>
> The least effort change would alter the documentation:
>
> wildcard(Wildcard) -> [file:filename()]
>
>   Types:
>
>     Wildcard = filename() | dirname()
>
>     The wildcard/1 function returns a list of all files that match
>     Unix-style wildcard-string Wildcard.
>
>     The wildcard string looks like an ordinary filename, except that
>     certain "wildcard characters" are interpreted in a special way.
>     The following characters are special:
>
>     ?
>       Matches one character.
>
>     *
>       Matches any number of characters up to the end of the
>       filename, the next dot, or the next slash.
> +++   BEWARE: in the Unix shells, a * wildcard will never
> +++   match a leading dot.  Thus the file name "._stuff.erl"
> +++   won't match "*.erl" in a shell.  But in this function
> +++   it WILL match.
>
>     ....
>
>     [Character1,Character2,...]
>                ^          ^
>
> *** compile_charset/2 does not in fact require commas or process
> *** them specially; remove the commas from the documentation.
>
> +++   Character classes in ``re'' regular expressions and csh(1)
> +++   command lines can be complemented using "^".  ksh(1) uses
> +++   "!" for complementing.  bash(1) allows both "^" and "!".
> +++   This function allows neither.  You cannot complement a
> +++   character class at all.
>
>     To find all .beam files in all applications, the
>     following line can be used:
>
>       filelib:wildcard("lib/*/ebin/*.beam").
>
> +++ BEWARE: if you are using Mac OS X and accessing files through
> +++ NFS the extended attributes of a snark.beam file will be held
> +++ in a ._snark.beam file, *which this pattern will match*.
>
> Then all the examples need fixing, but unfortunately it's rather
> hard to fix them.  The simplest patch to the examples would be
> to change e.g. "lib/*/ebin/*.beam" to "lib/*/ebin/[^.]*.beam".
>
> Unfortunately, filelib:wildcard/1 not only does not support
> complement character set patterns, it doesn't know (and the
> documentation doesn't say) that there is an issue.
>
> m% touch '^x.foo' '.y.foo'
> 4> filelib:wildcard("[^.]*.foo").
> [".y.foo","^x.foo"]
> 5> filelib:wildcard("[!.]*.foo").
> [".y.foo"]
>
> The absence of character class complementation means that there
> is currently NO easy way for Erlang programmers to write wild-card
> patterns that do the right thing, which is why I think that
> fixing the semantics is the best thing to do.
>
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>