Packages in Erlang: new documentation

Tue Sep 9 01:38:10 CEST 2003

A.  Concerning hundreds of files in a directory:

    Who says Erlang modules, in any form, have to be in a directory?
    In the MVS and CMS world, it used to be normal to use "partitioned
    data sets" (think of UNIX ar(1), but arguably better engineered).
    For example, instead of
	FOO.C	FOO.H
	BAR.C	BAR.H
    you might have
	CSOURCE(FOO)	HEADER(FOO)
	CSOURCE(BAR)	HEADER(BAR)
    where CSOURCE and HEADER were each a single file, containing subfiles.
    (For another analogy, think of nested OLE containers.)

    With block sizes slowly climbing upwards, the space waste of having
    one file per "object" can be quite high.  For example, I recently
    tried to install a certain (non-Erlang) system on a Mac with a 4GB
    disc where the system block size was (I am not making this up) 60kB.
    A one line file still took a full 60kB, and the disc space ran out
    with less than 10% of the space actually _used_.

    For another example, an old version of Erlang I have lying around
    requires 203 million bytes in 13.5 thousand files.  On a file system
    with 8kB blocks, it actually takes 267 million bytes, wasting 64
    million bytes: waste space equal to an extra 31% of the useful data.
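
    The arithmetic behind figures like these can be sketched in a few
    lines of Python.  This is only an illustration: the function names
    are made up for this note, it assumes every file occupies a whole
    number of blocks, and it ignores directory and inode overhead.

```python
def allocated_bytes(file_sizes, block_size):
    """Disc space consumed when each file occupies whole blocks."""
    # Round each size up to the next multiple of block_size.
    # An empty file is assumed to take no data blocks at all.
    return sum(-(-size // block_size) * block_size for size in file_sizes)


def waste_ratio(file_sizes, block_size):
    """Wasted bytes divided by useful bytes."""
    useful = sum(file_sizes)
    return (allocated_bytes(file_sizes, block_size) - useful) / useful


# A 40-byte one-line file on a disc with 60kB blocks (taking 1kB = 1024 bytes).
print(allocated_bytes([40], 60 * 1024))  # prints 61440
```

    Feeding it the real list of file sizes from an installation (for
    example, gathered with os.walk) gives the space taken for any
    block size you care to try.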

    In one set of files, I've measured the ratio of waste space to
    useful data as 1.6 : 1; that's about 60% of the occupied space wasted.

    UNIX ar(1) format isn't very wonderful, because
    (1) it only allows 16 characters for a file name (man -s 3HEAD ar IKYN)
    (2) the directory information is scattered through the file, so it is
        inefficient to read just one member.
    But who says we have to use ar(1) format, eh?
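
    As a sketch of what a format without those two drawbacks might look
    like, here is a toy container in Python with a single directory at
    the front and names of any length.  The layout (a 4-byte length, a
    JSON directory, then the member data) and the function names are
    invented for this illustration; nobody is proposing this as the
    actual format.

```python
import io
import json
import struct


def write_archive(out, members):
    """Write members (a dict of name -> bytes) with one directory up front."""
    directory, offset = {}, 0
    for name, data in members.items():
        directory[name] = (offset, len(data))  # where each member will live
        offset += len(data)
    header = json.dumps(directory).encode("utf-8")
    out.write(struct.pack(">I", len(header)))  # 4-byte big-endian header length
    out.write(header)
    for data in members.values():
        out.write(data)


def read_member(inp, name):
    """Fetch one member: read the directory, then seek straight to the data."""
    inp.seek(0)
    (header_len,) = struct.unpack(">I", inp.read(4))
    directory = json.loads(inp.read(header_len).decode("utf-8"))
    offset, length = directory[name]
    inp.seek(4 + header_len + offset)
    return inp.read(length)


buf = io.BytesIO()
write_archive(buf, {
    "foo.erl": b"-module(foo).\n",
    "a_module_name_longer_than_sixteen_characters.erl": b"-module(bar).\n",
})
print(read_member(buf, "foo.erl"))  # prints b'-module(foo).\n'
```

    Reading one member costs one read of the directory and one seek,
    however many members the file holds.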

B.  Concerning search paths:

    Search paths can be a real pain.
    Suppose there are two directories:
	/usr/ucb/bin/ ... cc, pr, ...
	/usr/bin/     ... cc, pr, ...
    and I want the cc from /usr/bin and the pr from /usr/ucb/bin.
    Then there is _no_ ordering of the directories in the search path
    that will give me what I want.
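
    The claim is easy to check by brute force.  The Python sketch
    below models first-match lookup along a path and tries every
    ordering; the directory contents are just the two commands from
    the example, and the helper names are invented for illustration.

```python
from itertools import permutations

# Contents of the two directories, restricted to the commands in the example.
DIRS = {
    "/usr/ucb/bin": {"cc", "pr"},
    "/usr/bin": {"cc", "pr"},
}


def resolve(command, path):
    """Return the first directory on the path that contains command."""
    for directory in path:
        if command in DIRS[directory]:
            return directory
    return None


def orderings_satisfying(wanted):
    """List every ordering of DIRS under which each command resolves as wanted."""
    return [
        path
        for path in permutations(DIRS)
        if all(resolve(cmd, path) == where for cmd, where in wanted.items())
    ]


wanted = {"cc": "/usr/bin", "pr": "/usr/ucb/bin"}
print(orderings_satisfying(wanted))  # prints []
```

    Whichever directory comes first wins for *both* commands, so a
    path can express "prefer this directory" but never "prefer this
    directory for this command".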

    Throw in over a dozen different installed programs each with its own
    idea of what it expects to be in the path, and you get some idea of
    why it has been a couple of years since I've been able to use Texinfo.

    Search paths are a particular pain once they get large.  Here is
    a count of the commands in just *half* the directories in my $PATH:
    297     /home/users/okeefe_r/commands.d
    116     /opt/SUNWspro/bin
     49     /usr/ccs/bin
    615     /usr/bin
    376     /usr/sbin
    263     /usr/local/bin
    105     /usr/ucb
    142     /usr/openwin/bin
    137     /usr/dt/bin

    The commands in ~/commands.d/ I am responsible for; I know what they
    do.  But of the 2100 total commands, I haven't the faintest idea of
    what more than about 600 do:  new ones keep on being added, there
    are replicates and clashes between these, and I have by now no reason
    to expect what I have to be a consistent set.  Quite often I have a
    directory with more than a hundred commands in my search path just
    for the sake of two or three commands.  (These days, I prefer adding
    a symbolic link from ~/commands.d to adding a whole directory.)
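
    A survey like the one above, plus a list of the clashes, can be
    produced with a short Python script.  This is a rough sketch, not
    the tool used for the table: the function name is invented, and it
    counts every directory entry rather than checking that each one is
    actually executable.

```python
import os
from collections import defaultdict


def survey(directories):
    """Return (count per directory, names found in more than one directory)."""
    counts, owners = {}, defaultdict(list)
    for directory in directories:
        if directory in counts:
            continue  # the same directory listed twice is not a clash
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            names = []  # a missing or unreadable entry contributes nothing
        counts[directory] = len(names)
        for name in names:
            owners[name].append(directory)
    clashes = {name: dirs for name, dirs in owners.items() if len(dirs) > 1}
    return counts, clashes


if __name__ == "__main__":
    path = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    counts, clashes = survey(path)
    for directory, n in counts.items():
        print(f"{n:7d} {directory}")
    print(f"{len(clashes)} names appear in more than one directory")
```

    For each clashing name, the first directory in the list is the one
    the shell will actually pick.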

    If it is somewhere between "nightmarishly difficult" and "impossible"
    to manage a search path with a few thousand UNIX commands, it has to
    be worse managing lots of Erlang modules.

    With modules, you have the problem that a flotilla of modules may
    need access to each other, while only one or two of them should be
    referred to from elsewhere.  Putting them in a global namespace,
    *any* global namespace, simple or dotted, doesn't solve this problem.

C.  On packages providing one entry in a load path:

    This is also an advantage of LACE (Eiffel's Language for Assembly
    of Classes in Eiffel), an advantage which is _easily_ gained
    WITHOUT INTRODUCING DOTTED MODULE NAMES.

The thing that doesn't scale is the global namespace for modules.
Packages postpone the collapse, but do not prevent it; they are
*still* a global namespace for modules.  They just plain don't go
to the root of the problem.

Dotted name systems are a dime a dozen.  The thing that was really
innovative in Java packages was tying them to an existing system of
names which has
    * LEGAL OWNERSHIP of names, and a
    * LEGAL REGISTRY of owned names.
It's not names like java.awt.event that are the interesting bit,
but names like com.icl.saxon.tree, which is tied to the legally
owned domain name "icl.com".