Packages in Erlang: new documentation
Richard A. O'Keefe
Tue Sep 9 01:38:10 CEST 2003
A. Concerning hundreds of files in a directory:
Who says Erlang modules, in any form, have to be in a directory?
In the MVS and CMS world, it used to be normal to use "partitioned
data sets" (think of UNIX ar(1), but arguably better engineered).
For example, instead of
you might have
where CSOURCE and HEADER were each a single file, containing subfiles.
(For another analogy, think of nested OLE containers.)
With block sizes slowly climbing upwards, the space waste of having
one file per "object" can be quite high. For example, I recently
tried to install a certain (non-Erlang) system on a Mac with a 4GB
disc where the system block size was (I am not making this up) 60kB.
A one line file still took a full 60kB, and the disc space ran out
with less than about 10% of the space actually _used_.
For another example, an old version of Erlang I have lying around
requires 203 million bytes in 13.5 thousand files. On a file system
with 8kB blocks, it actually takes 267 million bytes, wasting
64MB, adding an extra 31% of waste space to useful information.
In one set of files, I've measured the waste space : useful data ratio
as 1.6 : 1; that is, about 60% of the allocated space is waste.
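The arithmetic behind figures like these can be sketched in a few lines
of Python. This is only an illustration: the function names are made up
here, and it assumes the simplest possible model, in which every file
occupies a whole number of blocks and nothing else (inodes, directory
entries, fragments) is counted.

```python
def allocated_bytes(file_sizes, block_size):
    """Bytes consumed on disc if each file occupies whole blocks.

    Assumption: a file of n bytes takes ceil(n / block_size) blocks.
    """
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)  # ceiling division
        total += blocks * block_size
    return total


def waste_to_useful(file_sizes, block_size):
    """Ratio of wasted bytes to useful bytes under the same model."""
    useful = sum(file_sizes)
    return (allocated_bytes(file_sizes, block_size) - useful) / useful


# A one-line file (say 40 bytes) on a file system with 60 KiB blocks
# still costs a full block.
print(allocated_bytes([40], 60 * 1024))
```

Feeding in the real file sizes from a directory tree (for instance from
os.walk and os.path.getsize) would give a rough estimate of the kind of
ratio quoted above.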
UNIX ar(1) format isn't very wonderful, because
(1) it only allows 16 characters for a file name (man -s 3HEAD ar IKYN)
(2) the directory information is scattered through the file, so it is
inefficient to read just one member.
But who says we have to use ar(1) format, eh?
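To make the point concrete, here is a minimal Python sketch of a
container that avoids both complaints: member names are unrestricted in
length, and the whole directory sits at the front, so one member can be
read with a single seek. The layout (a 4-byte length, a JSON index, then
the member data) and the names pack and read_member are invented for
this illustration; they are not any existing format.

```python
import io
import json
import struct


def pack(members):
    """Build an archive from a dict mapping member name to bytes.

    Hypothetical layout: 4-byte big-endian index length, a JSON index
    of name -> [offset, length], then the concatenated member data.
    """
    index, body, offset = {}, b"", 0
    for name, data in members.items():
        index[name] = [offset, len(data)]
        body += data
        offset += len(data)
    header = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(header)) + header + body


def read_member(stream, name):
    """Read one member using only the leading index and one seek."""
    stream.seek(0)
    (header_len,) = struct.unpack(">I", stream.read(4))
    index = json.loads(stream.read(header_len).decode("utf-8"))
    offset, length = index[name]
    stream.seek(4 + header_len + offset)
    return stream.read(length)


archive = pack({
    "a_module_name_longer_than_sixteen_characters.erl": b"-module(a).\n",
    "b.erl": b"-module(b).\n",
})
print(read_member(io.BytesIO(archive), "b.erl"))
```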
B. Concerning search paths:
Search paths can be a real pain.
Suppose there are two directories:
/usr/ucb/bin/ ... cc, pr, ...
/usr/bin/ ... cc, pr, ...
and I want the cc from /usr/bin and the pr from /usr/ucb/bin.
Then there is _no_ ordering of the directories in the search path
that will give me what I want.
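A small Python sketch makes the claim checkable. It models a search
path as "first directory containing the command wins" and tries every
ordering of the two directories; the DIRS table simply mirrors the
example above, and the function names are made up for illustration.

```python
from itertools import permutations

# Directory contents, mirroring the two-directory example.
DIRS = {
    "/usr/ucb/bin": {"cc", "pr"},
    "/usr/bin": {"cc", "pr"},
}


def resolve(command, path):
    """Return the first directory in path that contains command."""
    for directory in path:
        if command in DIRS.get(directory, ()):
            return directory
    return None


def orderings_satisfying(wanted):
    """Every ordering of DIRS where each command resolves as wanted."""
    return [
        order
        for order in permutations(DIRS)
        if all(resolve(cmd, order) == d for cmd, d in wanted.items())
    ]


WANTED = {"cc": "/usr/bin", "pr": "/usr/ucb/bin"}
print(orderings_satisfying(WANTED))  # prints []
```

Whichever directory comes first supplies both commands, so the list of
satisfying orderings is empty.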
Throw in over a dozen different installed programs each with its own
idea of what it expects to be in the path, and you get some idea of
why it has been a couple of years since I've been able to use Texinfo.
There's a particular pain with search paths. Here is a summary of
just *half* the directories in my $PATH:
The commands in ~/commands.d/ I am responsible for; I know what they
do. But of the 2100 total commands, I haven't the faintest idea of
what more than about 600 do: new ones keep on being added, there
are replicates and clashes between these, and I have by now no reason
to expect what I have to be a consistent set. Quite often I have a
directory with more than a hundred commands in my search path just
for the sake of two or three commands. (These days, I would rather add
a symbolic link from ~/commands.d than add a whole directory.)
If it is somewhere between "nightmarishly difficult" and "impossible"
to manage a search path with a few thousand UNIX commands, it has to
be worse managing lots of Erlang modules.
With modules, you have the problem that a flotilla of modules may
need access to each other, while only one or two of them should be
referred to from elsewhere. Putting them in a global namespace,
*any* global namespace, simple or dotted, doesn't solve this problem.
C. On packages providing one entry in a load path:
This is also an advantage of LACE, an advantage which is _easily_
gained WITHOUT INTRODUCING DOTTED MODULE NAMES.
The thing that doesn't scale is the global namespace for modules.
Packages postpone the collapse, but do not prevent it; they are
*still* a global namespace for modules. They just plain don't go
to the root of the problem.
Dotted name systems are a dime a dozen. The thing that was really
innovative in Java packages was tying them to an existing system of
names which has
* LEGAL OWNERSHIP of names, and a
* LEGAL REGISTRY of owned names.
It's not names like java.awt.event that are the interesting bit,
but names like com.icl.saxon.tree, which is tied to the legally
owned domain name "icl.com".