new syntax - a provocation

Thu Sep 25 11:12:44 CEST 2003

On Thu, 25 Sep 2003, Richard A. O'Keefe wrote:

> Joe Armstrong <joe@REDACTED> discussed two ways of implementing
> atoms without a (permanent) atom table.
> 
> I don't understand what Mod and Index are.

Mod is a "Module number" ie a small integer 1..ModMax  which indexes
a table of know module names. ModMax would not have to be that large.

Index is an index into the atom table for an individual module, again
a small integer.

> 
> The question is "what do we want a global atom table _for_?"
> 
> Mainly, so that atom equal/not-equal testing can be O(1).
> Now, suppose an atom is a tagged pointer to a
>     [hash,size,data]
> block.  You only compare the data if the hash and size are the same.
> Most Erlang atoms are fairly short, so while this would be slower than
> whatever it is now, it wouldn't get _that_ much slower.

Yes - my proposals were to speed up this time - it may be that
comparing atoms in different modules
is done so infrequently and is sufficiently fast that
optimisations are not necessary. Comparing atoms in the same module
is atomic (since they are just indices not pointers)

> The hash value would be used for indexing in clause/case selection.
> 
> Or suppose that each module has a fixed set of atoms, known at
> compile time, plus each process has an atom table.  Atoms which the
> process creates dynamically can be looked up in the process atom table.
> Atoms that are sent to another process get looked up in the other
> process's atom table when they arrive.  

Yes.

> (The aim here is not to ensure
> uniqueness, but to keep the amount of space used for atom names bounded.)
> 
> I'm sure that there are lots of implementation alternatives to explore.

  Yes -  I spent about 10 years  thinking about "how to  garb the atom
table" just to realise that I'd been thinking about the wrong problem.

  The problem statement should have  been "How to design a system that
does not need an atom table".

  Garbing atom tables is *difficult* - a redesign that does not need an
atom table is much easier - I came up with two or three schemes for this
after a couple of days thought.

  All of these schemes are slightly slower than the global atom table
approach - but I don't think they will be *lot* slower.

 IMHO the benefits would outweigh the costs.

  For example, for  dynamic XML manipulations I dare  not represent an
XML parse tree as.

 {AtomTag, [Attr], [Child]} 

 But have to use the horrible form

 {String, [Attr], [Child]}

 For fear that one day the atom table will explode :-)

 The Atom representation is much better and faster - so the additional
time taken using non-atom-table methods might be regained by allowing
more natural algorithms to express things like XML parse trees.

 /Joe