new syntax - a provocation

Fri Sep 26 02:19:12 CEST 2003

Luke Gorrie <luke@REDACTED> wrote:
	Surely you considered the (undoubtedly unoriginal) idea that I posted
	in that time. I'd like to know what's wrong with it - does it badly
	conflict with the existing ERTS implementation somehow?

	To restate my idea in terms unlikely to be overlooked, I wrote a
	working and fully garbage-collectable Atom datatype in about a page of
	Java (below).

	Of course, maybe symbolic computing is just easier in Java.

To summarise that code:  "use a weak table".
Or rather, to summarise the summary:  "What's so hard about implementing
a garbage collected atom table?  Just use a garbage collected atom table
to implement it!"

Luke Gorrie is right, that _is_ unoriginal.  A garbage collected
atom table is precisely a weak table; if you already have weak tables
(or weak sets, or ...) then implementing a garbage collected atom table
is trivial.  If you don't, then implementing the one is pretty much
identical to implementing the other.

In fact this is so unoriginal that it goes back to SNOBOL 4 (and maybe
earlier).  SNOBOL was (well, I have a running SNOBOL system on my Solaris
boxes, so I suppose "is") a language for doing stuff with strings; it
would not surprise me if it was the inspiration for the regular expressions
used in UNIX.  (Not regular expressions in general; I specifically have
back-references in mind.)  Certainly a version of SNOBOL 3 used to be
shipped with UNIX (the program "sno").  SNOBOL programs normally created
_lots_ of strings, that's what they were for.  But SNOBOL kept all string
values in a hash table.  A hash table whose elements are retained only if
they are referred to elsewhere is precisely a weak table of strings and it
is precisely a garbage collected atom table.

This is old technology.  SWI Prolog has such a thing, and is multithreaded.
Java's String.unique() feature presumably uses such a thing.  Smalltalk has
an atom table, and has concurrent processes, and that's been around since
before 1980.  I have a Parlog system floating around; I _don't_ have the
source code, but I have enough of the .h files for it to be able to tell
that it has a global atom table and there are hints in the documentation
that it is garbage collected.

So we (= "computer scientists in general") know how to implement a global
garbage collected atom table in a (uni- or tightly coupled multi-)processor
concurrent programming language.

BUT we also know that as a shared mutable data structure, access to it
requires locking (which Java is just _full_ of; it has taken heroic
measures by Java implementors to get locking overheads down to tolerable
levels).  And we also know that because it is global, it requires some
_global_ form of garbage collection.  Both of these are Bad Things.

It's interesting to consider the question:  "what would an Erlang
implementation have to be like for us to be able to take a running
process and move it to another processor?"  (IBM's AIX was able to
do this on their mainframes, I believe.  I think Kali Scheme can do it.
And I think the University of Washington's Emerald could.)  Global
data structures (global atom atom, global process name registry,
global module table) don't really fit nicely into such a scheme.