Concatenating atoms

Tue Feb 1 12:18:14 CET 2005

> Thomas Lindgren wrote
> As an alternative to discouraging developers, I'd like
> to encourage the Erlang implementation community to,
> at long last, implement an atom GC :-) (Well, I really
> do.)
> 
> Best,
> Thomas

  Nja, ummm, we're garbing the wrong thing. We should be garbing
the code space and not the atom space. Atoms should be local to modules
and not global at all. There should not be a global atom table in the
first place; it violates the principle of isolation.

  The atom table is an efficiency hack that should never have been made.
With a little careful redesign we could eliminate the atom table,
and then no GC would be required.

   This would mean that each module would have to have its own
private atom table. With a little thought (little = about 10 years :-)
we could arrange that:

	- comparisons of atoms within the same module are atomic
	- comparisons of atoms in two different modules are
	  a hash table lookup the first time they are made and
	  atomic every time after that

   Atoms would be represented as

	(AtomTag, Pointer) -> (LocalHashTablePointer)  -> Value
				    (RemoteHashTablePointer)

   i.e. each atom (a tagged pointer) points to two words.

    The first is a pointer to the local module hash table.
    The second is initially zero and is used to cache a hint
    pointer (the hint points to an atom in a remote module
    which is known to be the same as the local atom). When
    two or more modules use the same atom, the numerically
    lowest pointer should be used.
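The comparison rules and the two-word atom cell above can be modelled in a few lines. This is a rough Python sketch under stated assumptions, not BEAM code: `ModuleAtoms`, `Atom`, `atom_eq` and the `addr` field are hypothetical names, and `addr` is only a stand-in for a real memory address so that "numerically lowest pointer" has a meaning in the model.

```python
import itertools

# Hypothetical stand-in for heap addresses, so that "numerically lowest
# pointer" has a concrete meaning in this model.
_addresses = itertools.count(0x1000, 0x10)


class Atom:
    """Two-word atom cell: local table pointer plus cached remote hint."""

    def __init__(self, name, table):
        self.addr = next(_addresses)  # modelled address of this cell
        self.name = name
        self.table = table            # word 1: owning module's hash table
        self.hint = None              # word 2: zero until a match is cached


class ModuleAtoms:
    """Hypothetical private atom table belonging to one module."""

    def __init__(self, module):
        self.module = module
        self.atoms = {}

    def intern(self, name):
        if name not in self.atoms:
            self.atoms[name] = Atom(name, self)
        return self.atoms[name]


def canonical(atom):
    # A cached hint always names an equal atom with a lower address.
    return atom.hint if atom.hint is not None else atom


def atom_eq(a, b, stats=None):
    if a is b:
        return True                   # same module: one pointer compare
    ca, cb = canonical(a), canonical(b)
    if ca is cb:
        return True                   # hints already cached: one compare
    # Slow path: look a's name up in b's module table.
    if stats is not None:
        stats["lookups"] += 1
    if b.table.atoms.get(a.name) is not b:
        return False
    # Equal atoms: point both cells at the numerically lowest address.
    lowest = ca if ca.addr < cb.addr else cb
    for atom in (a, b):
        if atom is not lowest:
            atom.hint = lowest
    return True
```

Comparing `m1.intern("ok")` with `m2.intern("ok")` costs one lookup the first time and a pointer comparison after that. In this sketch only positive matches are cached, so unequal atoms from different modules pay a lookup every time.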

    This would need a few more changes:

	- we don't move code
	- we garbage collect code (i.e. we don't keep exactly two versions)
	- when code is finally removed (by GC), we sweep all
	  code spaces, zeroing any cached remote hash table pointers
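The sweep in the last point could look roughly like this, again as a hypothetical Python model rather than anything in the real runtime; `Module`, `Atom` and `remove_module` are illustrative names. Since hints are raw pointers they only stay valid while code does not move. Once a module is freed, any hint pointing into it is reset to zero (`None` here), so the next comparison falls back to a hash lookup.

```python
class Atom:
    def __init__(self, name, module):
        self.name = name
        self.module = module  # models the pointer to the owning module's table
        self.hint = None      # cached pointer to an equal atom elsewhere


class Module:
    def __init__(self, name):
        self.name = name
        self.atoms = {}

    def intern(self, name):
        if name not in self.atoms:
            self.atoms[name] = Atom(name, self)
        return self.atoms[name]


def remove_module(live_modules, dead):
    """Hypothetical sweep run once GC finally frees `dead`'s code."""
    live_modules.remove(dead)
    cleared = 0
    for module in live_modules:
        for atom in module.atoms.values():
            if atom.hint is not None and atom.hint.module is dead:
                atom.hint = None  # back to zero: next comparison does a lookup
                cleared += 1
    return cleared
```

Hints that point into modules which are still live are left untouched.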

   Basically, we should not garb the atom table; we should garb the code
space, and we should dynamically cache atom and function start addresses.

   The idea of having two versions of code is silly anyway; we should have
N versions and garb away the old ones. Atoms should not be global but local to
individual modules, and cacheable hint pointers should be used to optimise
atom comparison and function start address resolution.

   Code should be first class, but probably represented by special frozen heap
objects, since it is likely to hang around for a long time and moving it
would be expensive: we would have to invalidate the cached heap references.

   Cheers

	/Joe