Implementing tables - advice wanted

Joe Armstrong (AL/EAB) joe.armstrong@REDACTED
Tue Jun 13 10:32:34 CEST 2006


 
Hello,

   I've been thinking a bit about tables in Erlang.

   Erlang really really really (^100) needs tables.

   Tables are like records without the record declarations, or like
ets tables with a nicer syntactic structure - they are what I used to
call structs in earlier postings, but most other languages call them
tables.

   In a Daghstuhl meeting we has a session on the theme "all you need
are tables - virtually all dynamically typed languages (except Erlang)
have tables. In Lua they are called tables, in Python dictionaries, in
JavaScript associative arrays - the semantics of all these things are
slightly different but essentially they are the same.

   All of these are first-class hash tables - think of them as hash
tables on the heap.
   
   Of all the syntaxes and semantics Lua's attracts me most.

   Keys are strings or integers, conseqative ranges of integers
(very useful for arrays) are handled in a special manner.

   I'm wondering about how to implement tables.

   Let's choose "@" as the table constructor (just about the only
character
left, that's unused.

	X = @{name="fred", age=23, footSize = 42}
	Name = X.name,
	...

creates a table, etc. the syntax is "just like records only with #Tag
replaced by @.
   
   Now how could I implement this?

   Question how lightweight is ets?

   To try and answer this I wrote the following:

Let's implement this:

	X = @{name="fred", age=23, footSize = 42}
	Name = X.name,

By transforming into this:

test() ->
    X = ets:new(anonymnous, []),
    ets:insert(X, [{name,fred}, {age,23}, {footSize, 42}]),
    [{_,Name}] = ets:lookup(X, name),
    io:format("Info:~p~n",[ets:info(X)]),
    Name.

What I'm interested in is not the syntax - we can fix that later
but the size of the ets object created.

The info call returns

     Info:{{memory,310},
       {owner,<0.39.0>},
       {name,anonymnous},
       {size,3},
       {node,nonode@REDACTED},
       {named_table,false},
       {type,set},
       {keypos,1},
       {protection,protected}}

The memory is 310 is this reasonable? - Let's do a 
quick calculation.

My table has 3 keys, all atoms, and three values atoms or integers

So:

	Space for Keys and values  = 6 words
	Space for hash table       = 6 words  (say 2 x # entries)
	Headers                    = 4 words  (guess)
	Extra space per entry      = 3 x 3 = 9 words (guess)
	(this is for bucket 
	 overflows etc)

This gets me to 25 words - so it looks like the 310 words
is way too big.

I then made an incrementally larger table by adding a {sex,male}
entry - the new size was 321 words ie 11 words more for the additional
entry. 

Without looking at the C code I'd imagine that most of the overhead
is in the headers since etrs table are a lot more than simple hash
tables.

Suppose I want to implement tables.

Do I
	- reuse some of the ets code
	- take a look at the Lua and do like they do
	- roll my own

and if the answer is roll my own, what is the best approach?

Should I keep the distinction between array indices and other things
as indices in the table (a la Lua) or just put *everything* into a hash
table?

Should I use sorted lists, skip-lists, or binary searches instead of
hash tables?	  

I guess for "small" tables linear search is as good as a hash table -
but in this case how small is "small"?.

If I use hashing then what is the best algorithm? - I rather like
http://en.wikipedia.org/wiki/Cuckoo_hashing

Finally, should atoms in the table (either as keys or values) be
pointers
to a global hash table, or local. The "local" option here might sound
crazy but it would avoid problems with atom table GC.



More information about the erlang-questions mailing list