Dictionaries

Wed Sep 24 17:29:45 CEST 2003

Richard A. O'Keefe wrote:
> In this message, I'm going to call them dictionaries.
> 
> A dictionary is a finite collection of (key,value) pairs such that
> all keys are atoms and no two keys are equal.  The values may be any
> Erlang terms.

Sounds good :-)

I'd prefer to have keys that are more than atoms. Full Erlang terms 
would be most flexible, but in practice, you need a data structure that 
has a comparison operation - with that, you can use binary trees 
(hopefully with improvements that prevent degenerate cases).
Even better would be keys where the key value gives you an indication of 
how far into the general sort sequence it is (e.g. when looking up a 
telephone directory, if you know the name starts with "P", you don't do 
a bisection search, you start at roughly 60% - translated to searching 
algorithms, this makes B-Trees and their ilk feasible and useful).

In practice, I'd want the following types of key:
* Atoms (if the key space is limited)
* Strings (if the key space is unlimited - think DoS attacks...)
* Numbers (I have geographical data, I *need* this *g*)
   (unlimited-size integers would be fine though)

> Equality and ordering:
> 
>     We'd like these to be compatible with set inclusion.

Mixing set semantics with dictionaries is almost sure to give serious 
trouble.
The problem is that, depending on the situation, programmers will want 
to set dictionaries as different sets:
1) As the set of its keys.
2) As the set of its values.
3) As the set of its key-value pairs.

> An obvious question is "why not use the 'dict' module?"
> 
> Answers:
> (1) When we know that something is a dictionary, we can use a more
>     space-efficient representation.  For example, instead of
>     [{K1,V1},...,{Kn,Vn}] 	-- (n list cells + n 2-tuples)
>     we can use
>     {dict,K1,V1,...,Kn,Vn}	-- one 2n+1-tuple (not really a tuple)
>     which is likely to take somewhat less than half the space.

Such a space-efficient representation will be useful iff it is known 
that the dictionary will not change.
Hmm... yes, the proposal has no way of adding key-value pairs.

Which leads me to the question how to create a dictionary with an 
additional key-value pair? Such creation should not have an amortized 
worse than O(number-of-key-value-pairs * log |Dictionary|) efficiency to 
be useful.

> (2) When we know that something is a dictionary, we can use a more
>     time-efficient representation.  For example, if the keys K1...Kn
>     as shown above are in standard order, lookups can be done in
>     O(lg n) time instead of O(n) time.

Then the dict implementation could be improved. Simple tree-balancing 
algorithms that always maintain an amortized O(log |Dictionary|) update 
and retrieval time exist, even without destructive updates; see Chris 
Okasaki's "Purely Functional Data Structures".

Regards,
Jo