Dictionaries

Fri Sep 26 05:57:54 CEST 2003

Joachim Durchholz <joachim.durchholz@REDACTED> wrote:
	I'd prefer to have keys that are more than atoms.

Why, so would I, but you have to start _somewhere_.  One important
issue is interoperability:  using atoms (isomorphic to strings) as
keys means interoperability with YAML is straightforward; it means
that mapping between some kinds of objects and Erlang is straightforward;
it's just exactly right for representing node attributes in XML; ...
Atoms are just _enough_ to get most of the benefits.

	In practice, I'd want the following types of key:
	* Atoms (if the key space is limited)

I don't quite get that.  You're not telling me Erlang _still_ has that
daft 255-character limit on atom names, are you?  Heck, it's nearly 20
years since Quintus made sure the atom limit was at least as big as the
host file name limit (1024 on typical UNIX systems).  SWI Prolog doesn't
have _any_ limit on the size of an atom.

	* Numbers (I have geographical data, I *need* this *g*)
	   (unlimited-size integers would be fine though)

If you have geographical data, you need something better than binary
search.  You probwably need some kind of R-tree, or at least quad trees.

	>     We'd like these to be compatible with set inclusion.

	Mixing set semantics with dictionaries is almost sure to give serious 
	trouble.

Why on earth should it?  Dictionaries *are* sets.  It's *failing* to
consider set semantics in designing dictionary operations that would give
serious trouble.

	The problem is that, depending on the situation, programmers
	will want to set dictionaries as different sets:
	1) As the set of its keys.
	2) As the set of its values.
	3) As the set of its key-value pairs.

The last one is the mathematical definition of what a dictionary _is_.
You can't expect to please everyone, but you _can_ try to make sense.

	> (1) When we know that something is a dictionary, we can use a more
	>     space-efficient representation.  For example, instead of
	>     [{K1,V1},...,{Kn,Vn}] 	-- (n list cells + n 2-tuples)
	>     we can use
	>     {dict,K1,V1,...,Kn,Vn}	-- one 2n+1-tuple (not really a tuple)
	>     which is likely to take somewhat less than half the space.

	Such a space-efficient representation will be useful iff it is known 
	that the dictionary will not change.

Um, this is *Erlang* we're talking about?  An Erlang dictionary would
not be able to change any more than the number 2 can change.  You can
compute new dictionaries from old ones.

	Hmm... yes, the proposal has no way of adding key-value pairs.

Yes it does.

    New_Dict = <{ foo->Bar | Old_Dict }>

adds a key (foo)-value (Bar) pair to Old_Dict, giving New_Dict.
It just doesn't smash Old_Dict as a side effect.

	Which leads me to the question how to create a dictionary with an 
	additional key-value pair?

Explained in detail in the previous message.

	Such creation should not have an amortized worse than
	O(number-of-key-value-pairs * log |Dictionary|) efficiency to be
	useful.

That isn't actually true; when you have enough key-value pairs to worry
about you can do it much faster than that if you _don't_ do it one at a
time, but yes, there are implementation techniques that do it.

One is is one that I've already mentioned, that of changing representations
at run time.  When you build a new dictionary from an old, the change is
done _lazily_.  What you get is at first just a description of the change;
when the new dictionary is actually _used_, then you make the change.

	> (2) When we know that something is a dictionary, we can use a more
	>     time-efficient representation.  For example, if the keys K1...Kn
	>     as shown above are in standard order, lookups can be done in
	>     O(lg n) time instead of O(n) time.

	Then the dict implementation could be improved.  Simple
	tree-balancing algorithms that always maintain an amortized O(log
	|Dictionary|) update and retrieval time exist, even without destructive
	updates; see Chris Okasaki's "Purely Functional Data Structures".

I have seen it.  Several of the data structures from that book have even
been translated to Erlang.

The big point is that "dictionary" does NOT imply "frequently changed
dictionary" and it does NOT imply "large dictionary".  Things that are
dictionaries are also useful when they are small and don't change.  We
call them records.