Dictionaries

Fri Sep 26 10:35:10 CEST 2003

Richard A. O'Keefe wrote:
> Joachim Durchholz <joachim.durchholz@REDACTED> wrote:
> 	In practice, I'd want the following types of key:
> 	* Atoms (if the key space is limited)
> 
> I don't quite get that.  You're not telling me Erlang _still_ has that
> daft 255-character limit on atom names, are you?

No, but atoms are not garbage collected.
The limit is large, but if you have large dictionaries with transient 
keys, the atom table may fill up.

> 	* Numbers (I have geographical data, I *need* this *g*)
> 	   (unlimited-size integers would be fine though)
> 	
> If you have geographical data, you need something better than binary
> search.  You probwably need some kind of R-tree, or at least quad trees.

Binary search is enough, thank you :-)
All you need to do is to interleave the bits of the coordinates. With 
that set-up, going down one level in the tree is equivalent to refining 
the query.
Oh, and I need a way to check whether there's any key within a given key 
range. Unless the dictionary is using hash tables, this is a simple 
operation.

> 	>     We'd like these to be compatible with set inclusion.
> 	
> 	Mixing set semantics with dictionaries is almost sure to give serious 
> 	trouble.
> 
> Why on earth should it?  Dictionaries *are* sets.  It's *failing* to
> consider set semantics in designing dictionary operations that would give
> serious trouble.
> 
> 	The problem is that, depending on the situation, programmers
> 	will want to set dictionaries as different sets:
> 	1) As the set of its keys.
> 	2) As the set of its values.
> 	3) As the set of its key-value pairs.
> 	
> The last one is the mathematical definition of what a dictionary _is_.
> You can't expect to please everyone, but you _can_ try to make sense.

Right - but any of the three views above can make sense. Which of them 
makes sense depends entirely on what the caller wants to do. (I have 
seen the need for all three views, so I /know/ what I'm speaking about, 
at least here if not everywhere else *g*.)

> One is is one that I've already mentioned, that of changing representations
> at run time.  When you build a new dictionary from an old, the change is
> done _lazily_.  What you get is at first just a description of the change;
> when the new dictionary is actually _used_, then you make the change.

Agreed.

> 	> (2) When we know that something is a dictionary, we can use a more
> 	>     time-efficient representation.  For example, if the keys K1...Kn
> 	>     as shown above are in standard order, lookups can be done in
> 	>     O(lg n) time instead of O(n) time.
> 	
> 	Then the dict implementation could be improved.  Simple
> 	tree-balancing algorithms that always maintain an amortized O(log
> 	|Dictionary|) update and retrieval time exist, even without destructive
> 	updates; see Chris Okasaki's "Purely Functional Data Structures".
> 
> I have seen it.  Several of the data structures from that book have even
> been translated to Erlang.
> 
> The big point is that "dictionary" does NOT imply "frequently changed
> dictionary" and it does NOT imply "large dictionary".  Things that are
> dictionaries are also useful when they are small and don't change.  We
> call them records.

Right - and in that case, we don't need a new dictionary type :-)

Regards,
Jo