dictionaries (was Re: new syntax - a provocation)

Wed Sep 24 18:47:05 CEST 2003

On Wed, 24 Sep 2003 13:29:09 +1200 (NZST)
"Richard A. O'Keefe" <ok@REDACTED> wrote:

> Joe Armstrong <joe@REDACTED> wrote:
> 	  	- Proper structs
> 	
> 		<< junk the record stuff. thus has been discussed in previous
> 		   mails >>
> 	
> I never thought I'd recommend anything from IBM Prolog, but it turns
> out that "IBM PROLOG for 370" had something that LIFE had and that
> I've long wanted in Erlang.  The things I want have been known by
> various names in various languages:  "tables" in SNOBOL,
> "Dictionaries" in Smalltalk,"associative arrays" in AWK, "hashes" in
> Perl, "psi-terms" in LIFE,"feature structures" in linguistics, and who
> knows what elsewhere.

Every data structures textbook I've seen calls them "dictionaries" too,
and Erlang calls them "dicts", so I very much agree with your choice of
terminology.

> 
> In this message, I'm going to call them dictionaries.
> 
> A dictionary is a finite collection of (key,value) pairs such that
> all keys are atoms and no two keys are equal.  The values may be any
> Erlang terms.
> 
> Functions:
> [... API almost identical to the dict module's ...]
> [...]
> Special syntax:
> 
>     '<{' [maplet {',' maplet}*] ['|' expression] '}>'
> 
>     is an expression, where
>     maplet ::= expression '->' expression
> 
>     <{ K1->V1,...,Kn->Vn | D }>

I think I like this better than Joe's syntax, but I'm not sure.  Either
way it's a sure sign that the available syntactic markers are getting a
bit scant :)

> An obvious question is "why not use the 'dict' module?"

Indeed.

> Answers:
> (1) When we know that something is a dictionary, we can use a more
>     space-efficient representation.  For example, instead of
>     [{K1,V1},...,{Kn,Vn}] 	-- (n list cells + n 2-tuples)
>     we can use
>     {dict,K1,V1,...,Kn,Vn}	-- one 2n+1-tuple (not really a tuple)
>     which is likely to take somewhat less than half the space.
> 
> (2) When we know that something is a dictionary, we can use a more
>     time-efficient representation.  For example, if the keys K1...Kn
>     as shown above are in standard order, lookups can be done in
>     O(lg n) time instead of O(n) time.
> 
> (3) When the representation is not visible to the programmer, the
> run-time
>     system may use several different representations.  For example,
>     here are some obvious ones:  linear array as shown above, some
>     kind of binary search tree (so that unchanged chunks can be
>     shared), some kind of hash table (so that lookup can be O(1)).
>
> (4) There are lots of "functional array"/"persistent array"
> implementations
>     in the literature which could be used to make setelement/3 take
>     O(1) time; these could be adapted to make updates to dictionaries
>     fast too, provided the representation is not exposed.
> 
> (5) With the 'dict' module you get a somewhat less than obvious
> textual
>     representation, which could mean other things.  With a built in
>     data type you can get "just right" textual representation.

I don't follow _any_ of these arguments.  It's my understanding that the
representation of dicts is _undefined_.  (You didn't go peeking and
write code like 'case A of {dict,_,_,...} -> ...', did you?  :)

Because the representation of dicts is undefined, they could change
anytime in the future, perhaps drastically (e.g. becoming BIFs.)

> (6) With 'dict' you don't get to do dictionary stuff in patterns or
> guards.

All that's needed for that is a special _notation_ for dictionaries
(which, to be clear, is _not_ the same thing as a _representation_ in
this sense.)

I've said it before, but I think the path to take here, to maximize the
benefit for code that's already been written, is:

- implement dictionaries directly in BEAM (any number of available C
implementations would probably do the trick sufficiently - in fact we
already have a very efficient associative storage mechanism that might
work just fine.  what's it called again?  oh yah.  'ETS' :)

- map the dict module to BIFs (we don't need a new API for this - and
existing code that uses dicts gets performance/extensibility benefits)

- add a syntax for pattern-matching dictionaries (the only real advance)

> All of this could be said of other data structures, but this
> particular one is of proven utility in many programming languages and
> of fairly wide occurrence (as "list of options") in Erlang itself.  It
> could replace many uses of records, e.g.,
> 
>     <{ mode->M, uid->U, gid->G | _}> = posix:stat(FileName),
> 
> and it can be used to provide keyword parameters for functions.

Completely 100% agreed (and by Joe too I think, but he calls them
'structs' instead.)

On Wed, 24 Sep 2003 17:29:45 +0200
Joachim Durchholz <joachim.durchholz@REDACTED> wrote:

> Richard A. O'Keefe wrote:
> > In this message, I'm going to call them dictionaries.
> > 
> > A dictionary is a finite collection of (key,value) pairs such that
> > all keys are atoms and no two keys are equal.  The values may be any
> > Erlang terms.
> 
> Sounds good :-)
> 
> I'd prefer to have keys that are more than atoms.

Ditto.  No sense taking a step backwards.

> Full Erlang terms 
> would be most flexible, but in practice, you need a data structure
> that has a comparison operation

Why?  Dictionaries aren't ordered.

> In practice, I'd want the following types of key:
> * Atoms (if the key space is limited)
> * Strings (if the key space is unlimited - think DoS attacks...)
> * Numbers (I have geographical data, I *need* this *g*)
>    (unlimited-size integers would be fine though)

In practice, I'd want the same types of keys that are available these
days - arbitrary Erlang terms.

1> D=dict:new().
{dict,0, ... }
2> D1=dict:store({key, ["%#", 42, 7.01, self()]}, value, D).
{dict,1, ... }
3> dict:find({key,["%#",42,7.01000,list_to_pid("<0.27.0>")]}, D1).
{ok,value}

-Chris