dictionaries (was Re: new syntax - a provocation)
Chris Pressey
cpressey@REDACTED
Wed Sep 24 18:47:05 CEST 2003
On Wed, 24 Sep 2003 13:29:09 +1200 (NZST)
"Richard A. O'Keefe" <ok@REDACTED> wrote:
> Joe Armstrong <joe@REDACTED> wrote:
> - Proper structs
>
> << junk the record stuff. thus has been discussed in previous
> mails >>
>
> I never thought I'd recommend anything from IBM Prolog, but it turns
> out that "IBM PROLOG for 370" had something that LIFE had and that
> I've long wanted in Erlang. The things I want have been known by
> various names in various languages: "tables" in SNOBOL,
> "Dictionaries" in Smalltalk,"associative arrays" in AWK, "hashes" in
> Perl, "psi-terms" in LIFE,"feature structures" in linguistics, and who
> knows what elsewhere.
Every data structures textbook I've seen calls them "dictionaries" too,
and Erlang calls them "dicts", so I very much agree with your choice of
terminology.
>
> In this message, I'm going to call them dictionaries.
>
> A dictionary is a finite collection of (key,value) pairs such that
> all keys are atoms and no two keys are equal. The values may be any
> Erlang terms.
>
> Functions:
> [... API almost identical to the dict module's ...]
> [...]
> Special syntax:
>
> '<{' [maplet {',' maplet}*] ['|' expression] '}>'
>
> is an expression, where
> maplet ::= expression '->' expression
>
> <{ K1->V1,...,Kn->Vn | D }>
I think I like this better than Joe's syntax, but I'm not sure. Either
way it's a sure sign that the available syntactic markers are getting a
bit scant :)
> An obvious question is "why not use the 'dict' module?"
Indeed.
> Answers:
> (1) When we know that something is a dictionary, we can use a more
> space-efficient representation. For example, instead of
> [{K1,V1},...,{Kn,Vn}] -- (n list cells + n 2-tuples)
> we can use
> {dict,K1,V1,...,Kn,Vn} -- one 2n+1-tuple (not really a tuple)
> which is likely to take somewhat less than half the space.
>
> (2) When we know that something is a dictionary, we can use a more
> time-efficient representation. For example, if the keys K1...Kn
> as shown above are in standard order, lookups can be done in
> O(lg n) time instead of O(n) time.
>
> (3) When the representation is not visible to the programmer, the
> run-time
> system may use several different representations. For example,
> here are some obvious ones: linear array as shown above, some
> kind of binary search tree (so that unchanged chunks can be
> shared), some kind of hash table (so that lookup can be O(1)).
>
> (4) There are lots of "functional array"/"persistent array"
> implementations
> in the literature which could be used to make setelement/3 take
> O(1) time; these could be adapted to make updates to dictionaries
> fast too, provided the representation is not exposed.
>
> (5) With the 'dict' module you get a somewhat less than obvious
> textual
> representation, which could mean other things. With a built in
> data type you can get "just right" textual representation.
I don't follow _any_ of these arguments. It's my understanding that the
representation of dicts is _undefined_. (You didn't go peeking and
write code like 'case A of {dict,_,_,...} -> ...', did you? :)
Because the representation of dicts is undefined, they could change
anytime in the future, perhaps drastically (e.g. becoming BIFs.)
> (6) With 'dict' you don't get to do dictionary stuff in patterns or
> guards.
All that's needed for that is a special _notation_ for dictionaries
(which, to be clear, is _not_ the same thing as a _representation_ in
this sense.)
I've said it before, but I think the path to take here, to maximize the
benefit for code that's already been written, is:
- implement dictionaries directly in BEAM (any number of available C
implementations would probably do the trick sufficiently - in fact we
already have a very efficient associative storage mechanism that might
work just fine. what's it called again? oh yah. 'ETS' :)
- map the dict module to BIFs (we don't need a new API for this - and
existing code that uses dicts gets performance/extensibility benefits)
- add a syntax for pattern-matching dictionaries (the only real advance)
> All of this could be said of other data structures, but this
> particular one is of proven utility in many programming languages and
> of fairly wide occurrence (as "list of options") in Erlang itself. It
> could replace many uses of records, e.g.,
>
> <{ mode->M, uid->U, gid->G | _}> = posix:stat(FileName),
>
> and it can be used to provide keyword parameters for functions.
Completely 100% agreed (and by Joe too I think, but he calls them
'structs' instead.)
On Wed, 24 Sep 2003 17:29:45 +0200
Joachim Durchholz <joachim.durchholz@REDACTED> wrote:
> Richard A. O'Keefe wrote:
> > In this message, I'm going to call them dictionaries.
> >
> > A dictionary is a finite collection of (key,value) pairs such that
> > all keys are atoms and no two keys are equal. The values may be any
> > Erlang terms.
>
> Sounds good :-)
>
> I'd prefer to have keys that are more than atoms.
Ditto. No sense taking a step backwards.
> Full Erlang terms
> would be most flexible, but in practice, you need a data structure
> that has a comparison operation
Why? Dictionaries aren't ordered.
> In practice, I'd want the following types of key:
> * Atoms (if the key space is limited)
> * Strings (if the key space is unlimited - think DoS attacks...)
> * Numbers (I have geographical data, I *need* this *g*)
> (unlimited-size integers would be fine though)
In practice, I'd want the same types of keys that are available these
days - arbitrary Erlang terms.
1> D=dict:new().
{dict,0, ... }
2> D1=dict:store({key, ["%#", 42, 7.01, self()]}, value, D).
{dict,1, ... }
3> dict:find({key,["%#",42,7.01000,list_to_pid("<0.27.0>")]}, D1).
{ok,value}
-Chris
More information about the erlang-questions
mailing list