new syntax - a provocation
Richard A. O'Keefe
ok@REDACTED
Wed Sep 24 03:29:09 CEST 2003
Joe Armstrong <joe@REDACTED> wrote:
- Proper structs
<< junk the record stuff. thus has been discussed in previous
mails >>
I never thought I'd recommend anything from IBM Prolog, but it turns out
that "IBM PROLOG for 370" had something that LIFE had and that I've long
wanted in Erlang. The things I want have been known by various names in
various languages: "tables" in SNOBOL, "Dictionaries" in Smalltalk,
"associative arrays" in AWK, "hashes" in Perl, "psi-terms" in LIFE,
"feature structures" in linguistics, and who knows what elsewhere.
In this message, I'm going to call them dictionaries.
A dictionary is a finite collection of (key,value) pairs such that
all keys are atoms and no two keys are equal. The values may be any
Erlang terms.
Functions:
is_dictionary(Dict) -> Boolean
if Dict is a dictionary, the result is 'true', otherwise 'false'.
In a guard, succeeds or fails instead of returning true or false.
Cost: O(1).
dictionary_to_list(Dict) -> [{K1,V1},....,{Kn,Vn}]
where the keys are in strictly ascending order
Cost: O(n).
list_to_dictionary([{K1,V1},...,{Kn,Vn}]) -> Dict
The argument must be a well formed list; each of its elements
must be a pair; and the first element of each pair must be an
atom. The atoms need not be distinct. If two pairs have the
same key, the first (leftmost) value is used and the later
pair is ignored.
Cost: O(n.lgn).
dictionary_size(Dict) -> Integer
The result is the same as length(dictionary_to_list(Dict))
would have been, but this form should be usable in guards.
Cost: O(1).
dictionary_has(Dict, Key) -> Boolean
The result is the same as
lists:keymember(Key, 1, dictionary_to_list(Dict))
would have been. In a guard, this succeeds or fails instead
of returning true or false.
Cost: O(lg n).
dictionary_value(Dict, Key) -> Value | error exit
The result is the same as
({value,{_,X}} = lists:keysearch(Dict, 1, Key), X)
would have been, except of course for not binding X and for
a more informative error report. In a guard expression, this
makes the guard fail instead of signalling an error.
Cost: O(lg n).
dictionary_search(Dict, Key) -> {value,Value} | false
The result is the same as
case lists:keysearch(Dict, 1, Key) of
{value,{_,X}} -> {value,X};
false -> false
end,
woudl have been, except of course for not binding X and for
a more informative error report. As this may create a new tuple,
it is not allowed in a guard.
Cost: O(lg n).
dictionary_sum(Dict1, Dict2) -> Dict
The result is the same as
list_to_dictionary(lists:keymerge(1,
dictionary_to_list(Dict1),
dictionary_to_list(Dict2)))
would have been, except for more informative error reports.
As the code above implies, if a key is present in both dictionaries,
the value in Dict1 is used.
Cost: O(|Dict1| + |Dict2|)
dictionary_keys(Dict) -> [Key]
The result is the same as
[K || {K,_} <- dictionary_to_list(Dict)]
would have been, except for more informative error reports.
In particular, the list of keys is sorted.
Cost: O(|Dict|).
dictionary_without(Dict, Keys) -> Dict'
The result is the same as
list_to_dictionary(
[{K,V} || {K,V} <- dictionary_to_list(Dict),
not lists:member(K, Keys)])
would have been, except for more informative error reports.
In particular, the list of keys is sorted.
Cost: O(|Dict| + k.lg k) where k = length(Keys).
Equality and ordering:
We'd like these to be compatible with set inclusion.
A dictionary is not equal to anything that isn't a dictionary.
Two dictionaries D1, D2 are equal if and only if
dictionary_to_list(D1) is equal to dictionary_to_list(D2).
For ordering, dictionaries go after tuples.
To order two dictionaries,
1. First order them by size.
2. If they have the same size, they have the same relative order
that dictionary_to_list(D1), dictionary_to_list(D2) would have.
Under these rules, dictionary_without(D, Keys) <= D,
with equality if and only if Keys == [].
Special syntax:
'<{' [maplet {',' maplet}*] ['|' expression] '}>'
is an expression, where
maplet ::= expression '->' expression
<{ K1->V1,...,Kn->Vn | D }>
is equivalent to
list_to_dictionary([{K1,V1},...,{Kn,Vn}|dictionary_to_list(D)])
except, as usual, for error messages, and for a compile-time
opportunity. If the keys K1...Kn are atoms at compile time and
there is no |D part, the cost is O(n), plus of course the cost
of evaluating the V1...Vn.
Comprehension:
There isn't any need for a special dictionary comprehension form,
because list_to_dictionary([{K,V} || ... ]) can be recognised by a
compiler. While I would _like_ to recommend a special form for
iterating over the keys and values of a dictionary in a list
comprehension, once again
K <- dictionary_keys(D)
and
{K,V} <- dictionary_to_list(D)
can be recognised by a compiler, so no special form is needed.
Pattern matching:
'<{' {pmaplet {',' pmaplist}*] ['|' variable] '}>'
is a pattern, where
pmaplet ::= atom '->' pattern
and all the atoms appearing as keys in a dictionary pattern must
be different. If there is a |variable part, the variable may not
appear elsewhere in the same pattern or the pattern's related guard.
(This condition is easily met if the variable is an anonymous one.)
<{ k1->P1, ..., kn->Pn }> = Dict
is equivalent to
D = Dict,
is_dictionary(D),
dictionary_size(D) == n, % note ==
P1 = dictionary_value(D, k1),
...,
Pn = dictionary_value(D, kn)
where D is a new variable.
<{ k1->P1, ..., kn->Pn | Var }> = Dict
is equivalent to
D = Dict,
is_dictionary(D),
dictionary_size(D) >= n, % note >=
P1 = dictionary_value(D, k1),
...,
Pn = dictionary_value(D, kn),
Var = dictionary_without(D, [k1,...,kn])
where D is a new variable. If the variable is an anonymous variable,
the call to dictionary_without/2 is omitted. If the variable is not
an anonymous variable, the call to dicitonary_without/2 (which is by
now known to be safe) is postponed till _after_ the guard.
The cost of matching a dictionary pattern with n variables against
a dictionary value with m keys is O(n+m) plus the cost of matching
the element patterns. In fact we can do better:
O(min(n+m, n.lg m)). We expect that <{timeout->T|_}> = D will take
O(lg |D|) time, the same as T = dictionary_value(D, timeout) (and in
fact a little faster, because the ocmpiler must verify that timeout
is an atom, while dictionary_value/2 would check at run time).
An obvious question is "why not use the 'dict' module?"
Answers:
(1) When we know that something is a dictionary, we can use a more
space-efficient representation. For example, instead of
[{K1,V1},...,{Kn,Vn}] -- (n list cells + n 2-tuples)
we can use
{dict,K1,V1,...,Kn,Vn} -- one 2n+1-tuple (not really a tuple)
which is likely to take somewhat less than half the space.
(2) When we know that something is a dictionary, we can use a more
time-efficient representation. For example, if the keys K1...Kn
as shown above are in standard order, lookups can be done in
O(lg n) time instead of O(n) time.
(3) When the representation is not visible to the programmer, the run-time
system may use several different representations. For example, here
are some obvious ones: linear array as shown above, some kind of
binary search tree (so that unchanged chunks can be shared), some kind
of hash table (so that lookup can be O(1)).
(4) There are lots of "functional array"/"persistent array" implementations
in the literature which could be used to make setelement/3 take O(1)
time; these could be adapted to make updates to dictionaries fast too,
provided the representation is not exposed.
(5) With the 'dict' module you get a somewhat less than obvious textual
representation, which could mean other things. With a built in data
type you can get "just right" textual representation.
(6) With 'dict' you don't get to do dictionary stuff in patterns or guards.
All of this could be said of other data structures, but this particular
one is of proven utility in many programming languages and of fairly wide
occurrence (as "list of options") in Erlang itself. It could replace many
uses of records, e.g.,
<{ mode->M, uid->U, gid->G | _}> = posix:stat(FileName),
and it can be used to provide keyword parameters for functions.
- Increase the "isolation" properties of processes
<< I want better inter-process protection. One process should not
be able to crash another though a side effect, like
flooding the process with messages.>>
Hmm. I had always assumed that "message sending is unreliable, in that
messages might not arrive" meant that if a process's mailbox was filling
up too fast, new messages could simply be discarded. Is that not so?
So far I'm doing B) - but the B approach involves a virtual machine
which has specialized instruction for handling closures,
list comprehension iterators etc - the nice language features
that we all want to have can be either rather easy or very difficult
to implement at this level.
I would be very interested in seeing any design notes Joe might have,
even scrawls on paper napkins.
Increase the isolation property of processes.
This has to be right.
My very first experience of threads in a high level language was also
my introduction to Xerox Lisp Machines (back in 1984):
- I brought up a file browser. I saw something that looked interesting,
but didn't want to lose my context, so
- I brought up another file browser. The system locked up almost
at once.
- The next N minutes were spent rebooting the system...
Big lesson: you *can't* rely on the programmer to remember all the locking;
you either need language support for automatically locking the right things
or a language where you don't need any locks in the first place.
More information about the erlang-questions
mailing list