new syntax - a provocation

Wed Sep 24 03:29:09 CEST 2003

Joe Armstrong <joe@REDACTED> wrote:
	  	- Proper structs

		<< junk the record stuff. thus has been discussed in previous
		   mails >>

I never thought I'd recommend anything from IBM Prolog, but it turns out
that "IBM PROLOG for 370" had something that LIFE had and that I've long
wanted in Erlang.  The things I want have been known by various names in
various languages:  "tables" in SNOBOL, "Dictionaries" in Smalltalk,
"associative arrays" in AWK, "hashes" in Perl, "psi-terms" in LIFE,
"feature structures" in linguistics, and who knows what elsewhere.

In this message, I'm going to call them dictionaries.

A dictionary is a finite collection of (key,value) pairs such that
all keys are atoms and no two keys are equal.  The values may be any
Erlang terms.

Functions:

    is_dictionary(Dict) -> Boolean
	if Dict is a dictionary, the result is 'true', otherwise 'false'.
	In a guard, succeeds or fails instead of returning true or false.
	Cost: O(1).

    dictionary_to_list(Dict) -> [{K1,V1},....,{Kn,Vn}]
	where the keys are in strictly ascending order
	Cost: O(n).

    list_to_dictionary([{K1,V1},...,{Kn,Vn}]) -> Dict
	The argument must be a well formed list; each of its elements
	must be a pair; and the first element of each pair must be an
	atom.  The atoms need not be distinct.  If two pairs have the
	same key, the first (leftmost) value is used and the later
	pair is ignored.
	Cost: O(n.lgn).

    dictionary_size(Dict) -> Integer
	The result is the same as length(dictionary_to_list(Dict))
	would have been, but this form should be usable in guards.
	Cost: O(1).

    dictionary_has(Dict, Key) -> Boolean
	The result is the same as
	lists:keymember(Key, 1, dictionary_to_list(Dict))
	would have been.  In a guard, this succeeds or fails instead
	of returning true or false.
	Cost: O(lg n).

    dictionary_value(Dict, Key) -> Value | error exit
        The result is the same as
        ({value,{_,X}} = lists:keysearch(Dict, 1, Key), X)
	would have been, except of course for not binding X and for
	a more informative error report.  In a guard expression, this
	makes the guard fail instead of signalling an error.
	Cost: O(lg n).

    dictionary_search(Dict, Key) -> {value,Value} | false
	The result is the same as
	case lists:keysearch(Dict, 1, Key) of
	    {value,{_,X}} -> {value,X};
	    false         -> false
	end,
	woudl have been, except of course for not binding X and for
	a more informative error report.  As this may create a new tuple,
	it is not allowed in a guard.
	Cost: O(lg n).

    dictionary_sum(Dict1, Dict2) -> Dict
	The result is the same as
	list_to_dictionary(lists:keymerge(1,
	    dictionary_to_list(Dict1),
	    dictionary_to_list(Dict2)))
	would have been, except for more informative error reports.
	As the code above implies, if a key is present in both dictionaries,
	the value in Dict1 is used.
	Cost: O(|Dict1| + |Dict2|)

    dictionary_keys(Dict) -> [Key]
	The result is the same as
	[K || {K,_} <- dictionary_to_list(Dict)]
	would have been, except for more informative error reports.
	In particular, the list of keys is sorted.
	Cost: O(|Dict|).

    dictionary_without(Dict, Keys) -> Dict'
	The result is the same as
	list_to_dictionary(
	    [{K,V} || {K,V} <- dictionary_to_list(Dict),
	                       not lists:member(K, Keys)])
	would have been, except for more informative error reports.
	In particular, the list of keys is sorted.
	Cost: O(|Dict| + k.lg k) where k = length(Keys).

Equality and ordering:

    We'd like these to be compatible with set inclusion.

    A dictionary is not equal to anything that isn't a dictionary.
    Two dictionaries D1, D2 are equal if and only if
    dictionary_to_list(D1) is equal to dictionary_to_list(D2).

    For ordering, dictionaries go after tuples.
    To order two dictionaries,
    1.  First order them by size.
    2.  If they have the same size, they have the same relative order
        that dictionary_to_list(D1), dictionary_to_list(D2) would have.

    Under these rules, dictionary_without(D, Keys) <= D,
    with equality if and only if Keys == [].

Special syntax:

    '<{' [maplet {',' maplet}*] ['|' expression] '}>'

    is an expression, where
    maplet ::= expression '->' expression

    <{ K1->V1,...,Kn->Vn | D }>

    is equivalent to
    list_to_dictionary([{K1,V1},...,{Kn,Vn}|dictionary_to_list(D)])
    except, as usual, for error messages, and for a compile-time
    opportunity.  If the keys K1...Kn are atoms at compile time and
    there is no |D part, the cost is O(n), plus of course the cost
    of evaluating the V1...Vn.

Comprehension:

    There isn't any need for a special dictionary comprehension form,
    because list_to_dictionary([{K,V} || ... ]) can be recognised by a
    compiler.  While I would _like_ to recommend a special form for
    iterating over the keys and values of a dictionary in a list
    comprehension, once again
	K <- dictionary_keys(D)
    and
	{K,V} <- dictionary_to_list(D)
    can be recognised by a compiler, so no special form is needed.

Pattern matching:

    '<{' {pmaplet {',' pmaplist}*] ['|' variable] '}>'

    is a pattern, where
    pmaplet ::= atom '->' pattern
    and all the atoms appearing as keys in a dictionary pattern must
    be different.  If there is a |variable part, the variable may not
    appear elsewhere in the same pattern or the pattern's related guard.
    (This condition is easily met if the variable is an anonymous one.)

    <{ k1->P1, ..., kn->Pn }> = Dict

    is equivalent to
	D = Dict,
	is_dictionary(D),
	dictionary_size(D) == n,   % note ==
	P1 = dictionary_value(D, k1),
	...,
	Pn = dictionary_value(D, kn)
    where D is a new variable.

    <{ k1->P1, ..., kn->Pn | Var }> = Dict

    is equivalent to
	D = Dict,
	is_dictionary(D),
	dictionary_size(D) >= n,   % note >=
	P1 = dictionary_value(D, k1),
	...,
	Pn = dictionary_value(D, kn),
	Var = dictionary_without(D, [k1,...,kn])
    where D is a new variable.  If the variable is an anonymous variable,
    the call to dictionary_without/2 is omitted.  If the variable is not
    an anonymous variable, the call to dicitonary_without/2 (which is by
    now known to be safe) is postponed till _after_ the guard.

    The cost of matching a dictionary pattern with n variables against
    a dictionary value with m keys is O(n+m) plus the cost of matching
    the element patterns.  In fact we can do better:
    O(min(n+m, n.lg m)).  We expect that <{timeout->T|_}> = D will take
    O(lg |D|) time, the same as T = dictionary_value(D, timeout) (and in
    fact a little faster, because the ocmpiler must verify that timeout
    is an atom, while dictionary_value/2 would check at run time).

An obvious question is "why not use the 'dict' module?"

Answers:
(1) When we know that something is a dictionary, we can use a more
    space-efficient representation.  For example, instead of
    [{K1,V1},...,{Kn,Vn}] 	-- (n list cells + n 2-tuples)
    we can use
    {dict,K1,V1,...,Kn,Vn}	-- one 2n+1-tuple (not really a tuple)
    which is likely to take somewhat less than half the space.

(2) When we know that something is a dictionary, we can use a more
    time-efficient representation.  For example, if the keys K1...Kn
    as shown above are in standard order, lookups can be done in
    O(lg n) time instead of O(n) time.

(3) When the representation is not visible to the programmer, the run-time
    system may use several different representations.  For example, here
    are some obvious ones:  linear array as shown above, some kind of
    binary search tree (so that unchanged chunks can be shared), some kind
    of hash table (so that lookup can be O(1)).

(4) There are lots of "functional array"/"persistent array" implementations
    in the literature which could be used to make setelement/3 take O(1)
    time; these could be adapted to make updates to dictionaries fast too,
    provided the representation is not exposed.

(5) With the 'dict' module you get a somewhat less than obvious textual
    representation, which could mean other things.  With a built in data
    type you can get "just right" textual representation.

(6) With 'dict' you don't get to do dictionary stuff in patterns or guards.

All of this could be said of other data structures, but this particular
one is of proven utility in many programming languages and of fairly wide
occurrence (as "list of options") in Erlang itself.  It could replace many
uses of records, e.g.,

    <{ mode->M, uid->U, gid->G | _}> = posix:stat(FileName),

and it can be used to provide keyword parameters for functions.

		- Increase the "isolation" properties of processes

		<< I want better inter-process protection. One process should not
		   be able to crash another though a side effect, like
		   flooding the process with messages.>>

Hmm.  I had always assumed that "message sending is unreliable, in that
messages might not arrive" meant that if a process's mailbox was filling
up too fast, new messages could simply be discarded.  Is that not so?

	    So far I'm doing B) - but the B approach involves a virtual machine
	    which has specialized instruction for handling closures,
	    list comprehension iterators etc - the nice language features
	    that we all want to have can be either rather easy or very difficult
	    to implement at this level.

I would be very interested in seeing any design notes Joe might have,
even scrawls on paper napkins.

	    Increase the isolation property of processes.

This has to be right.

My very first experience of threads in a high level language was also
my introduction to Xerox Lisp Machines (back in 1984):

    - I brought up a file browser.  I saw something that looked interesting,
      but didn't want to lose my context, so
    - I brought up another file browser.  The system locked up almost
      at once.
    - The next N minutes were spent rebooting the system...

Big lesson:  you *can't* rely on the programmer to remember all the locking;
you either need language support for automatically locking the right things
or a language where you don't need any locks in the first place.