Meyer, OO and concurrency

Mon Jul 18 03:43:34 CEST 2005

"Valentin Micic" <valentin@REDACTED> asked, quite reasonably:
	What are the limitations of [the process] dictionary,
	and *why* [has] it received a bad reputation?

The Erlang book itself says
    "The process dictionary should be used with extreme care"
and
    "their use leads to unclear programs and should be avoided
    wherever possible"
and
    "we do not wish to encourage its use",
which is hardly a recommendation.

Let's look at the interface, as described in that book.

    put(Key, Value) ->
	Result = get(Key),
	%process_dictionary.store(Key, Value),
	Result.

    get(Key) when %process_dictionary.includes_key(Key) ->
	%process_dictionary.fetch(Key);
    get(_) -> undefined.

    get() ->
	[{K1,V1}, ..., {Kn,Vn}].
	/* returns a {Key,Value} pair for each key in the dictionary,
	   the order of the results is not specified.  In particular,
	   it is nowhere guaranteed that get() == get().
	*/

    get_keys(Value) ->
	[Key || {Key,Val} <- get(), Val == Value].

    erase(Key) when %process_dictionary.includes_key(Key) ->
	Result = %process_dictionary.fetch(Key),
	%process_dictionary.remove_key(Key),
	Result;
    erase(_) -> undefined.

    erase() ->
	Result = get(),
	[erase(Key) || {Key,_} <- Result],
	Result.

To start with, there are some quirky aspects to this.
There is (or at any rate was) no easy way to tell the difference between
a key which has the value 'undefined' associated with it and a key which
has NO value associated with it.  There's a hard way, namely to call
get() and look in the result, but that's not pleasant.
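
To make the ambiguity concrete, here is a minimal sketch.  The module
name pd_demo and the function is_defined/1 are made up for illustration;
they are not part of any library.  is_defined/1 is the "hard way": it
copies out every pair just to answer a yes/no question.

```erlang
-module(pd_demo).
-export([ambiguous/0, is_defined/1]).

%% get/1 answers 'undefined' both for a missing key and for a key
%% whose stored value is the atom 'undefined'.
ambiguous() ->
    erase(fred),
    Missing = get(fred),          % key absent
    put(fred, undefined),
    Present = get(fred),          % key present, value 'undefined'
    erase(fred),
    {Missing, Present}.           % the two elements are indistinguishable

%% The "hard way" (hypothetical helper): fetch the whole dictionary
%% with get/0 and search it for an exactly matching key.
is_defined(Key) ->
    lists:any(fun({K, _}) -> K =:= Key end, get()).
```

Calling pd_demo:ambiguous() yields a pair of two identical atoms, so the
caller cannot tell the two situations apart from get/1 alone.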

Suppose someone wants to use 'fred' as a local variable in some calculation.
Being a careful programmer, our friend realises that something else in the
same process may also be trying to use 'fred', so our friend arranges to
put the original value back afterwards:

    Old_Fred = put(fred, New_Value),
    ...,
    put(fred, Old_Fred)

Now, if this code is invoked when 'fred' *is* in use, all is well,
but if it is invoked when 'fred' is *not* in use, 'fred' will be
defined afterwards when it wasn't previously.  So our friend tries to
fix this:

    Old_Fred = put(fred, New_Value),
    ...
    if Old_Fred == undefined -> erase(fred)
     ; Old_Fred /= undefined -> put(fred, Old_Fred)
    end

which is fine until the day when the outer code has defined 'fred' to
be 'undefined'.  A fairly simple test of any design like this is
"Can you write a function

    with(Key, Value, Closure)

 which temporarily binds Key to Value then invokes Closure and on the
 way out (whether by normal return or by exception) restores Key to
 its original status in the dictionary?"

Thanks to 'catch', you _can_ ensure that a clean-up action will be
performed even if there is an exception.  But you cannot do the rest
of the job without having to pick up the *whole* process dictionary
(which could be very large) using get().
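
A minimal sketch of such a with/3, under the assumptions just stated.
The module name pd_with is made up, and the sketch uses try ... after
rather than 'catch' for the clean-up.  Note the call to get/0: that is
the cost being complained about.

```erlang
-module(pd_with).
-export([with/3]).

%% with(Key, Value, Closure): bind Key to Value while Closure() runs,
%% then restore Key to its original status, even if Closure() throws.
with(Key, Value, Closure) ->
    %% The only way to tell "absent" from "bound to 'undefined'" is to
    %% copy out the whole dictionary and look for the key.
    WasDefined = lists:any(fun({K, _}) -> K =:= Key end, get()),
    Old = put(Key, Value),
    try
        Closure()
    after
        case WasDefined of
            true  -> put(Key, Old);   % restore the previous value
            false -> erase(Key)       % key did not exist before
        end
    end.
```

For example, pd_with:with(fred, 42, fun() -> get(fred) end) returns 42
and leaves 'fred' exactly as it was, including the case where 'fred'
was previously bound to 'undefined'.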

Another version of with() would be with([{K1,V1},...,{Kn,Vn}], Closure),
but it suffers the same problem:  you have to pick up the *whole*
process dictionary to discover which keys should be restored and which
erased.
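
The list version can be sketched the same way.  Again the module name
(pd_with_list) and the helpers lookup/2 and restore/2 are hypothetical.
It needs only one call to get/0 however many keys are bound, but that
one call still copies the entire dictionary.

```erlang
-module(pd_with_list).
-export([with/2]).

%% with([{K1,V1},...,{Kn,Vn}], Closure): bind every Ki to Vi while
%% Closure() runs, then restore or erase each Ki as appropriate.
with(Bindings, Closure) ->
    Snapshot = get(),                 % one copy of the WHOLE dictionary
    Saved = [{K, lookup(K, Snapshot)} || {K, _} <- Bindings],
    lists:foreach(fun({K, V}) -> put(K, V) end, Bindings),
    try
        Closure()
    after
        lists:foreach(fun({K, S}) -> restore(K, S) end, Saved)
    end.

%% Find Key in the snapshot using exact equality.
lookup(Key, [{K, V} | _]) when K =:= Key -> {found, V};
lookup(Key, [_ | Rest]) -> lookup(Key, Rest);
lookup(_, []) -> absent.

%% Put back a saved value, or erase a key that was absent before.
restore(K, {found, V}) -> put(K, V);
restore(K, absent) -> erase(K).
```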

Perhaps more interesting, the book provides NO PERFORMANCE GUARANTEES.
Nor can I find any discussion of performance in the on-line documentation
at www.erlang.se.  For all we are told to the contrary, the process
dictionary might use linear search or worse.  For example, in Prolog,
the 'recorded' data base only uses the principal functor of the key for
indexing, and if Erlang did the same, then using keys a..z would be
efficient but keys {a}..{z} would be inefficient.  Nor are we advised
how much copying gets done.  So we have *NO* idea how to use this
facility efficiently.

As far as I can tell from a quick scan of erl_process_dict.[ch],
process dictionaries are some kind of expanding hash table and hashing
depends on the whole key.  They should be pretty good, BUT the fact
remains that this is not actually promised anywhere that I can find,
and there are still issues about copying.

Finally, there's a fairly lethal objection from a software engineering
point of view.  In effect, Erlang with the process dictionary offers
us a language with two levels of variable scope:
    there is an outer global level of mutable variables
    each function has its own set of read-only variables
Now when you are debugging, you may be very interested in finding out
how a variable came to have a particular value.  For ordinary Erlang
variables, no worries.  For conventional programming languages, relatively
few worries.  For example, cscope(1) can find references to global variables
in C, and there is even a free slicing program for C that can track down
where something might have been set.  (And failing that, there are watchpoints
in debuggers like dbx.)  But in Erlang, the "names" of global variables
are the *values* of run-time expressions, and while it *may* be easy to find
assignments to a particular variable, in general you have to look at EVERY
place where you call put/2.  If there is anything in the debugger to set a
watchpoint on a process dictionary entry, I've missed it.
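
A small sketch of why a textual search does not help.  Everything here
(the module pd_keys and the functions counter_key/1, bump/1 and
set_by_name/2) is invented for illustration.  Both 'fred' and
{counter,hits} get assigned, yet neither appears as the first argument
of any put/2 call in the source.

```erlang
-module(pd_keys).
-export([demo/0]).

%% The key is the *value* of an expression, not a name in the source.
counter_key(Name) -> {counter, Name}.

%% Increment a per-process counter stored under a computed key.
bump(Name) ->
    Key = counter_key(Name),
    case get(Key) of
        undefined -> put(Key, 1);
        N         -> put(Key, N + 1)
    end.

%% The key is built from a string at run time.
set_by_name(String, Value) ->
    put(list_to_atom(String), Value).

demo() ->
    bump(hits),
    set_by_name("fr" ++ "ed", 42),
    %% Searching the source for "put(fred" or "put({counter" finds nothing,
    %% yet both entries are now set.
    {get({counter, hits}), get(fred)}.
```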

*With* a proper "effects" system (like FX-80), something very like the
process dictionary could be quite useful.  As it is, the process dictionary
interface is an invitation to (inadvertent) "spaghetti data".

There isn't even any run-time type distinction between the key and the
value; swap the arguments in a call to put/2 and you may never notice.
Note that this interacts with the lack of a simple way to tell whether
a key is defined (and the corresponding fact that get/1 never fails):
if you *meant* to write

    put(one, 2)
    ...
    get(one)+1

and by mistake wrote

    put(2, one)
    ...
    get(one)+1

instead, you are NOT told that 'one' isn't in the dictionary, you are
told that you have 'badarith'.
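
Spelled out as a runnable sketch (the module name pd_swap is made up;
the try ... catch in swapped/0 is there only so that the error can be
returned and inspected rather than crashing the caller):

```erlang
-module(pd_swap).
-export([intended/0, swapped/0]).

%% What was meant: store 2 under the key 'one', then add 1 to it.
intended() ->
    put(one, 2),
    get(one) + 1.

%% What was typed: key and value transposed.  put/2 accepts this
%% silently, and get(one) then quietly returns 'undefined'.
swapped() ->
    erase(one),                       % make sure 'one' really is absent
    put(2, one),
    try
        get(one) + 1                  % evaluates undefined + 1
    catch
        error:badarith -> badarith    % the only symptom the caller sees
    end.
```

The failure shows up at the addition, possibly far away from the put/2
call that actually contains the mistake.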

While I personally think that functional languages are better off without
assignment statements (unless 'tamed' by an effects system), you don't have
to agree with me about that to accept that the criticisms I have put forward
in this message show that the process dictionary, as currently defined, is
not as good an interface as it should have been, and is risky enough to
avoid unless you have a REALLY good reason to use it.