[erlang-questions] Process scope variable

Wed Feb 18 03:29:13 CET 2015

On 18/02/2015, at 2:05 pm, Imants Cekusins <imantc@REDACTED> wrote:

> > It has problems, but it can do the job, and the problems are a
> > helpful reminder that maybe the job should not be done.
> 
> Could you go into more detail about the problems with the process dictionary, please?

I already did.

(1) It's mutable.

    You can sort of fix this by using wrapper functions:

    %% Write-once put: crashes with badmatch if Key is already set.
    safe_put(Key, Val) ->
        undefined = get(Key),
        put(Key, Val).
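
    A minimal sketch of how such a wrapper behaves, assuming a
    hypothetical module pd_once and an illustrative key colour (neither
    is from the original message).  The second write fails the match
    inside safe_put/2, so the first value survives.  Note that a key
    deliberately set to the atom undefined still looks unset.

```erlang
-module(pd_once).
-export([safe_put/2, demo/0]).

%% Write-once put: crashes with badmatch if Key already has a value.
safe_put(Key, Val) ->
    undefined = get(Key),
    put(Key, Val).

%% Illustrative only: shows the first write succeeding and the
%% second write being rejected.
demo() ->
    erase(colour),                       % start from a clean slate
    undefined = safe_put(colour, red),   % put/2 returns the old value
    red = get(colour),
    %% The second write fails the match inside safe_put/2.
    {'EXIT', {{badmatch, red}, _}} = (catch safe_put(colour, blue)),
    red = get(colour),                   % the first value is untouched
    ok.
```

    Calling pd_once:demo() in a fresh process returns ok only if every
    match above holds.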

(2) Keys are global, not per-module.

    If you call code from a module that you have not read,
    you cannot tell which keys it might use for its own purposes.

    You can sort of fix this by using module_name.variable_name keys.

    As it happens, the Erlang libraries DO use the process
    dictionary, most notably for holding the random generator state.
    The number of keys used is in the low hundreds, and *most* of
    them are safely hidden inside other processes, but I don't
    KNOW which are and which are not.

    If your process-scope variables are per-module, then a lot of
    the problem goes away.  (In particular, when I last looked,
    nothing in the Erlang libraries used M.V style keys.)  And
    then you have the problem of not being able to refer to a
    variable in another module when you want to, which guess what,
    *prevents* some useful refactorings.
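
    The per-module idea can also be approximated without any new
    syntax by tagging each key with the module name.  The sketch below
    swaps the M.V style for a {Module, Name} tuple; the module
    pd_counter, the KEY macro, and the key count are all hypothetical
    names chosen for illustration.

```erlang
-module(pd_counter).
-export([reset/0, bump/0, value/0]).

%% Hypothetical helper: every key this module stores is tagged with
%% the module name, so a bare key such as count used elsewhere in the
%% same process cannot collide with it.
-define(KEY(Name), {?MODULE, Name}).

reset() ->
    put(?KEY(count), 0),
    ok.

bump() ->
    N = get(?KEY(count)),
    put(?KEY(count), N + 1),
    ok.

value() ->
    get(?KEY(count)).
```

    Other code in the same process can still do put(count, Whatever)
    without disturbing the counter.  The cost is the one already
    noted: code outside pd_counter has to spell out
    {pd_counter, count} to reach the variable.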

(3) There is, as far as I know, no tool support to help with (2) or
    even with tracking key use within a single module.  Since ANY
    term may be used as a key, things like
        receive Key -> receive Val -> put(Key, Val) end end
    are possible, meaning that as far as I can tell, it isn't
    even theoretically possible for a static tool to be both complete
    and correct.  Some kind of approximate (ideally conservative) tool
    would be nice.

    This can be worked around by writing a trivial preprocessor.
    This one is in AWK to keep the message short.

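    # Remember the module name from the -module(...) attribute.
    /^-[ \t]*module[(]/ {
        module = $0
        sub(/^-[ \t]*module[(][ \t]*/, "", module)
        sub(/[^a-zA-Z0-9_.].*$/, "", module)
        print
        next
    }
    # Rewrite the first get(.k or put(.k on a line to use Module.k,
    # and count the use of k.
    /[ \t](get|put)[(][.][a-z]/ {
        key = $0
        sub(/^.*[ \t](get|put)[(][.]/, "", key)
        sub(/[^a-zA-Z0-9_.].*$/, "", key)
        tally[key]++
        match($0, /[(][.]/)
        print substr($0, 1, RSTART) module substr($0, RSTART+1)
        next
    }
    # Pass every other line through unchanged.
    {
        print
    }
    # Report how often each key was used.
    END {
        for (key in tally) printf "%3d %s\n", tally[key], key >"/dev/stderr"
    }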

    will turn

    -module(fee).
    -export([fie/0, foh/0, fum/0]).

    fie() ->
        put(.key, 1).

    foh() ->
        N = get(.key),
        put(.key, N+1).

    fum() ->
        get(.key).

    into

    -module(fee).
    -export([fie/0, foh/0, fum/0]).

    fie() ->
        put(fee.key, 1).

    foh() ->
        N = get(fee.key),
        put(fee.key, N+1).

    fum() ->
        get(fee.key).

    and write

      4 key

    to stderr, so I can see what keys are used this way.

    THIS IS JUST A TOY EXAMPLE.  I DO NOT USE IT IN PRACTICE.
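
    Since a static tool cannot be complete, one approximate alternative
    is to observe key use at run time.  The sketch below is a
    hypothetical helper (pd_audit:keys_touched/1 is not an existing
    library function) that snapshots the dictionary with get/0 before
    and after a call and reports the keys whose entries differ.  It is
    NOT conservative: it misses keys that are only read, keys rewritten
    with the same value, and keys that are put and then erased during
    the call.

```erlang
-module(pd_audit).
-export([keys_touched/1]).

%% Run Fun in the calling process and report which process-dictionary
%% keys were added, removed, or given a different value while it ran.
%% Returns {Result, SortedKeys}.
keys_touched(Fun) when is_function(Fun, 0) ->
    Before = get(),                      % list of {Key, Val} pairs
    Result = Fun(),
    After = get(),
    %% Pairs present on one side only: new, removed, or changed entries.
    Changed = (After -- Before) ++ (Before -- After),
    {Result, lists:usort([K || {K, _} <- Changed])}.
```

    Wrapping a call into unfamiliar code this way gives a lower bound
    on the keys it uses in the current process, which is at least a
    starting point for reading the code.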

(4) There is the memory leak problem.  If you put something in the process
    dictionary, it will stay there until it is explicitly removed.  There
    is no automatic garbage collection of keys from the process dictionary.
    While you have not presented a detailed specification of process scope
    variables, it looks like they would be even *worse* in this regard,
    because at least there is erase(dead_key).  What would you use for
    oddball variables?
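
    With the process dictionary as it stands, one way to limit the
    leak is to tie a key's lifetime to a call.  A minimal sketch,
    assuming a hypothetical helper with_key/3 in a hypothetical module
    pd_scope: it sets the key, runs the fun, and then restores the
    previous state, even if the fun throws or exits.

```erlang
-module(pd_scope).
-export([with_key/3]).

%% Bind Key to Val for the duration of Fun(), then put back whatever
%% was there before.  If the key was unset, it is erased again.
with_key(Key, Val, Fun) when is_function(Fun, 0) ->
    Old = put(Key, Val),                 % put/2 returns the old value
    try
        Fun()
    after
        case Old of
            undefined -> erase(Key);
            _ -> put(Key, Old)
        end
    end.
```

    This only helps when the variable's useful life matches a single
    call.  A key that must outlive the call still needs an explicit
    erase/1 somewhere.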