Heap based mutable arrays?
Wed May 2 14:43:23 CEST 2001
> Support for mutable arrays has almost got to the point of a FAQ.
> The current choices seem to be:
> tuples - slow when of any decent size due to the copying involved
> ets - quite a lot of fixed overhead (min 50uS on my machine).
> clever combinations of smaller tuples such as dynarray and friends.. not too
> bad - comparable to ets.
HIPE used a third solution for a long time. However, it requires support
from a generational garbage collector, since it relies on destructive
updates 'under the hood'.
The HIPE solution I wrote had these characteristics:
- declarative, so old versions of array still available if needed
- constant time access (read/write) to the most recent version
Here is the implementation. The idea was taken from Prolog work in the
80's (shown in manly ascii graphics):
1. There is no direct access to the array per se; all accesses go through
a _handle_ which is updated to reflect array changes
The original array is implemented as a tuple (array) with a handle:
<handle1> -> <array>
2. How to write element N with value V1 and old value V0:
<handle1>-> <N:V0> -> <array where N:=V1>
Thus: (a) an exception element <N:V0> is allocated
(b) the array/tuple is updated with N := V1
(c) a new handle is allocated to point to the array
(d) the new handle is returned
This is a constant time operation, but of course not as cheap as a direct
3. How to look up an element N in array (starting from a handle):
- if there is an exception chain, check if N occurs in the chain;
if so, return that value
- otherwise, return element N of the array
Accessing the most recent array is thus a constant time operation,
since there is no exception list to scan.
What were the drawbacks:
- Implemented in Erlang (except for the destructive update), so one
could subvert the implementation. By moving to C, handles etc can
be made invisible (just like they are for binaries).
- The exception chain is costly if an old array is used. This can be
solved by using another method for lookup: if an exception chain is
found, _copy_ the array, install the exception chain in the copy and
repoint the handle to the new array. (This can also be done at GC-time.)
The effect: constant time access to the old array (after the copy),
but sometimes requires more space. Also thrashing if a single lookup
is made in the old array.
(What to do requires more study, I suspect; the policy could be
handled by setting options at array creation time.)
- Possibilities to create circular references, and references from the
old generation to the new generation in a gen-gc system. There are
well-known solutions to this, since almost all gen-gcs have to handle
the problem. (Basically, keep a stack or list of pointers from old to
new generation; push a new element when such a pointer is created.)
I think some other copying functions in erts rely on non-circularity,
which also needs changing.
What were the advantages:
- Very useful, in short :-) For HIPE, graph algorithms became much
more pleasant. I also implemented hash tables on top of the arrays,
which rapidly spread through the compiler. Many common algorithms
are much easier to write when your array ADT has the right
I know there has been discussions about implementing something like
this in OTP, but I don't know the precise status. Björn G. probably
knows more :-)
More information about the erlang-questions