Heap based mutable arrays?
Thomas Lindgren
thomasl@REDACTED
Wed May 2 14:43:23 CEST 2001
> Support for mutable arrays has almost got to the point of a FAQ.
>
> The current choices seem to be:
>
> tuples - slow when of any decent size due to the copying involved
> ets - quite a lot of fixed overhead (min 50uS on my machine).
> clever combinations of smaller tuples such as dynarray and friends.. not too
> bad - comparable to ets.
HIPE used a third solution for a long time. However, it requires support
from a generational garbage collector, since it relies on destructive
updates 'under the hood'.
The HIPE solution I wrote had these characteristics:
- declarative, so old versions of array still available if needed
- constant time access (read/write) to the most recent version
Here is the implementation. The idea was taken from Prolog work in the
80's (shown in manly ascii graphics):
1. There is no direct access to the array per se; all accesses go through
a _handle_ which is updated to reflect array changes
The original array is implemented as a tuple (array) with a handle:
<handle1> -> <array>
2. How to write element N with value V1 and old value V0:
<handle1>-> <N:V0> -> <array where N:=V1>
^
|
<newhandle>--+
Thus: (a) an exception element <N:V0> is allocated
(b) the array/tuple is updated with N := V1
(c) a new handle is allocated to point to the array
(d) the new handle is returned
This is a constant time operation, but of course not as cheap as a direct
update.
3. How to look up an element N in array (starting from a handle):
- if there is an exception chain, check if N occurs in the chain;
if so, return that value
- otherwise, return element N of the array
Accessing the most recent array is thus a constant time operation,
since there is no exception list to scan.
What were the drawbacks:
- Implemented in Erlang (except for the destructive update), so one
could subvert the implementation. By moving to C, handles etc can
be made invisible (just like they are for binaries).
- The exception chain is costly if an old array is used. This can be
solved by using another method for lookup: if an exception chain is
found, _copy_ the array, install the exception chain in the copy and
repoint the handle to the new array. (This can also be done at GC-time.)
The effect: constant time access to the old array (after the copy),
but sometimes requires more space. Also thrashing if a single lookup
is made in the old array.
(What to do requires more study, I suspect; the policy could be
handled by setting options at array creation time.)
- Possibilities to create circular references, and references from the
old generation to the new generation in a gen-gc system. There are
well-known solutions to this, since almost all gen-gcs have to handle
the problem. (Basically, keep a stack or list of pointers from old to
new generation; push a new element when such a pointer is created.)
I think some other copying functions in erts rely on non-circularity,
which also needs changing.
What were the advantages:
- Very useful, in short :-) For HIPE, graph algorithms became much
more pleasant. I also implemented hash tables on top of the arrays,
which rapidly spread through the compiler. Many common algorithms
are much easier to write when your array ADT has the right
complexity.
I know there has been discussions about implementing something like
this in OTP, but I don't know the precise status. Björn G. probably
knows more :-)
Thomas
--
Thomas Lindgren thomas+junk@REDACTED
Alteon WebSystems
More information about the erlang-questions
mailing list