Heap based mutable arrays?

Wed May 2 14:43:23 CEST 2001

> Support for mutable arrays has almost got to the point of a FAQ.
> 
> The current choices seem to be:
>
> tuples - slow when of any decent size due to the copying involved
> ets - quite a lot of fixed overhead (min 50uS on my machine).
> clever combinations of smaller tuples such as dynarray and friends.. not too
> bad - comparable to ets.

HIPE used a third solution for a long time. However, it requires support
from a generational garbage collector, since it relies on destructive
updates 'under the hood'.

The HIPE solution I wrote had these characteristics:

- declarative, so old versions of array still available if needed
- constant time access (read/write) to the most recent version

Here is the implementation. The idea was taken from Prolog work in the
80's (shown in manly ascii graphics):

1. There is no direct access to the array per se; all accesses go through
   a _handle_ which is updated to reflect array changes

The original array is implemented as a tuple (array) with a handle:
<handle1> -> <array>

2. How to write element N with value V1 and old value V0:

<handle1>-> <N:V0> -> <array where N:=V1>
                        ^
                        |
           <newhandle>--+

Thus: (a) an exception element <N:V0> is allocated
      (b) the array/tuple is updated with N := V1
      (c) a new handle is allocated to point to the array
      (d) the new handle is returned

This is a constant time operation, but of course not as cheap as a direct
update.

3. How to look up an element N in array (starting from a handle):
   - if there is an exception chain, check if N occurs in the chain;
     if so, return that value
   - otherwise, return element N of the array

Accessing the most recent array is thus a constant time operation,
since there is no exception list to scan.

What were the drawbacks:

- Implemented in Erlang (except for the destructive update), so one
  could subvert the implementation. By moving to C, handles etc can
  be made invisible (just like they are for binaries).

- The exception chain is costly if an old array is used. This can be
  solved by using another method for lookup: if an exception chain is
  found, _copy_ the array, install the exception chain in the copy and
  repoint the handle to the new array. (This can also be done at GC-time.)

  The effect: constant time access to the old array (after the copy),
  but sometimes requires more space. Also thrashing if a single lookup
  is made in the old array.

  (What to do requires more study, I suspect; the policy could be
  handled by setting options at array creation time.)

- Possibilities to create circular references, and references from the
  old generation to the new generation in a gen-gc system. There are
  well-known solutions to this, since almost all gen-gcs have to handle
  the problem. (Basically, keep a stack or list of pointers from old to
  new generation; push a new element when such a pointer is created.)

  I think some other copying functions in erts rely on non-circularity,
  which also needs changing.

What were the advantages:

- Very useful, in short :-) For HIPE, graph algorithms became much
  more pleasant. I also implemented hash tables on top of the arrays,
  which rapidly spread through the compiler. Many common algorithms
  are much easier to write when your array ADT has the right
  complexity.

I know there has been discussions about implementing something like
this in OTP, but I don't know the precise status. Björn G. probably
knows more :-)

			Thomas
-- 
Thomas Lindgren					thomas+junk@REDACTED
Alteon WebSystems