Heap based mutable arrays?

Thomas Lindgren thomasl@REDACTED
Wed May 2 14:43:23 CEST 2001

> Support for mutable arrays has almost got to the point of a FAQ.
> The current choices seem to be:
> tuples - slow when of any decent size due to the copying involved
> ets - quite a lot of fixed overhead (min 50uS on my machine).
> clever combinations of smaller tuples such as dynarray and friends.. not too
> bad - comparable to ets.

HIPE used a third solution for a long time. However, it requires support
from a generational garbage collector, since it relies on destructive
updates 'under the hood'.

The HIPE solution I wrote had these characteristics:

- declarative, so old versions of array still available if needed
- constant time access (read/write) to the most recent version

Here is the implementation. The idea was taken from Prolog work in the
80's (shown in manly ascii graphics):

1. There is no direct access to the array per se; all accesses go through
   a _handle_ which is updated to reflect array changes

The original array is implemented as a tuple (array) with a handle:
<handle1> -> <array>

2. How to write element N with value V1 and old value V0:

<handle1>-> <N:V0> -> <array where N:=V1>

Thus: (a) an exception element <N:V0> is allocated
      (b) the array/tuple is updated with N := V1
      (c) a new handle is allocated to point to the array
      (d) the new handle is returned

This is a constant time operation, but of course not as cheap as a direct

3. How to look up an element N in array (starting from a handle):
   - if there is an exception chain, check if N occurs in the chain;
     if so, return that value
   - otherwise, return element N of the array

Accessing the most recent array is thus a constant time operation,
since there is no exception list to scan.

What were the drawbacks:

- Implemented in Erlang (except for the destructive update), so one
  could subvert the implementation. By moving to C, handles etc can
  be made invisible (just like they are for binaries).

- The exception chain is costly if an old array is used. This can be
  solved by using another method for lookup: if an exception chain is
  found, _copy_ the array, install the exception chain in the copy and
  repoint the handle to the new array. (This can also be done at GC-time.)

  The effect: constant time access to the old array (after the copy),
  but sometimes requires more space. Also thrashing if a single lookup
  is made in the old array.

  (What to do requires more study, I suspect; the policy could be
  handled by setting options at array creation time.)

- Possibilities to create circular references, and references from the
  old generation to the new generation in a gen-gc system. There are
  well-known solutions to this, since almost all gen-gcs have to handle
  the problem. (Basically, keep a stack or list of pointers from old to
  new generation; push a new element when such a pointer is created.)

  I think some other copying functions in erts rely on non-circularity,
  which also needs changing.

What were the advantages:

- Very useful, in short :-) For HIPE, graph algorithms became much
  more pleasant. I also implemented hash tables on top of the arrays,
  which rapidly spread through the compiler. Many common algorithms
  are much easier to write when your array ADT has the right

I know there has been discussions about implementing something like
this in OTP, but I don't know the precise status. Björn G. probably
knows more :-)

Thomas Lindgren					thomas+junk@REDACTED
Alteon WebSystems

More information about the erlang-questions mailing list