[erlang-questions] 2D Map Arrays

Mon Sep 11 10:51:43 CEST 2006

Heinrich Venter wrote:
 > If you need to do writes too things change drastically.

Of course - and the whole point of the array.erl implementation
is to support writes (and have a good balance between writes and
lookups). For lookup only, you probably can't beat a big, flat
tuple, but that does not scale when it comes to writing. We want
to have reasonably efficient arrays of many millons of elements.

> However when writes were
> done, the ets solution outperformed the binary one by a factor of about
> 10 and also used less memory.

Well, binaries are not really a candidate for random-access writing,
(even when you have fixed, known sizes for the elements), since it
involves an awful amount of copying.

> Consider this:  If you use a tuple based solution you have some options
> on how you partition it.
> Say you have a 10x10 array.  You could store it as a single tuple with
> 100 elements and use I=(Row*10+Col) to get the correct index for
> element(I,Map).
> You could also make it a tuple of 10 row tuples so that the access
> becomes element(Col,element(Row,Map))
> But why stop there, why not break it up further into regions for a
> quad-tree type structure?
> If you break it up fine enough you will end up with something that looks
> almost like an ets table where each element has the form {{X,Y},Data}.
> So in the end ets will probably give you the most bang for your buck.

First, we do not try to optimize for extremely sparse arrays (in which 
case quad trees or plain binary trees would perform well - and we 
already have decent binary trees), but for fairly densely populated 
ones. Second, in order to get fast operations, you have to have as few 
and as cheap tests as possible, and a regular layout like the one we use 
(nested N-tuples, where N=10 has been measured to be a good choice) 
simply gives us the fastest code - about 2-3 times faster than gb_trees.
(On the other hand, you can only use nonnegative integers as keys!)

Ets tables would not be faster if they were implemented in Erlang, 
instead of in C, and functional arrays have a couple of important 
properties that ets tables do not have:

  - No copying when reading/writing data. Ets tables always copy
    the data in and out of the table (except for large binaries),
    and is thus not suitable for storing things like large lists
    or tree structures.

  - They are _functional_, not destructive, so they can be used in
    places where you want to keep older versions of the array around
    (e.g., in an "undo"-list). With ets tables, you overwrite the old
    version of the data. Also, since the arrays are tree-based, keeping
    several "copies" around does not cost a lot space-wise.

	/Richard