Spatial Indexes in mnesia

Joachim Durchholz joachim.durchholz@REDACTED
Fri Oct 17 12:11:14 CEST 2003

You can fake spatial indexes via "bit interleaving". Assuming four-bit 
X, Y, and Z coordinates XXXX, YYYY, and ZZZZ, construct a bit vector 
XYZXYZXYZXYZ. When traversing the records in the order imposed by that 
index, you'll retrieve them in the order that you'd visit them if they 
were stored in an octree.
For example, if you have the interleaved coordinates 00101xxxxxxx (where 
the "x" indicate don't-care bits), this will give the XYZ coordinates 
(00xx,01xx,1xxx), which is exactly the area that a specific octree node 

To actually find those records within the space indicated by the query, 
the following algorithm is in order:
* The algorithm takes two parameters, one of them an interleaved 
coordinate with don't cares TV (the "test vector"), and a set of test 
vectors IN, containing ranges that are known to be entirely within the 
area described by the query.
* If the coordinate range indicated by TV contains no records, return IN.
* If the range indicated by TV is entirely within the space described by 
the query, return IN + {TV}.
* If the range indicated by TV is entirely outside the space described 
by the query, return IN.
* Otherwise, create two new test vectors from TV: one by replacing the 
first don't-care bit of TV with a 0, and one by replacing it with 1, 
giving the new test vectors TV0 and TV1. Return
   ALG (TV0, ALG (TV1, IN))
where ALG is this algorithm.

This will give a set of ranges that contain the records within the space 
indicated by the query, ready for retrieving them from mnesia.

This is a useful framework.
For one, set operations are easy to do on this representation. 
Intersection and union of range lists are nearly trivial, set difference 
requires a bit of thinking and care but is rather straightforward as well.
Another question might be how to set up a view. The algorithm doesn't 
return a Mnesia view, it returns lists of coordinate ranges. To convert 
them to a view without doing extensive changes to Mnesia, you'd have to 
construct a temporary table with the (primary keys of) the records that 
are selected by the spatial query. I'm not sure how attractive 
efficiency-wise that would be, but it might save your neck if you need 
to postprocess the result set with further queries.

A general note: the above algorithm leaves open how the intersection 
test between coordinate ranges and geometric figures can be done. In the 
most general case, the intersection-testing code would be passed in as a 
function or as a fun.
However, such testing can be tricky for complicated geometric queries. 
It might be better to write a test function for simple queries ("inside 
a rectangular range", "inside an ellipsoid", "to the right of a plane" 
etc.), run the algorithm for each simple query that's in the complex 
query, and then do set operations.
Note that for this to work, you need every simple query in two forms: 
one for the inside and one for the outside. It might save a few lines of 
code and a few hours of testing to make three-valued functions that 
return "inside", "outside", or "intersects", and adapt the base 
algorithm accordingly.

Hope this all is useful to somebody.


More information about the erlang-questions mailing list