Spatial Indexes in mnesia
Joachim Durchholz
joachim.durchholz@REDACTED
Fri Oct 17 12:11:14 CEST 2003
You can fake spatial indexes via "bit interleaving". Assuming four-bit
X, Y, and Z coordinates XXXX, YYYY, and ZZZZ, construct a bit vector
XYZXYZXYZXYZ. When traversing the records in the order imposed by that
index, you'll retrieve them in the order that you'd visit them if they
were stored in an octree.
For example, if you have the interleaved coordinates 00101xxxxxxx (where
the "x" indicate don't-care bits), this will give the XYZ coordinates
(00xx,01xx,1xxx), which is exactly the area that a specific octree node
covers.
To actually find those records within the space indicated by the query,
the following algorithm is in order:
* The algorithm takes two parameters, one of them an interleaved
coordinate with don't cares TV (the "test vector"), and a set of test
vectors IN, containing ranges that are known to be entirely within the
area described by the query.
* If the coordinate range indicated by TV contains no records, return IN.
* If the range indicated by TV is entirely within the space described by
the query, return IN + {TV}.
* If the range indicated by TV is entirely outside the space described
by the query, return IN.
* Otherwise, create two new test vectors from TV: one by replacing the
first don't-care bit of TV with a 0, and one by replacing it with 1,
giving the new test vectors TV0 and TV1. Return
ALG (TV0, ALG (TV1, IN))
where ALG is this algorithm.
This will give a set of ranges that contain the records within the space
indicated by the query, ready for retrieving them from mnesia.
This is a useful framework.
For one, set operations are easy to do on this representation.
Intersection and union of range lists are nearly trivial, set difference
requires a bit of thinking and care but is rather straightforward as well.
Another question might be how to set up a view. The algorithm doesn't
return a Mnesia view, it returns lists of coordinate ranges. To convert
them to a view without doing extensive changes to Mnesia, you'd have to
construct a temporary table with the (primary keys of) the records that
are selected by the spatial query. I'm not sure how attractive
efficiency-wise that would be, but it might save your neck if you need
to postprocess the result set with further queries.
A general note: the above algorithm leaves open how the intersection
test between coordinate ranges and geometric figures can be done. In the
most general case, the intersection-testing code would be passed in as a
function or as a fun.
However, such testing can be tricky for complicated geometric queries.
It might be better to write a test function for simple queries ("inside
a rectangular range", "inside an ellipsoid", "to the right of a plane"
etc.), run the algorithm for each simple query that's in the complex
query, and then do set operations.
Note that for this to work, you need every simple query in two forms:
one for the inside and one for the outside. It might save a few lines of
code and a few hours of testing to make three-valued functions that
return "inside", "outside", or "intersects", and adapt the base
algorithm accordingly.
Hope this all is useful to somebody.
Regards,
Jo
More information about the erlang-questions
mailing list