mnesia power index

Ulf Wiger (AL/EAB) ulf.wiger@REDACTED
Thu Feb 17 19:06:22 CET 2005


I thought I'd share a small hack I made to mnesia-4.2.

The purpose of the hack was to make room for more 
flexible index functionality.

You can now create an index in either of the following ways:

mnesia:add_table_index(Tab, Pos :: integer() | atom());
mnesia:add_table_index(Tab, {{Pos,Tag}, M, F, Arg, IsOrdered}).

   Pos : integer() | atom()   The attribute position or name,
                              as with old-style indexes
   Tag : atom()               A user-friendly tag; {Pos,Tag}
                              uniquely identifies the index
   M   : atom()               Module name
   F   : atom()               Function name
   Arg : term()               Extra argument
   IsOrdered : boolean()      true for an ordered index,
                              false for an unordered one
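The two forms above, and the identifier each index is later referred to by (the Index argument passed to dirty_index_read/3 and del_table_index/2 in the session further down), can be summarised as Erlang type specs. This is only a sketch: the module and type names are hypothetical, and the -type syntax comes from Erlang releases later than mnesia-4.2.

```erlang
-module(index_spec).
-export([index_id/1]).

%% Hypothetical type names for the arguments accepted by the patched
%% mnesia:add_table_index/2.
-type pos()   :: integer() | atom().
-type index() :: pos() | {pos(), atom()}.
-type spec()  :: pos()
               | {{pos(), atom()}, module(), atom(), term(), boolean()}.
-export_type([pos/0, index/0, spec/0]).

%% The identifier used to refer to an index after it has been created:
%% {Pos, Tag} for new-style indexes, plain Pos for old-style ones.
-spec index_id(spec()) -> index().
index_id({{Pos, Tag}, _M, _F, _Arg, _IsOrdered}) ->
    {Pos, Tag};
index_id(Pos) when is_integer(Pos); is_atom(Pos) ->
    Pos.
```

For example, an index created with {{data,words},test,words,[],true} is read and deleted using {data,words}.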

With the new syntax, you can have several indexes on a given
attribute. A special case is Pos == 1: the index then works on
the whole object, i.e. the callback receives the entire record.

The index callback function is called like this:

M:F(Data, Arg) -> [IndexValue].

Data is either the value of the given attribute or, if Pos == 1,
the whole object. The callback may return several index values
for one object, e.g. when breaking up a string into whole words.
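To make the calling convention concrete, here is a small sketch of how an indexing layer might turn one object into index entries: resolve Pos (an attribute name maps to its tuple position, with the record name at element 1), hand either that element or the whole object to M:F, and pair each returned value with the object's primary key. The module idx_entries and its functions are hypothetical; this is not the code in the patch.

```erlang
-module(idx_entries).
-export([position/2, index_entries/3, words/2]).

%% Resolve an attribute name to its tuple position. Element 1 of a
%% mnesia record is the table name, so the first attribute is at 2.
position(Pos, _Attrs) when is_integer(Pos) ->
    Pos;
position(Name, Attrs) when is_atom(Name) ->
    position(Name, Attrs, 2).

position(Name, [Name | _], N) -> N;
position(Name, [_ | T], N) -> position(Name, T, N + 1);
position(Name, [], _N) -> erlang:error({no_such_attribute, Name}).

%% One {IndexValue, PrimaryKey} entry per value returned by M:F.
%% Pos == 1 means the callback is given the whole object.
index_entries({{Pos, _Tag}, M, F, Arg, _IsOrdered}, Attrs, Obj) ->
    Data = case position(Pos, Attrs) of
               1 -> Obj;
               P -> element(P, Obj)
           end,
    Key = element(2, Obj),
    [{IxVal, Key} || IxVal <- M:F(Data, Arg)].

%% Example callback in the style of test:words(Str, []).
words(Str, []) ->
    string:tokens(Str, " \t\n").
```

With attributes [key,ref,data], the object {test,"uffe",ref1,"uffes words"} yields the entries [{"uffes","uffe"},{"words","uffe"}].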

The old way works as before. You can also create both old and 
new indexes via the create_table/2 function.

I've attached the modified mnesia files. To compile them, make
sure the compiler can locate mnesia.hrl (it's in mnesia-4.2/src/).
The attachment is 43K; I hope the list server doesn't think
that's too much.
 
Functions like index_match_object() work only on old-style indexes.
Also, you cannot have an ordered index on a disc_only_copies table,
since dets doesn't support the ordered_set type.

I've added the following functions, where
Index :: Pos | {Pos, Tag} and Pos :: atom() | integer():

 - dirty_index_foldl(Fun,Acc,Tab,Index)
 - dirty_index_foldr(Fun,Acc,Tab,Index)
 - dirty_index_first(Tab, Index)
 - dirty_index_next(Tab, Index, Key)
 - dirty_index_prev(Tab, Index, Key)
 - dirty_index_last(Tab, Index)

The first/next/prev/last functions return {IndexValue, Objects}.
The fold[lr] functions call Fun({IndexValue, ObjKey, [Object]}, Acc)
for each index entry.
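A sketch of what a fold over an ordered index could look like, assuming (hypothetically) that the index is an ets ordered_set holding {{IndexValue, ObjKey}} entries and that the data sits in an ets set with keypos 2, as mnesia records do. It only illustrates the Fun({IndexValue, ObjKey, [Object]}, Acc) contract; it is not the patch's implementation and ignores bag tables, disc storage and transactions.

```erlang
-module(idx_fold).
-export([foldl/4, foldr/4]).

%% Hypothetical layout: IxTab is an ets ordered_set of {{IxVal, ObjKey}}
%% entries; DataTab is an ets set with {keypos, 2}.
foldl(Fun, Acc, DataTab, IxTab) ->
    walk(Fun, Acc, DataTab, IxTab, ets:first(IxTab), fun ets:next/2).

foldr(Fun, Acc, DataTab, IxTab) ->
    walk(Fun, Acc, DataTab, IxTab, ets:last(IxTab), fun ets:prev/2).

%% Visit index entries in term order (or reverse order for foldr),
%% looking up the objects stored under each entry's primary key.
walk(_Fun, Acc, _DataTab, _IxTab, '$end_of_table', _Step) ->
    Acc;
walk(Fun, Acc, DataTab, IxTab, {IxVal, ObjKey} = K, Step) ->
    Objs = ets:lookup(DataTab, ObjKey),
    Acc1 = Fun({IxVal, ObjKey, Objs}, Acc),
    walk(Fun, Acc1, DataTab, IxTab, Step(IxTab, K), Step).
```

Walking from the last entry backwards and consing onto the accumulator, as foldr/4 does, yields the entries in ascending index order, which is also what dirty_index_foldr returns in the session below.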

=:=:=:=:=:=:=:=:=:=:=

So, what can you do with this?

Well, lots. A few obvious uses are:
- index on whole words
- index on word stems (we'll try to demo this soon)
- convert strings to lower case
- index on attribute combinations (compound indexes)
- perhaps even redo the snmp hook so that it 
  doesn't have to be a special hack, requiring 
  the primary key to be structured in a special way.
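As an illustration of the compound-index case, here is a hypothetical generic callback (not part of the patch) for an index declared with Pos == 1. Arg lists the tuple positions to combine, and the callback returns a single tuple built from those fields, following the M:F(Data, Arg) -> [IndexValue] contract.

```erlang
-module(compound_idx).
-export([compound/2]).

%% Hypothetical callback for an index on Pos == 1, so Obj is the whole
%% record. Positions are tuple positions (element 1 is the table name);
%% the one index value is the tuple of the selected fields, in order.
compound(Obj, Positions) when is_tuple(Obj), is_list(Positions) ->
    [list_to_tuple([element(P, Obj) || P <- Positions])].
```

With the patch loaded, such an index might be declared as {{1,ref_data},compound_idx,compound,[3,4],true} on a table with attributes [key,ref,data]; the tag ref_data is made up for the example.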

Below are some examples.

Please take it for a spin, and let me know what you think.
Please note that I have no authority to put anything into
mnesia, so if you like this stuff, you can help lobby for it.

Regards,
Uffe


%%%%% First, a callback module with indexing functions:

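-module(test).

-export([words/2, name/2]).

%% to_lower/1 comes from httpd_util in the inets application.
-import(httpd_util, [to_lower/1]).

%% Index callback: split a string into whitespace-separated words,
%% giving one index value per word. With Arg == locase, each word is
%% lower-cased so that lookups become case-insensitive.
words(Str, []) ->
    string:tokens(Str, " \t\n");
words(Str, locase) ->
    [to_lower(W) || W <- string:tokens(Str, " \t\n")].


%% Compound callback for an index on Pos == 1 (the whole object).
%% FN and LN are the tuple positions of the first and last name; the
%% single index value is {LastName, FirstName}, both lower-cased.
name(Obj, [locase,FN,LN]) ->
    FirstName = element(FN, Obj),
    LastName  = element(LN, Obj),
    [{to_lower(LastName), to_lower(FirstName)}].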


%%%% Then, some shell interaction:

=PROGRESS REPORT==== 17-Feb-2005::18:49:38 ===
         application: mnesia
          started_at: nonode@REDACTED
ok

   ** First, a simple index that splits a string into words:

3> mnesia:create_table(test,[{attributes,[key,ref,data]}]).
{atomic,ok}
4> mnesia:add_table_index(test,{{data,words},test,words,[],true}).
{atomic,ok}
5> mnesia:dirty_write({test,"uffe",ref1,"uffes words"}).
ok
6> mnesia:dirty_write({test,"hans",ref1,"hanses words"}).
ok
7> mnesia:dirty_write({test,"per",ref2,"pers word"}).
ok
8> mnesia:dirty_index_read(test,"words",{data,words}).
[{test,"hans",ref1,"hanses words"},{test,"uffe",ref1,"uffes words"}]
9> mnesia:dirty_index_read(test,"word",{data,words}).
[{test,"per",ref2,"pers word"}]

  ** Just making sure that old indexes still work:

10> mnesia:add_table_index(test,ref).
{atomic,ok}
11> mnesia:dirty_index_read(test,ref1,ref).
[{test,"hans",ref1,"hanses words"},{test,"uffe",ref1,"uffes words"}]
12> mnesia:dirty_index_read(test,ref2,ref).
[{test,"per",ref2,"pers word"}]

  ** Verifying that you can also delete indexes:

13> mnesia:del_table_index(test, {data,words}).
{atomic,ok}
14> mnesia:del_table_index(test, ref).
{atomic,ok}
15> 
15> 

  ** A bag table. These are tricky because several objects can share
  ** a key, so you must filter out those objects for which the index
  ** fun doesn't produce the index value being looked up:

15> mnesia:create_table(testbag,[{type,bag},{attributes,[key,ref,data]}]).
{atomic,ok}
16> mnesia:add_table_index(testbag,{{data,words},test,words,[],true}).
{atomic,ok}
17> mnesia:dirty_write({testbag,uffe,ref1,"uffes words"}).
ok
18> mnesia:dirty_write({testbag,hans,ref1,"hanses words"}).
ok
19> mnesia:dirty_write({testbag,uffe,ref2,"pers word"}).
ok
20> mnesia:dirty_index_read(testbag,"words",{data,words}).
[{testbag,hans,ref1,"hanses words"},{testbag,uffe,ref1,"uffes words"}]
21> mnesia:dirty_index_read(testbag,"word",{data,words}).
[{testbag,uffe,ref2,"pers word"}]
22> 
22> 
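The bag-table filtering described above can be sketched as follows. Here, a hypothetical helper receives every object stored under one primary key and keeps only those for which the index callback actually yields the looked-up value. It is an illustration, not the patch's code.

```erlang
-module(bag_filter).
-export([keep_matching/4, words/2]).

%% Objs are all objects stored under one key of a bag table. Keep only
%% those for which M:F produces IxVal; the others merely share the key
%% with an object that does match.
keep_matching(IxVal, Objs, Pos, {M, F, Arg}) ->
    [Obj || Obj <- Objs,
            lists:member(IxVal, M:F(index_data(Pos, Obj), Arg))].

%% Pos == 1 means the callback is given the whole object.
index_data(1, Obj) -> Obj;
index_data(Pos, Obj) -> element(Pos, Obj).

%% Word-splitting callback, equivalent to test:words(Str, []).
words(Str, []) ->
    string:tokens(Str, " \t\n").
```

For the key uffe in testbag, looking up "word" keeps only {testbag,uffe,ref2,"pers word"}, matching the result of command 21.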

  ** Another small example, showing how to do case-insensitive
  ** index lookups with an unordered index:

22> mnesia:create_table(test3,[{attributes,[key,data]}]).
{atomic,ok}
23> mnesia:add_table_index(test3,{{data,words},test,words,locase,false}).
{atomic,ok}
24> mnesia:dirty_write({test3,1,"The Quick Brown Fox"}).
ok
25> mnesia:dirty_write({test3,2,"the quick brown fox"}).
ok
26> mnesia:dirty_write({test3,3,"JUMPS OVER THE LAZY DOG"}).
ok
27> mnesia:dirty_write({test3,4,"jumps over the lazy dog"}).
ok
28> mnesia:dirty_index_read(test3,"fox",{data,words}).
[{test3,2,"the quick brown fox"},{test3,1,"The Quick Brown Fox"}]
29> mnesia:dirty_index_read(test3,"the",{data,words}).
[{test3,4,"jumps over the lazy dog"},
 {test3,3,"JUMPS OVER THE LAZY DOG"},
 {test3,2,"the quick brown fox"},
 {test3,1,"The Quick Brown Fox"}]
30> mnesia:dirty_index_read(test3,"dog",{data,words}).
[{test3,4,"jumps over the lazy dog"},{test3,3,"JUMPS OVER THE LAZY DOG"}]
31> 
31> 

  ** An example showing a compound case-insensitive, ordered index:

31> mnesia:create_table(test4,[{attributes,[key,firstname,lastname,data]}]).
{atomic,ok}
32> mnesia:add_table_index(test4,{{1,name},test,name,[locase,3,4],true}).
{atomic,ok}
33> mnesia:dirty_write({test4,1,"Ulf","Wiger","The Quick Brown Fox"}).
ok
34> mnesia:dirty_write({test4,2,"Joe", "Armstrong","the quick brown fox"}).
ok
35> mnesia:dirty_write({test4,3,"ulf", "wiger", "JUMPS OVER THE LAZY DOG"}).
ok
36> mnesia:dirty_write({test4,4,"joe", "armstrong", "jumps over the lazy dog"}). 
ok
37> mnesia:dirty_index_read(test4,{"wiger","ulf"},{1,name}).
[{test4,1,"Ulf","Wiger","The Quick Brown Fox"},
 {test4,3,"ulf","wiger","JUMPS OVER THE LAZY DOG"}]
38> mnesia:dirty_index_read(test4,{"armstrong","joe"},{1,name}).
[{test4,2,"Joe","Armstrong","the quick brown fox"},
 {test4,4,"joe","armstrong","jumps over the lazy dog"}]

  ** Let's try the fold and iterator functions:

39> mnesia:dirty_index_foldr(fun({IdxKey,ObjKey,Objs} =X, Acc) -> [X|Acc] end, [], test4, {1,name}).
[{{"armstrong","joe"},2,[{test4,2,"Joe","Armstrong","the quick brown fox"}]},
 {{"armstrong","joe"},
  4,
  [{test4,4,"joe","armstrong","jumps over the lazy dog"}]},
 {{"wiger","ulf"},1,[{test4,1,"Ulf","Wiger","The Quick Brown Fox"}]},
 {{"wiger","ulf"},3,[{test4,3,"ulf","wiger","JUMPS OVER THE LAZY DOG"}]}]

  ** Oops! The following functions don't work right on this index:
  ** they return {IndexValue, ObjKey} with an empty object list.

40> mnesia:dirty_index_first(test4,{1,name}).
{{{"armstrong","joe"},2},[]}
41> mnesia:dirty_index_next(test4,{1,name},{{"armstrong","joe"},2}).
{{{"armstrong","joe"},4},[]}

  ** They do work with old-style indexes, and should work with 
  ** unordered indexes. I will fix this.

42> mnesia:add_table_index(test,ref).
{atomic,ok}
43> mnesia:dirty_index_first(test,ref).
{ref1,[{test,"hans",ref1,"hanses words"},{test,"uffe",ref1,"uffes words"}]}
44> mnesia:dirty_index_next(test,ref,ref1).
{ref2,[{test,"per",ref2,"pers word"}]}
45> 
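dirty_index_first/next are meant to return {IndexValue, Objects}, grouping every object that shares an index value, as the old-style ref index does above. One way to get that grouping on an ordered index is sketched below. It assumes (hypothetically) an ets ordered_set of {{IndexValue, ObjKey}} entries plus an ets set with keypos 2 for the data. It is not the promised fix, and next/3 scans from the start of the index for simplicity.

```erlang
-module(idx_iter).
-export([first/2, next/3]).

%% Hypothetical layout: IxTab is an ets ordered_set of {{IxVal, ObjKey}}
%% entries; DataTab is an ets set with {keypos, 2}.

%% Smallest index value together with all its objects.
first(DataTab, IxTab) ->
    case ets:first(IxTab) of
        '$end_of_table' -> '$end_of_table';
        {IxVal, _ObjKey} -> {IxVal, objects(DataTab, IxTab, IxVal)}
    end.

%% Smallest index value greater than IxVal, with all its objects.
%% select/3 visits an ordered_set in term order, so the first match
%% is the next index value.
next(DataTab, IxTab, IxVal) ->
    MS = [{{{'$1', '_'}}, [{'>', '$1', {const, IxVal}}], ['$1']}],
    case ets:select(IxTab, MS, 1) of
        {[Next | _], _Cont} -> {Next, objects(DataTab, IxTab, Next)};
        _ -> '$end_of_table'
    end.

%% All objects whose index entries carry exactly IxVal, in key order.
objects(DataTab, IxTab, IxVal) ->
    MS = [{{{'$1', '$2'}}, [{'=:=', '$1', {const, IxVal}}], ['$2']}],
    lists:append([ets:lookup(DataTab, K) || K <- ets:select(IxTab, MS)]).
```

Here the iterator key is the index value itself, so next/3 skips all entries that share it instead of stepping one {IndexValue, ObjKey} pair at a time.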
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mnesia_power_index.tgz
Type: application/x-compressed
Size: 43811 bytes
Desc: mnesia_power_index.tgz
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20050217/02ff0054/attachment.bin>


More information about the erlang-questions mailing list