multi-attribute mnesia indexes?

Tue Jan 2 13:58:09 CET 2001

On Fri, 29 Dec 2000, Shawn Pearce wrote:

>We're working on an application that probably should be using
>Oracle. However, the dataset is small enough that we should be able
>to use mnesia (100,000 rows in a table).  What we have run into is
>that we want to have 16 or so processes scanning the mnesia table,
>while another two are performing write transactions against it.

One thing about mnesia is that it's not really prepared for
applications that write constantly to disk-based tables.

>First problem is that Mnesia is reporting it's overloaded.  The exact
>console message is:
>
>	=ERROR REPORT==== 28-Dec-2000::23:55:46 ===
>	Mnesia('spearce@REDACTED'): ** ERROR ** Mnesia is overloaded: {dump_log, time_threshold}
>
>I dug in the archives and added these to my command line:
>
>        -mnesia dump_log_load_regulation false \
>        -mnesia dump_log_write_threshold 100000 \
>
>This cut back on the number of Mnesia error reports to one every few
>minutes, but they are still occurring.
>
>What the application is doing is, two generator processes are writing
>records into two mnesia tables, some 100,000 records at once.  Both
>processes are running in a tight loop, kind of like what you see
>below:

You seem to understand the nature of mnesia's transaction logging, so
I won't go into that, but...

Depending on the speed of your file system, you might run into a
situation where mnesia simply can't keep up with your writer process.
One solution, which would have to be implemented in mnesia, is to make
updates completely synchronous. We at AXD 301 have asked for this, as
we would like the processes in our system to pay the cost up-front for
updating disk tables. Currently, you can easily write a program that
gives mnesia serious headaches, due to the asynchronous nature of
mnesia:transaction/1.

>Rough calculation shows that mnesia is only doing 43 of these
>transactions per second with the system load such that it is.
>
>Now to add to the confusion, 16 other processes are running
>dirty_match_object operations against the tables at the same time the
>two generators are writing to them.  One of the 16 processes reads only
>one column in an index, so we use dirty_index_read.  The other 15 are
>busy with calls (many calls) to dirty_match_object.  The pattern used
>is the wild pattern for the table (9 attributes), with 5 of the
>attributes filled in with a value.  The other 4 were left alone.  (To
>be wild cards.)  None of these was the primary key (first attribute).
>
>Erlang uses 99% of the CPU to run this job.  Right now, it's up at 70 MB
>of RAM, as the tables are all disk_copies tables (so they are cached
>in RAM).  Would switching to disk_only tables help performance, getting
>rid of the cruft from RAM faster?  My machine has 256 MB of RAM free,
>so swapping is not occurring at the OS level.

If you're calling dirty_match_object/2 with a wildcard in the
primary key position, on a table with 100000 objects, the function
will do a full table scan every time. This goes for disc_only_copies
tables as well, where the match will be much slower. You will see
different characteristics, though, as a dets-based match uses many
more reductions, so the process will yield more often.
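
One way around the full scan, sketched below, is to narrow the
candidates with a single-attribute index and check the remaining
attributes in plain Erlang. This is only an illustration: the item
record, its four attributes, the index on region and the module name
are all made up for the example (the real table has 9 attributes),
and whether it beats the match depends on how selective the indexed
attribute is.

```erlang
-module(match_sketch).
-export([demo/0, find/3]).

%% Hypothetical record standing in for the real 9-attribute table.
-record(item, {id, region, kind, status}).

%% Fetch candidates through the secondary index on #item.region,
%% then filter on the other attributes without touching the table again.
find(Region, Kind, Status) ->
    Candidates = mnesia:dirty_index_read(item, Region, #item.region),
    [I || I = #item{kind = K, status = S} <- Candidates,
          K =:= Kind, S =:= Status].

%% Small RAM-only demonstration (assumes no schema exists on disk).
demo() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(item,
        [{attributes, record_info(fields, item)},
         {index, [region]}]),
    ok = mnesia:dirty_write(#item{id = 1, region = north, kind = a, status = up}),
    ok = mnesia:dirty_write(#item{id = 2, region = north, kind = b, status = up}),
    ok = mnesia:dirty_write(#item{id = 3, region = south, kind = a, status = up}),
    find(north, a, up).
```

Here only the objects with region = north are read, instead of every
object in the table.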

>So.....
>
>1) What can I do differently to prevent mnesia from whining about
>its log files?

I don't know. One drastic measure might be to wrap the call to
mnesia:transaction/1 thus:

transaction(Fun) ->
   case mnesia:transaction(Fun) of
      {atomic, Result} ->
         mnesia:dump_log(),
         {atomic, Result};
      Other ->
         Other
   end.

(I've never tried it myself.)

>2) Is there anything I can do to increase the performance of my
>match operation?  Would switching to mnemosyne help in this
>situation?  Does mnesia support multi-attribute indexes which would
>speed up the performance of the match_object operation?

Mnemosyne should be able to make clever use of indexing, but I don't
know if it would improve your application's performance.

The rdbms-1.4 user contrib supports compound attributes, and also has
a select() function (no relation to SQL SELECT). However, the select()
function doesn't take into account that parts of a compound attribute
may be indexed, and rdbms also doesn't allow indexing of compound
attributes (I grew weary trying to figure out how to do this
elegantly.)

Personally, I'd like to have a way to build derived index tables in
mnesia (or rdbms), where one could specify a fun(Object) ->
IndexValue to be called by mnesia. So many ideas -- so little time...
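
Until something like that exists, the idea can be approximated by
hand. The sketch below keeps a second table of type bag, keyed on
whatever the fun returns, and writes to it in the same transaction
as the object itself. Nothing here is a mnesia or rdbms feature: the
item and item_ix records, write_indexed/2 and lookup/1 are invented
for the illustration, and it does not remove the old index entry when
an object's derived value changes.

```erlang
-module(derived_ix_sketch).
-export([demo/0, write_indexed/2, lookup/1]).

%% Hypothetical records: the data table and a hand-made index table.
-record(item,    {id, region, kind, status}).
-record(item_ix, {value, id}).

%% Write the object and its derived index entry atomically.
%% IxFun is the user-supplied fun(Object) -> IndexValue.
write_indexed(Obj = #item{id = Id}, IxFun) ->
    mnesia:transaction(fun() ->
        ok = mnesia:write(Obj),
        ok = mnesia:write(#item_ix{value = IxFun(Obj), id = Id})
    end).

%% Look up all objects whose derived index value equals Value.
lookup(Value) ->
    mnesia:transaction(fun() ->
        Ids = [Id || #item_ix{id = Id} <- mnesia:read(item_ix, Value)],
        lists:append([mnesia:read(item, Id) || Id <- Ids])
    end).

%% Small RAM-only demonstration (assumes no schema exists on disk).
demo() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(item,
        [{attributes, record_info(fields, item)}]),
    {atomic, ok} = mnesia:create_table(item_ix,
        [{type, bag}, {attributes, record_info(fields, item_ix)}]),
    %% A compound (two-attribute) index value.
    IxFun = fun(#item{region = R, kind = K}) -> {R, K} end,
    {atomic, ok} = write_indexed(
        #item{id = 1, region = north, kind = a, status = up}, IxFun),
    lookup({north, a}).
```

With the fun returning a tuple such as {Region, Kind}, this gives a
poor man's multi-attribute index, at the cost of one extra write per
update.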

/Uffe
-- 
Ulf Wiger                                    tfn: +46  8 719 81 95
Senior System Architect                      mob: +46 70 519 81 95
Strategic Product & System Management    ATM Multiservice Networks
Data Backbone & Optical Services Division      Ericsson Telecom AB