multi-attribute mnesia indexes?

Tue Jan 2 13:09:01 CET 2001

Shawn,

See comments below:

> We're working on an application that probably should be using Oracle.
> However, the dataset is small enough that we should be able to use
> mnesia (100,000 rows in a table).  What we have run into is that we
> want to have 16 or so processes scanning the mnesia table, while
> another two are performing write transactions against it.
> 
> First problem is that Mnesia is reporting it's overloaded.  The exact
> console message is:
> 
> 	=ERROR REPORT==== 28-Dec-2000::23:55:46 ===
> 	Mnesia('spearce@REDACTED'): ** ERROR ** Mnesia is overloaded: {dump_log, time_threshold}

Updates to disc_copies tables are appended to a log file, which is
periodically scanned and propagated into the main disc-based database tables.
This error message can occur when the log has not finished dumping by the
time the next periodic or threshold-based dump needs to start.

This is not too serious if it is a temporary overload situation, as mnesia
will catch up (assuming you haven't filled the disc!).

In your case it sounds like you are pushing in more than mnesia can handle
on a continuous basis?

> I dug in the archives and added these to my command line:
> 
>         -mnesia dump_log_load_regulation false \
>         -mnesia dump_log_write_threshold 100000 \
>
> This cut back on the number of Mnesia error reports to one every few
> minutes, but they are still occurring.

dump_log_load_regulation is set to false by default, so this isn't changing
anything here (false is the correct setting for max speed). All you are
doing is letting the log file get bigger before dumping, which brings no
particular advantage.
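
If you want to check what the node is actually running with, the settings
can be read back from the Erlang shell. A minimal sketch, assuming mnesia
is already started on the node; it only reads the values and changes
nothing:

```erlang
%% Run these in the Erlang shell on the node where mnesia is running.
%% Number of writes to the log that triggers a dump:
mnesia:system_info(dump_log_write_threshold).
%% Interval, in milliseconds, between periodic dumps:
mnesia:system_info(dump_log_time_threshold).
```

The overload report you quoted names time_threshold, so the second value is
the one tied to the periodic dump that was still busy.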

> What the application is doing is, two generator processes are writing
> records into two mnesia tables, some 100,000 records at once.  Both
> processes are running in a tight loop, kind of like what you see
> below:
> 
> mk(0) -> done;
> mk(X) ->
> 	A = #foo{...},
> 	B = #bar{...},
> 	mnesia:transaction(fun() ->
> 		mnesia:write(A),
> 		mnesia:write(B)
> 	end),
> 	mk(X - 1).
> 
> I started them by hand from the shell with:
> 
> 	spawn(mymod, mk, [50000]).
> 	spawn(mymod, mk, [50000]).
> 
> Rough calculation shows that mnesia is only doing 43 of these
> transactions per second with the system load such that it is.
> 
> Now to add to the confusion, 16 other processes are running
> dirty_match_object operations against the tables at the same time the
> two generators are writing to them.  One of the 16 processes reads only
> one column in an index, so we use dirty_index_read.  The other 15 are
> busy with calls (many calls) to dirty_match_object.  The pattern used
> is the wild pattern for the table (9 attributes), with 5 of the
> attributes filled in with a value.  The other 4 were left alone.  (To
> be wild cards.)  None of these was the primary key (first attribute).
> 
> Erlang uses 99% of the CPU to run this job.  Right now, it's up at 70 MB
> of RAM, as the tables are all disc_copies tables (so they are cached
> in RAM).  Would switching to disc_only_copies tables help performance,
> getting rid of the cruft from RAM faster?  My machine has 256 MB of RAM
> free, so swapping is not occurring at the OS level.
> 
> So.....
> 
> 1) What can I do differently to prevent mnesia from whining about its
> log files?

If you are using a standard Unix File System, changing to a more modern one
would help significantly.

You could subscribe to mnesia overload messages and slow down your writing
process if you receive any.
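
Something along these lines might do as a starting point. It is only a
sketch: the module name throttled_writer, the WriteFun argument and the one
second pause are my own placeholders, not anything from your code. It relies
on mnesia:subscribe(system), which delivers overload reports to the
subscribing process as {mnesia_system_event, {mnesia_overload, Details}}
messages, and it assumes mnesia is already running.

```erlang
-module(throttled_writer).
-export([start/2]).

%% WriteFun is a fun/0 that performs one write transaction (hypothetical);
%% N is the number of times to call it.
start(WriteFun, N) ->
    %% Ask mnesia to send system events, overload reports included,
    %% to this process. Fails with badmatch if mnesia is not running.
    {ok, _Node} = mnesia:subscribe(system),
    loop(WriteFun, N),
    mnesia:unsubscribe(system),
    done.

loop(_WriteFun, 0) ->
    ok;
loop(WriteFun, N) ->
    backoff_if_overloaded(),
    WriteFun(),
    loop(WriteFun, N - 1).

%% Empty the mailbox of pending system events and pause once if any of
%% them was an overload report. The 1000 ms pause is an assumed value.
backoff_if_overloaded() ->
    case drain(false) of
        true  -> timer:sleep(1000);
        false -> ok
    end.

drain(Overloaded) ->
    receive
        {mnesia_system_event, {mnesia_overload, _Details}} ->
            drain(true);
        {mnesia_system_event, _Other} ->
            drain(Overloaded)
    after 0 ->
        Overloaded
    end.
```

Your mk/1 loop would then pass the body of its transaction as WriteFun, so
each generator slows itself down whenever the log dumper falls behind.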

If your write is a one-off, you could just forget about the overload messages
and ensure you have enough disc space to hold the largest log file.

> 2) Is there anything I can do to increase the performance of my match
> operation?  Would switching to mnemosyne help in this situation?  Does
> mnesia support multi-attribute indexes which would speed up the
> performance of the match_object operation?

mnesia supports indexes on several columns of a table, one attribute per
index. Have a look at the docs for mnesia:add_table_index/2. The function
mnesia:index_match_object will make use of one explicitly named index,
provided the indexed attribute is a bound variable in the match tuple.
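
As a rough illustration, assuming a made-up record foo with attributes a and
b standing in for two of your five bound columns (the module name index_demo
and the record layout are hypothetical), indexing and matching could look
like this. I have used the dirty variant, dirty_index_match_object, since
your readers are already using dirty operations:

```erlang
-module(index_demo).
-export([setup/0, find/2]).

%% Hypothetical record; substitute your own table definition.
-record(foo, {key, a, b, c}).

%% Create the table and add one index per attribute of interest.
%% Assumes mnesia is running; the table defaults to ram_copies here.
setup() ->
    {atomic, ok} = mnesia:create_table(foo,
                       [{attributes, record_info(fields, foo)}]),
    {atomic, ok} = mnesia:add_table_index(foo, a),
    {atomic, ok} = mnesia:add_table_index(foo, b),
    ok.

%% Match on a and b, leaving every other attribute as a wild card.
%% The lookup goes through the index on a (named by position, #foo.a);
%% the remaining bound attributes filter the candidates it returns.
find(A, B) ->
    Pattern = #foo{a = A, b = B, _ = '_'},
    mnesia:dirty_index_match_object(Pattern, #foo.a).
```

Pick the most selective of your bound attributes as the named index, since
only that one narrows the search before the rest of the pattern is applied.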

> 
> At present, my only other option is to switch to a real SQL database,
> as I can get true multi-column indexes there.

I'd stick with the real mnesia database for a bit longer!

> --
> Shawn.

- Sean
