Some new mnesia benchmarking results

Fri Nov 9 15:28:08 CET 2001

On Wed, 7 Nov 2001, Per Bergqvist wrote:

>Hi Sean,
>
>I agree that mnesia is now getting ready for prime time, as
>long as you have small records ;-). (Results are even more
>impressive on my little 1.7 GHz P4 Linux box.)
>
>I have a nasty problem that I still haven't resolved. If you
>have more and larger records than fit in available RAM, the
>system is brought to its knees.
>
>I have tried to use disc_only_copies, but after 1 hour it had
>only written ~1M records, so I aborted it. Compare this with
>the 15K records/sec I get with disc_copies.
>
>Does anyone have a solution for storing 10M x 2K records with
>mnesia without buying 20 GB of RAM?
>
>/Per

What are your access patterns like?

I have some ideas that I'm testing currently.

1. Dan and Håkan have written a module in mnesia called
   mnesia_frag.erl. It's wholly undocumented, except for the
   source, but it seems to work really well. The idea is that
   you can treat a number of regular mnesia tables as fragments
   of a larger table. Operations on the "base table" access
   the appropriate fragment based on a hashing function on
   the key. Fragments are distributed evenly across a pool
   of processors.

2. I have written (but not tested much yet) a modified
   mnesia_frag that also supports static distribution of
   fragments using a callback function to identify each
   fragment instead of using a hash.

3. One could imagine splitting a very large disc_only_copies
   table into several smaller fragments, even on a single-node
   system. This wouldn't be of much use if the fragments are
   replicated, and the application requires the tables to be
   fully synched before they can be used (I've found this to
   be a logical way to do things in real systems.) When synching
   replicas, mnesia will read all objects into memory and pass
   them to the other node in regular Erlang messages. This is
   done for one table at a time.

4. On to my most recent experiment: I've modified mnesia to
   load a configurable number of tables in parallel. This
   gave particularly good results with many disc_only_copies
   tables (in one test, I reduced the startup time by half).
   In combination with (3) above, this might be of some
   help to you. I also wouldn't mind assistance in verifying
   that my patch doesn't jeopardize stability.
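
To make the routing idea in (1) and (2) concrete, here is a minimal
conceptual sketch in Python. It is not mnesia or mnesia_frag code:
the class FragmentedTable, the CRC32-based default hash, and the
by_region callback are all hypothetical names chosen for
illustration, and the real module may hash and place fragments
quite differently.

```python
# Conceptual sketch only: plain Python, not mnesia_frag.
import zlib
from typing import Callable, Dict, Hashable, List, Optional


def default_route(key: Hashable, n_fragments: int) -> int:
    """Pick a fragment by hashing the key (assumed scheme: CRC32 mod N)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_fragments


class FragmentedTable:
    """A 'base table' whose records live in several smaller fragments."""

    def __init__(self, n_fragments: int,
                 route: Optional[Callable[[Hashable, int], int]] = None):
        # Each fragment stands in for one regular table.
        self.fragments: List[Dict] = [dict() for _ in range(n_fragments)]
        # The routing function is pluggable: hash by default, or a
        # caller-supplied callback for static distribution.
        self.route = route or default_route

    def _fragment_for(self, key: Hashable) -> Dict:
        return self.fragments[self.route(key, len(self.fragments))]

    def write(self, key: Hashable, value) -> None:
        self._fragment_for(key)[key] = value

    def read(self, key: Hashable):
        return self._fragment_for(key).get(key)


def by_region(key, n_fragments: int) -> int:
    """Hypothetical static rule: the first key element names the fragment."""
    regions = {"eu": 0, "us": 1}
    return regions[key[0]] % n_fragments
```

Callers only ever address the base table; which fragment holds a
record is decided entirely by the routing function. Passing
route=by_region swaps the hash for a fixed placement rule, which is
the kind of callback-based variant described in (2).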

As a side note, disc_only_copies shouldn't be read into memory,
but if they are replicated, they are. Mnesia will always copy the
entire table (the most current copy) to the other nodes, by
traversing the dets file and sending the objects to a receiver on
the other node (aggregated into 8K chunks). There is
tremendous optimization potential here (perhaps the next
experiment, but it's a non-trivial problem to solve...).
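
The chunked transfer described above can be pictured roughly as
follows. This is a sketch in Python, not mnesia's actual sender: the
function name aggregate_chunks is hypothetical, and reading "8K" as
8192 bytes of serialized object data is an assumption.

```python
# Conceptual sketch only: batches serialized objects into ~8K chunks.
from typing import Iterable, Iterator, List


def aggregate_chunks(objects: Iterable[bytes],
                     max_bytes: int = 8192) -> Iterator[List[bytes]]:
    """Group objects into chunks of at most max_bytes each.

    An object larger than max_bytes is sent as a chunk of its own.
    """
    chunk: List[bytes] = []
    size = 0
    for obj in objects:
        # Flush the current chunk if adding this object would overflow it.
        if chunk and size + len(obj) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(obj)
        size += len(obj)
    if chunk:
        yield chunk  # final, possibly partial, chunk
```

Because the function is a generator, the sender only ever holds one
chunk in memory at a time, however large the underlying file is.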

/Uffe
(These are not official statements of the mnesia team. ;)
-- 
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson Telecom AB, ATM Multiservice Networks