[erlang-questions] MySQL cluster

Scott Lystig Fritchie fritchie@REDACTED
Sat Oct 21 04:31:48 CEST 2006


> On 10/20/06, Dmitrii Dimandt <dmitriid@REDACTED> wrote:
> What's the optimum maximum size for Mnesia then? About a couple of
> gigabytes? Or more?

>>>>> "rs" == Roberto Saccon <rsaccon@REDACTED> writes:

rs> I had the same question and I am currently investigating in using
rs> mnesia just as cache for a clustered web application running on
rs> amazon EC2 and the real storage is on amazon S3 [...]

These sizing questions are popping up on a regular basis now, getting
to be FAQ frequency.  Speaking of which, the Erlang FAQ at
http://www.erlang.org/faq/x1409.html is quite useful.  Especially if
you keep in mind how Mnesia stores ram_copies, disc_copies, and
disc_only_copies tables.

The first two use RAM data structures.  They're limited by virtual RAM
availability, though (for practical performance reasons) they're
really limited by physical RAM availability.  For the last few months
I've been using Mnesia + disc_copies with 8-12GB total table sizes
on a 16GB RAM Linux box.  The OS has an annoying tendency to start
paging out some of my pages waaaaay too early (there's 4-8 GB RAM left,
why start reclaiming my pages, grrrr).  IIRC, Ulf Wiger has reported
putting about 15GB of data into Mnesia disc_copies tables.
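To make the three storage types concrete, here's a minimal sketch of how
each one is declared at table-creation time.  The table and record names
(session, session_cache, session_archive) are invented for illustration,
not from the original post:

```erlang
-record(session, {id, user, data}).

create_tables(Nodes) ->
    %% ram_copies: RAM only, contents lost when the node stops.
    {atomic, ok} = mnesia:create_table(session_cache,
        [{ram_copies, Nodes},
         {record_name, session},
         {attributes, record_info(fields, session)}]),
    %% disc_copies: full copy kept in RAM, changes logged to disk.
    {atomic, ok} = mnesia:create_table(session,
        [{disc_copies, Nodes},
         {attributes, record_info(fields, session)}]),
    %% disc_only_copies: dets-based, lives on disk, not in RAM.
    {atomic, ok} = mnesia:create_table(session_archive,
        [{disc_only_copies, Nodes},
         {record_name, session},
         {attributes, record_info(fields, session)}]),
    ok.
```

The first two are the ones bounded by (virtual, practically physical)
RAM as described above; only the third is disk-resident.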

The last method, disc_only_copies, uses a disk-based structure.  Its
limitations and work-arounds are:

    limitation                         workaround
    ----------                         ----------
    2GB file size per table            Use Mnesia fragmentation to
                                       split large tables into
                                       < 2GB fragments

    Long startup time when tables      None, though "long" is a relative
    are not closed "nicely".           word, so the startup cost to re-scan
                                       the hash table structures on disk 
                                       may not be too long.  Terabytes of
                                       disc_only_copies data is probably
                                       not practical, but I've never
                                       measured it, so I don't know.  :-)
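The fragmentation workaround looks roughly like this.  This is a sketch
with invented names (big_log, fragment count of 16); the key points are
the frag_properties option at creation time and routing all access
through mnesia:activity/4 with the mnesia_frag access module so keys
hash to the right fragment:

```erlang
%% Split one logical disc_only_copies table into 16 dets files,
%% so each fragment stays well under the 2GB per-file limit.
create_big_table(Nodes) ->
    {atomic, ok} = mnesia:create_table(big_log,
        [{disc_only_copies, Nodes},
         {attributes, [key, value]},
         {frag_properties,
          [{n_fragments, 16},
           {n_disc_only_copies, 1}]}]),
    ok.

%% Writes (and reads) must use the mnesia_frag access module,
%% otherwise they all go to the first fragment.
write(Key, Val) ->
    mnesia:activity(transaction,
                    fun() -> mnesia:write({big_log, Key, Val}) end,
                    [], mnesia_frag).
```

Fragments can also be added or removed later with
mnesia:change_table_frag/2 as a table grows.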

Finding 16GB RAM machines isn't nearly so difficult these days.  If
you need fast access to data, Mnesia is quite fast.  For a 7GB Mnesia
data set, I have a complex query that's 9x faster than the (properly
indexed and tuned) SQL equivalent for MySQL 5.iforget (InnoDB tables)
and 20x faster than PostgreSQL 7.something.(*)
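The actual query from that comparison isn't shown in the post, but a
complex Mnesia query of that general sort is typically written with qlc.
Here's a hypothetical example (records, tables, and the join condition
are all invented for illustration):

```erlang
-include_lib("stdlib/include/qlc.hrl").

-record(customer, {id, name}).
-record(order,    {id, cust_id, total}).

%% Join orders against customers, keeping orders over MinTotal.
%% qlc compiles this comprehension into a query plan that can use
%% Mnesia indices where they exist.
expensive_orders(MinTotal) ->
    F = fun() ->
            Q = qlc:q([{C#customer.name, O#order.total}
                       || O <- mnesia:table(order),
                          O#order.total > MinTotal,
                          C <- mnesia:table(customer),
                          C#customer.id =:= O#order.cust_id]),
            qlc:e(Q)
        end,
    mnesia:activity(transaction, F).
```

Since the data lives in the same VM as the query, there's no
client/server round trip or row marshalling, which is part of why
in-memory Mnesia queries can beat an external SQL server.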

For high availability purposes, naturally, you need 2x machines (or
more) with "enough" RAM.

The original poster (?) was asking about a comparison with MySQL
Cluster?  If so, IIRC that product is also a main-memory database, so
you'll have similar "You want me to buy WHAT?" problems with your
boss.  :-)  I don't have any measurements on the relative memory
efficiency/compactness of the same data in Mnesia vs. MySQL cluster.

-Scott

(*) No, I haven't tried MySQL Cluster or TimesTen, but I'm hoping for
some free time to try them out.
