[erlang-questions] understanding the scalability limits of erlang and mnesia

Tue Jan 26 21:19:10 CET 2010

I have a mix of ram, disc, and disc_only table configurations.

My hardware config is mixed: one box with a single drive (to be end-of-lifed)
and 3 boxes with RAID 5 to help with overall disk I/O throughput.

My disc_only tables:

do1 : with 777805   records occupying 111980141 bytes on disc
do2 : with 594324   records occupying 639543647 bytes on disc
do3 : with 1837761  records occupying 512674458 bytes on disc

My disc tables:
d1 : with 1112655  records occupying 73119677 words of mem
d2 : with 1117441  records occupying 143819464 words of mem

My ram tables:
r1 : with 3493     records occupying 791941   words of mem
r2 : with 10482    records occupying 1976194  words of mem
r3 : with 14160    records occupying 520918   words of mem
r4 : with 3759     records occupying 79983    words of mem

Overall, it looks like about 1-2 GB of data would need to be replicated/
transferred during startup. Is that correct?
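One way to check the estimate is to add up the figures above. This is a rough back-of-the-envelope sketch, not a measurement, and it rests on two assumptions. First, the "words of mem" values convert to bytes at the emulator word size. `erlang:system_info(wordsize)` reports that size as 4 on a 32-bit emulator and 8 on a 64-bit one. Second, in-memory size is a fair proxy for what gets copied at startup. The 80 MB/s read rate is the figure from the quoted reply below.

```python
# Figures copied from the table listing above (disc_only in bytes, others in words).
DISC_ONLY_BYTES = {"do1": 111980141, "do2": 639543647, "do3": 512674458}
DISC_WORDS = {"d1": 73119677, "d2": 143819464}
RAM_WORDS = {"r1": 791941, "r2": 1976194, "r3": 520918, "r4": 79983}


def total_bytes(word_size: int) -> int:
    """Total size in bytes, converting word counts at word_size bytes/word.

    Assumption: in-memory words approximate the data copied at startup.
    """
    words = sum(DISC_WORDS.values()) + sum(RAM_WORDS.values())
    return sum(DISC_ONLY_BYTES.values()) + words * word_size


def read_seconds(n_bytes: int, mb_per_s: float = 80.0) -> float:
    """Sequential read time at mb_per_s, with 1 MB = 10**6 bytes."""
    return n_bytes / (mb_per_s * 10**6)


for word_size in (4, 8):  # 32-bit vs 64-bit emulator
    b = total_bytes(word_size)
    print(f"word size {word_size}: {b / 10**9:.2f} GB, "
          f"~{read_seconds(b):.0f} s at 80 MB/s")
# word size 4: 2.15 GB, ~27 s at 80 MB/s
# word size 8: 3.03 GB, ~38 s at 80 MB/s
```

With 8-byte words the total is closer to 3 GB than to 1-2 GB. Even so, a sequential read at 80 MB/s would take well under a minute. That fits the point in the quoted reply that raw data volume alone is unlikely to explain a 10-minute startup.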

Can you explain more about how index uniqueness affects recovery/startup
times?
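The quoted reply below blames slow recovery on a secondary index with few distinct values. The sketch here is a deliberately simplified cost model, not mnesia's implementation. It assumes that index entries are stored bag-style, so adding an entry means comparing it with every entry already filed under the same index value, as a per-value duplicate check would require. Under that assumption, N records spread evenly over k values cost about N²/(2k) comparisons to rebuild. The helper names `pairs` and `bag_index_comparisons` are hypothetical.

```python
def pairs(m: int) -> int:
    """0 + 1 + ... + (m - 1): comparisons to insert m entries one by one."""
    return m * (m - 1) // 2


def bag_index_comparisons(n_records: int, n_distinct: int) -> int:
    """Hypothetical cost model for rebuilding a secondary index.

    Assumes records are spread as evenly as possible over n_distinct index
    values, and each new entry is compared with every entry already stored
    under the same value. Illustration only, not mnesia's actual code.
    """
    base, extra = divmod(n_records, n_distinct)
    # `extra` values hold base + 1 entries; the remaining values hold base.
    return extra * pairs(base + 1) + (n_distinct - extra) * pairs(base)


N = 1_112_655  # record count of table d1 in the listing above
for k in (N, 100_000, 1_000, 2):
    print(f"{k:>9,} distinct values: {bag_index_comparisons(N, k):,} comparisons")
```

Under this model a unique index adds no comparisons at all. An index on an attribute with only a handful of values, such as a hypothetical two-valued status field, costs roughly N²/4, which grows quadratically with table size.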

Thx,
--b

On Tue, Jan 26, 2010 at 11:46 AM, Paul Fisher <pfisher@REDACTED> wrote:

> First of all, you need to specify the type of mnesia tables you are
> using.  I am going to assume you are using disc_copies, since
> disc_only_copies and ram_copies tables should not behave the way you
> describe.
>
> Second, with disc_copies the tables should recover at the speed the file
> can be read from disk.  Typically this is 80 MB/s or more, even for a
> single SATA drive, so even large tables should load quickly.  While the
> size of the dataset does affect table start time, it is more likely that
> you are seeing a problem with the uniqueness of either the table's
> primary key or a secondary index.  If you have a good unique key for the
> table but also have a secondary index with a limited number of distinct
> values, the table will recover as slowly as you describe.
>
> On Tue, 2010-01-26 at 13:04 -0600, Brian Acton wrote:
> > Hi Guys,
> >
> > I'm fairly new to erlang and I'm trying to better understand how erlang
> > and mnesia deal with large scale. Could anyone provide some examples
> > where they have been using erlang in a very large configuration (i.e.
> > more than 10 machines / more than 100 machines)? I am specifically
> > interested in cases where people are running a clustered configuration
> > with an mnesia backing store for their application.
> >
> > It's been my experience that, however scalable a technology claims to
> > be, operability issues usually surface that make it impractical to simply
> > add more machines to the cluster. As an example, in my current
> > configuration I am experiencing a 10-minute mnesia recovery/verification
> > time during node startup. If I try to bring up two nodes at the same
> > time, I see even longer times and sometimes even failures during
> > bring-up. And my cluster is only four nodes in size. Of course, when the
> > system is at steady state (i.e. all nodes up and running), it's awesome.
> > However, when I have to go through a crash/recovery cycle, I usually
> > want to shoot myself...
> >
> > Anyone got any war stories to share? Any papers or presentations that I
> > should look at?
> >
> > Thanks muchly,
> >
> > --b
>