understanding the scalability limits of Erlang and Mnesia

Tue Jan 26 20:04:47 CET 2010

Hi Guys,

I'm fairly new to Erlang and I'm trying to better understand how Erlang and
Mnesia behave at large scale. I'm wondering if anyone could provide some
examples where they have been using Erlang in a very large configuration
(i.e. more than 10 machines / more than 100 machines). I am specifically
interested in cases where people are running a clustered configuration with
a Mnesia backing store for their application.

It's been my experience that however scalable a technology claims to be,
operability issues usually surface that make it impractical to simply add
more machines to the cluster. As an example, in my current configuration, I
am experiencing a 10-minute Mnesia recovery / verification time during node
startup. If I try to bring up two nodes at the same time, I see even longer
times and sometimes outright failures during startup. And my cluster is only
four nodes in size. Of course, when the system is at steady state (i.e. all
nodes up and running), it's awesome. However, when I have to go through a
crash / recovery cycle, I usually want to shoot myself...

Anyone got any war stories to share? Any papers or presentations that I
should look at?

Thanks muchly,

--b