[erlang-questions] Call for Contributions: Mnesia best practices

Wed Jul 2 17:58:50 CEST 2008

Now, this is not exactly Mnesia, but Google App Engine's datastore,
which is built on Google Bigtable. However, the presentation below
does cover some problems, their causes, and strategies for handling
scalability when you have a distributed database.

http://sites.google.com/site/io/building-scalable-web-applications-with-google-app-engine

The other Google I/O presentations on the datastore over at
http://sites.google.com/site/io/ are also worth watching.

Also a comment:

> * How to partition data between subsystems, without losing the illusion
> they're all one big happy system.

They are not one big happy system. The illusion must be forgotten and
reality must be faced.
Things like: Stop doing joins. Instead begin to duplicate data, so it
is available directly on first access.
Or: Send the code to execute where the data is, instead of sending the
data to the machine that has the code.
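
To make the "stop doing joins" advice concrete, here is a minimal
sketch in Python, using plain dicts as stand-ins for tables. All of the
names (users, posts_normalized, write_post and so on) are hypothetical
and only illustrate the idea; this is not Mnesia or datastore code.

```python
# Hypothetical in-memory "tables"; plain dicts stand in for a real store.
users = {1: {"name": "alice"}, 2: {"name": "bob"}}

# Normalized layout: a post stores only the author's id, so every read
# needs a second, join-like lookup that may live on another node.
posts_normalized = {10: {"author_id": 1, "title": "Hello"}}


def read_post_with_join(post_id):
    post = posts_normalized[post_id]
    author = users[post["author_id"]]  # the extra lookup
    return {"title": post["title"], "author_name": author["name"]}


# Denormalized layout: the author's name is copied into the post when
# it is written, so a read touches exactly one record.
posts_denormalized = {}


def write_post(post_id, author_id, title):
    posts_denormalized[post_id] = {
        "author_id": author_id,
        "author_name": users[author_id]["name"],  # duplicated data
        "title": title,
    }


def read_post(post_id):
    post = posts_denormalized[post_id]
    return {"title": post["title"], "author_name": post["author_name"]}


write_post(10, 1, "Hello")
```

The price is paid on the write side: if a user is renamed, every copy
of the name has to be updated, or the copies are allowed to be stale
for a while.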

Look at the hoops they go through to implement efficient statistics
counters in the video. ACID properties are a costly luxury: you now
have to start conserving your use of them, work out where almost
consistent or eventually consistent data is enough, and use that fact.
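
One common way to build such a counter is to shard it: spread the
writes over several independent slots and sum them on read. The sketch
below is only an in-memory illustration of that general idea. The class
name and the default shard count are assumptions, and it does not claim
to reproduce the exact code from the video or any datastore API.

```python
import random


class ShardedCounter:
    """In-memory illustration of a sharded counter (hypothetical class)."""

    def __init__(self, num_shards=20):
        # In a real system each shard would be a separate record, so
        # writes to different shards do not queue behind each other.
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Pick a shard at random to spread write contention.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def value(self):
        # Reading sums all shards. In a distributed store this sum may
        # lag slightly behind the latest writes.
        return sum(self.shards)
```

The counter gives up a single, always exact record in exchange for
write throughput that grows with the number of shards.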

Yes, database programming just got trickier, but if each of your
write transactions takes 10 ms and they must wait in a single line,
then you can only do 100 of them per second. If that is orders of
magnitude less than you need, then it is time to hack around it.
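
The arithmetic behind that limit, as a small Python sketch. The
function names are made up for illustration, and the second function
assumes the partitions are fully independent, which real systems only
approximate.

```python
def max_serial_writes_per_second(latency_ms):
    # One writer at a time: throughput is the inverse of the latency.
    return 1000.0 / latency_ms


def max_partitioned_writes_per_second(latency_ms, partitions):
    # Assumption: each partition has its own queue and never waits on
    # another, so the upper bounds simply add up.
    return partitions * max_serial_writes_per_second(latency_ms)
```

With 10 ms per write, the first function gives 100 writes per second;
spreading the same writes over 50 independent partitions would raise
the upper bound to 5000 per second.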

Also a word of caution: These strategies are for enormous scalability.
You only need them if you already know what problems you are facing
with your current overstrained RDBMS solution. These hacks take time
to implement for a distributed database, because they rely on
application-specific knowledge that only you have.