[erlang-questions] MNESIA FRAGMENTATION AND REPLICATION CHALLENGES

Mon Oct 10 15:03:01 CEST 2011

Hi Joshua,

I am not an expert in Mnesia, but I can try to give you some suggestions
(you have probably guessed them by now, but at least you can have a
confirmation):

Q1: "What would a Database administrator do if mnesia generates events
of inconsistent_database,
starting to run database behind a partitioned network, in a situation where
setting a mnesia master node is not desirable (for fear of data loss) ?"
A1: Check whether the nodes are up, then try to update the DB manually from
the node that has been running the longest (provided the DB was written
correctly there).
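
As a minimal sketch of the "check whether the nodes are up" step, the module
below compares the nodes in the Mnesia schema with the nodes where Mnesia is
currently running, as seen from the local node. The module and function names
are made up for illustration; it assumes Mnesia is already started locally.

```erlang
-module(db_nodes_check).
-export([down_nodes/0, down_nodes/2]).

%% Nodes that belong to the Mnesia schema but where Mnesia is not
%% running, as seen from the local node (assumes Mnesia is started here).
down_nodes() ->
    All = mnesia:system_info(db_nodes),
    Running = mnesia:system_info(running_db_nodes),
    down_nodes(All, Running).

%% Pure helper: every node in All that is missing from Running.
down_nodes(All, Running) ->
    All -- Running.
```

An empty list from db_nodes_check:down_nodes() means every schema node is
visible from here; it says nothing about whether their data agree.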

Q2: "What is the consequence of the mnesia event inconsistent_database,
starting to run database behind a partitioned network as regards my
application ? What if i do not react to this event and let things continue
the way they are ? am i losing data ?"
A2: That state results from all the nodes failing, which means you can no
longer read from or write to the DB. So yes, you lose data.
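
For reference, Mnesia delivers inconsistent_database as a system event to
subscribed processes, with a context of running_partitioned_network or
starting_partitioned_network. A minimal sketch of a listener that at least
logs it, so the event does not go unnoticed; the module name and the logging
choice are assumptions, and Mnesia must already be running on the node:

```erlang
-module(partition_listener).
-export([start/0, classify/1]).

%% Spawn a process that subscribes to Mnesia system events.
%% Assumes Mnesia is already started on this node.
start() ->
    spawn(fun() ->
        {ok, _Node} = mnesia:subscribe(system),
        loop()
    end).

loop() ->
    receive
        {mnesia_system_event, Event} ->
            case classify(Event) of
                {partition, Context, Node} ->
                    %% Only logs; what to do next is up to the application.
                    error_logger:warning_msg(
                      "Mnesia partition detected (~p) with node ~p~n",
                      [Context, Node]);
                other ->
                    ok
            end,
            loop()
    end.

%% Pure helper: separate partition events from all other system events.
classify({inconsistent_database, Context, Node}) ->
    {partition, Context, Node};
classify(_Other) ->
    other.
```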

Q3: "In large mnesia clusters, what can one do if Mnesia goes down together
with the Erlang VM on a remote site ? are there any known good methods of
automatically handling this situation ?"
A3: http://www.erlang.org/doc/apps/mnesia/Mnesia_chap7.html#id311335 If that
doesn't work, build inside your middleware a distributed application which
should monitor Erlang nodes. If you are still not satisfied, create a
watchdog.
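
A rough sketch of such a node monitor, using net_kernel:monitor_nodes/1 to
receive nodeup/nodedown messages. The module name, the get message and the
map of "up since" timestamps are my own assumptions, and the node must have
been started as a distributed node (e.g. with -sname or -name):

```erlang
-module(node_watchdog).
-export([start/0, handle/2]).

%% Spawn a process that tracks when each connected node came up.
%% Assumes the local node is distributed; otherwise the match fails.
start() ->
    spawn(fun() ->
        ok = net_kernel:monitor_nodes(true),
        loop(#{})
    end).

loop(UpSince) ->
    receive
        {Tag, Node} when Tag =:= nodeup; Tag =:= nodedown ->
            Now = erlang:monotonic_time(second),
            loop(handle({Tag, Node, Now}, UpSince));
        {get, From} ->
            %% Let another process ask for the current view.
            From ! {up_since, UpSince},
            loop(UpSince)
    end.

%% Pure state update: record the time a node came up, forget it when down.
handle({nodeup, Node, Now}, UpSince) ->
    UpSince#{Node => Now};
handle({nodedown, Node, _Now}, UpSince) ->
    maps:remove(Node, UpSince).
```

This only observes connectivity between Erlang nodes; restarting a dead node
or its applications would still have to be done by something outside the VM.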

Q4: "There are times when one or two nodes are unreachable due to network
problems or failures, and mnesia on the surviving Node reports that a given
file does not exist especially in cases where you have indexes. So at run
time, what would be the behavior of my application if some replicas go down.
Would you advise me to have a master node within a mnesia cluster ? "
A4: If some replicas go down and your middleware knows how to handle the
connection failure, your application should not receive any error. Your
application should receive an error only when the DB is sliced or the
working node did not have time to replicate the new entry (entries).

A few more considerations that may help in any case where you don't trust
the way a DB handles your data over the network:
1. Replace the replication system with your own data distribution. E.g.,
instead of relying on data replication, use your own mechanism to send the
data to all nodes. Then, when a connection fails, broadcast a message to all
similar nodes, pick another node according to some algorithm, and switch to
it, while trying to revive and update the dead node in the meantime.
2. Replace the DB error reporting with a watchdog. E.g., continuing the idea
above, add a set of functions that periodically check the nodes and the
applications inside the nodes. You can add a probabilistic method to decide
which node has been up the longest and treat it as the most stable node, so
that reads and writes are directed preferentially to that node.
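
The two ideas above could be sketched roughly as follows. Note that pick/1
is a plain deterministic "longest uptime wins" rule, not a probabilistic
method, and that the module name, the map of Node => UpSince timestamps and
the 5000 ms timeout are all assumptions made for illustration. write_all/2
simply calls a caller-supplied {Module, Function, Args} on every node with
rpc:multicall/5 and reports which nodes could not be reached.

```erlang
-module(stable_node).
-export([pick/1, write_all/2]).

%% Pick the node with the earliest "up since" timestamp, i.e. the one
%% that has been up the longest. UpSince is a map Node => Timestamp.
%% Ties are broken by node name order. Returns none for an empty map.
pick(UpSince) when map_size(UpSince) =:= 0 ->
    none;
pick(UpSince) ->
    Sorted = lists:sort([{Since, Node}
                         || {Node, Since} <- maps:to_list(UpSince)]),
    [{_Since, Node} | _] = Sorted,
    {ok, Node}.

%% Send the same call to every node instead of relying on replication.
%% Returns {Results, BadNodes}; BadNodes are the nodes that did not
%% answer and would need to be revived and updated later.
write_all(Nodes, {Module, Function, Args}) ->
    rpc:multicall(Nodes, Module, Function, Args, 5000).
```

Unlike a Mnesia transaction, write_all/2 gives no atomicity across nodes:
some nodes may apply the write while others miss it, so the BadNodes list has
to be acted on.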

One more thing. If you don't like Mnesia replication and how it handles the
data over the network, try CouchDB, which doesn't have master-slave
replication but directional replication. It's written in Erlang as well,
but it's NoSQL and it recovers faster from failures. I am not saying Mnesia
is not good (well, it's great actually), but given your "fears", I am just
suggesting an alternative which may suit you better.

I hope this mail helps you choose the correct setup for your
network-distributed DB.

Cheers,
CGS

On Sun, Oct 9, 2011 at 3:32 PM, Joshua Muzaaya <joshmuza@REDACTED> wrote:

>
> Hi guys, i have asked a question here:
> http://stackoverflow.com/questions/7703170/mnesia-fragmentation-and-replication-resultant-availability-and-reliability
> and need solutions from the forum. I could not edit and reformat since it
> would be time wasting so i decided to paste you the link above. You can
> follow the link and provide your solutions. Thanks guys
> --
> *Muzaaya Joshua
> Systems Engineer
> +256774115170*
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>