Joshua Muzaaya joshmuza@REDACTED
Mon Oct 10 14:08:38 CEST 2011

I still have a number of challenges regarding Mnesia fragmentation and
replication. Consider the following scenario (the questions I am asking are
based on what follows below):

You have a data-driven enterprise application which should be highly available
within the enterprise. If the internal information source is down for any
reason, the enterprise applications must switch to fetching data from a
*recovery center* which is offsite (remote).

You decide to have the database replicated onto two nodes within the enterprise
(referred to as *DB Side A* and *DB Side B*). These two run on separate
hardware but are linked together with, say, a *Fast Ethernet or Optical
Fibre link*. Logically, you create some kind of tunnel or secure communications
channel between these two Mnesia DBs. The two (A and B) should hold the same
replica of the data and be in sync at all times.

Meanwhile, the recovery center must also hold the same copy of the data, in
sync at all times, just in case local data access is cut off due to an attack
or hardware failure. So the same database schema must be replicated across the
3 sites (*Side A*, *Side B* and *recovery center*).

Now, within the enterprise, the application middleware is capable of
switching data requests amongst the database sites. If A is down, then
without the application realizing it, the request is re-routed to Database B,
and so on. The middleware layer can be configured to do load balancing
(request multiplexing) or to be flexible with failover techniques.
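The failover part of that middleware could be sketched roughly as follows. This is only an illustration under the node names used in this post; the `db_api` module and `request/2` entry point are hypothetical placeholders, not part of any Mnesia API:

```erlang
-module(db_router).
-export([request/2]).

%% Hypothetical failover router: try each database node in order and
%% return the first successful result. db_api is a placeholder module
%% assumed to exist on every database node.
request(Fun, Args) ->
    try_nodes(['db_side_A@REDACTED',
               'db_side_B@REDACTED',
               'db_recovery_center@REDACTED'], Fun, Args).

try_nodes([], _Fun, _Args) ->
    {error, all_db_nodes_unavailable};
try_nodes([Node | Rest], Fun, Args) ->
    case rpc:call(Node, db_api, Fun, Args, 5000) of
        {badrpc, _Reason} -> try_nodes(Rest, Fun, Args);
        Result            -> Result
    end.
```

A load-balancing variant would rotate the node list per request instead of always starting from Side A.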

*Further Analysis*:

At Database/Schema creation time, all involved nodes must be up and running
Mnesia. To achieve this, you create, say: *'db_side_A@REDACTED'*,
*'db_side_B@REDACTED'* and finally, *'db_recovery_center@REDACTED'*.

Now, at table creation, you would want to have your Mnesia tables
fragmented. So you decide on the following parameters:

*n_disc_only_copies =:= number of nodes involved in the pool =:= 3*
*Reason:* You are following the documentation, which says this parameter
regulates how many disc_only_copies replicas each fragment should have.
So you want each table to have each of its fragments on every Mnesia node.
*node_pool =:= all nodes involved =:= ['db_side_A@REDACTED',
                                     'db_side_B@REDACTED',
                                     'db_recovery_center@REDACTED']*

All your tables are then created based on the following arrangement:

    Nodes = ['db_side_A@REDACTED', 'db_side_B@REDACTED',
             'db_recovery_center@REDACTED'],
    No_of_fragments = 16,
    {atomic, ok} = mnesia:create_table(TABLE_NAME, [
        {record_name, RECORD_NAME_HERE},
        {attributes, record_info(fields, RECORD_NAME_HERE)},
        {frag_properties, [
            {node_pool, Nodes},
            {n_fragments, No_of_fragments},
            {n_disc_only_copies, length(Nodes)}
        ]}
    ]),

NOTE: In the syntax above, TABLE_NAME and RECORD_NAME_HERE cannot be
variables in reality, since records must be known at compile time in Erlang.
From the installation, you see that for each table, every fragment, say,
table_name_frag2, appears on every node's file system.

*Challenges and arising Questions*:
After following what is listed above, your first database start is okay
since Mnesia is running on all nodes. Several challenges start to show up as
the application runs, and I am listing them below:

*1. * Suppose you decide that all writes are first tried on DB Side A; if
Side A is unavailable at that instant, the call is re-tried on DB Side B,
and so on to the recovery center. If the call fails to return on all 3
database nodes, then the application network middleware layer reports back
that the database servers are all unavailable. (This decision could be
influenced by the fact that if you let applications write randomly to your
Mnesia replicas, it is very possible to get inconsistent-database errors in
case your Mnesia nodes lose their network connection with each other while
writes are being committed on each by different Erlang applications. If you
decide on having master nodes, then you could be at risk of losing data.)
So by this behavior, you are forcing DB Side A to be the master. This leaves
the other database nodes idle for as long as DB Side A is up and running:
however many requests hit Side A, as long as it does not go down, no request
will hit Side B or the recovery center at all.
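One way to narrow the window for the divergence described above, without designating a master, is to commit writes with `mnesia:sync_transaction/1`, which waits until the commit has been propagated to all active replicas before returning. A minimal sketch (the function and table/record names are illustrative):

```erlang
%% Commit a write synchronously across all active replicas. This does
%% not eliminate split-brain risk, but it avoids returning "ok" before
%% the other replicas have seen the commit.
write_record(Table, Record) ->
    F = fun() -> mnesia:write(Table, Record, write) end,
    case mnesia:sync_transaction(F) of
        {atomic, ok}      -> ok;
        {aborted, Reason} -> {error, Reason}
    end.
```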

*2. * On start, Mnesia should normally see all involved nodes running
(Mnesia must be running on all involved nodes) so that it can do its
negotiations and consistency checks. This means that if Mnesia goes down on
all nodes, it must be started on all nodes before it can fully initialize
and load tables. It is even worse if the Erlang VM dies along with Mnesia on
a remote site. Well, several tweaks and scripts here and there could help
restart the entire VM plus the intended applications if it goes down.
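Such a restart script might, at the Erlang level, look something like the sketch below (node names are the ones assumed in this post; `Tables` is whatever list of tables the application needs):

```erlang
%% Restart helper: ping the other db nodes so distribution comes up,
%% start Mnesia, then block until the tables have loaded or time out.
restart_db(Tables) ->
    Others = ['db_side_B@REDACTED', 'db_recovery_center@REDACTED'],
    [net_adm:ping(N) || N <- Others],
    ok = mnesia:start(),
    case mnesia:wait_for_tables(Tables, 60000) of
        ok -> ok;
        {timeout, NotLoaded} ->
            %% Some replica holding these tables is probably still down.
            {error, {tables_not_loaded, NotLoaded}}
    end.
```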
To cut the long story short, let me go to the questions.

*1. * What would a database administrator do if Mnesia generates the event
inconsistent_database, starting to run database behind a partitioned network,
in a situation where setting a Mnesia master node is not desirable (for fear
of data loss)?
*2. * What is the consequence of the Mnesia event inconsistent_database,
starting to run database behind a partitioned network, as regards my
application? What if I do not react to this event and let things continue
the way they are? Am I losing data?
*3. * In large Mnesia clusters, what can one do if Mnesia goes down together
with the Erlang VM on a remote site? Are there any known good methods of
automatically handling this situation?
*4. * There are times when one or two nodes are unreachable due to network
problems or failures, and Mnesia on the surviving node reports that a given
file does not exist, especially in cases where you have indexes. So at run
time, what would be the behavior of my application if some replicas go down?
Would you advise me to have a master node within a Mnesia cluster?
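For context on questions 1 and 2: at the moment the only way my application could even notice the event would be something like the sketch below, which subscribes to Mnesia system events and logs the partition instead of silently ignoring it (the process structure is illustrative; a real deployment would put this under a supervisor):

```erlang
%% Subscribe to Mnesia system events so the application is at least
%% notified of inconsistent_database instead of ignoring it.
start_monitor() ->
    {ok, _Node} = mnesia:subscribe(system),
    monitor_loop().

monitor_loop() ->
    receive
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            error_logger:error_msg(
              "Mnesia inconsistency (~p) with node ~p~n", [Context, Node]),
            monitor_loop();
        _Other ->
            monitor_loop()
    end.
```

What the handler should actually *do* on that event is exactly what I am asking.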

As you answer the questions above, you could also comment on the layout
described at the beginning: whether or not it would ensure availability. You
can share your personal experiences of working with fragmented and
replicated Mnesia databases in production. In reference to the linked
(quoted) question at the very beginning of this text, do provide alternative
settings that could offer more reliability at database creation, say in
terms of the number of fragments, operating system dependencies, node pool
size, table copy types etc. Thanks guys

Muzaaya Joshua
Systems Engineer
