Basic Mnesia Distribution Questions

Thu Dec 28 12:46:11 CET 2000

Brad,

> Thanks! Once this has occurred, though, I still find that the nodes are
> basically acting as if they were independent. I receive the
> inconsistent_database error message after the ping, but mnesia:info on
> both nodes indicates that the other node is stopped, and no updates are
> replicated. I have only been able to get mnesia working properly again by
> restarting erlang on one of the nodes, but I assume it must be possible to
> do it in a less disruptive way. I'm sure I'm just missing something.

Welcome to the world of partitioned networks. The problem is that neither
node can know what updates have been carried out on the other node during
the link outage. Given arbitrarily complex transactions it is very hard to
imagine an algorithm which can work out what the combined resulting database
should look like.

Ericsson recommends exactly the procedure you followed: choose one of the
nodes to be a master and restart the other. Depending on your application
design this may or may not be disruptive (if you update and delete on both
nodes during the outage, the changes made on the restarted node are lost).
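
To know when that procedure is needed, a process can subscribe to Mnesia's
system events and watch for the inconsistent_database event mentioned in the
question. The following is only a minimal sketch: the module name
partition_watch is made up for illustration, Mnesia is assumed to be running
already, and the process does nothing beyond logging the event.

```erlang
-module(partition_watch).
-export([start/0, loop/0]).

%% Spawn a process that subscribes to Mnesia system events.
%% Assumes Mnesia is already running on this node; otherwise the
%% subscribe call fails and the spawned process exits.
start() ->
    spawn(fun() ->
        {ok, _Node} = mnesia:subscribe(system),
        loop()
    end).

%% Log every inconsistent_database event and ignore everything else.
%% Context is running_partitioned_network or starting_partitioned_network;
%% Node is the node Mnesia believes it was partitioned from.
loop() ->
    receive
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            error_logger:warning_msg(
                "Mnesia partition detected (~p) with node ~p~n",
                [Context, Node]),
            loop();
        _Other ->
            loop()
    end.
```

Which node then becomes the master is an application decision; the event
itself only says that the two sides have diverged.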

Not many other solutions have even been proposed for this. There was one
suggestion to do a regular checkpoint type activity - I'm not sure what
happened to that. I also heard of someone who had an application which could
be started before mnesia to detect the partitioned network and put the
startup procedure on hold until the condition had gone away.
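
I don't know how that application was built, but one way to picture the idea
is a small gate that pings the expected peer nodes and only starts Mnesia once
all of them answer. This is a sketch under assumptions: the module name
mnesia_gate, the Peers list and the RetryMs delay are all illustrative, and
the loop retries forever rather than giving up.

```erlang
-module(mnesia_gate).
-export([start_when_connected/2]).

%% Peers:   list of node names that must be reachable before Mnesia starts.
%% RetryMs: how long to sleep between attempts, in milliseconds.
start_when_connected(Peers, RetryMs) ->
    case [N || N <- Peers, net_adm:ping(N) =/= pong] of
        [] ->
            %% Every peer answered, so it should be safe to start Mnesia.
            mnesia:start();
        Unreachable ->
            error_logger:info_msg(
                "Holding Mnesia startup, unreachable nodes: ~p~n",
                [Unreachable]),
            timer:sleep(RetryMs),
            start_when_connected(Peers, RetryMs)
    end.
```

A call such as mnesia_gate:start_when_connected(['b@host2'], 5000) would be
made from the application's own startup code, with the node name replaced by
whatever the real peers are called.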

There is also a function, mnesia:set_master_nodes/1,2, which may be used to
give more control over where table copies should be loaded from once the
partitioned node has been restarted.
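
As a rough sketch of how this might be used on the node chosen to give way:
set the master node, then stop and start Mnesia so that tables are loaded
from the master. The module name partition_heal and the function yield_to/1
are invented for this example, a disc-based schema is assumed, and whether
restarting only Mnesia (rather than the whole Erlang node) is enough for a
given application is something to verify, not something this guarantees.

```erlang
-module(partition_heal).
-export([yield_to/1]).

%% Run this on the node whose changes are to be discarded.
%% MasterNode is the node whose table copies should win.
yield_to(MasterNode) ->
    %% Record the master node for all tables; at the next startup the
    %% tables are loaded from it instead of from the local copies.
    ok = mnesia:set_master_nodes([MasterNode]),
    %% Restart Mnesia only, leaving the Erlang node itself running.
    stopped = mnesia:stop(),
    ok = mnesia:start().
```

Any changes made locally during the outage are thrown away by this, which is
the same trade-off as restarting the node by hand.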

The eddie project (now on sourceforge.net) uses some clever scheme to
automatically set master nodes and restart the other node(s) (it did last
time I looked anyhow).

There is a section in the mnesia user guide which explains partitioned
networks quite well.

I'm not aware that this problem has been solved for any available replicated
database system like mnesia, but if anyone knows better, maybe they can
share.

Solving this properly in the general case is utterly mind bending, but I
have wondered whether one could come up with some scheme which logged
updates/deletes locally and attempted to merge these by calling a
user-defined callback for each explicitly defined transaction/update type.
Thoughts?
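
To make the idea a little more concrete, here is a toy sketch of the merge
step only. Everything in it is hypothetical: the module name merge_sketch,
the log format (a list of {{Type, Key}, Value} entries, one list per node)
and the Resolve callback are assumptions for illustration, and nothing here
touches Mnesia or handles deletes, ordering or multi-record transactions.

```erlang
-module(merge_sketch).
-export([merge/3]).

%% LogA, LogB: lists of {{Type, Key}, Value} recorded locally on each node
%%             during the outage. If a log holds several entries with the
%%             same {Type, Key}, only the last one is considered.
%% Resolve:    fun(Type, Key, ValueA, ValueB) -> MergedValue, called only
%%             when both logs contain an entry for the same {Type, Key}.
merge(LogA, LogB, Resolve) ->
    Merged = lists:foldl(
        fun({{Type, Key} = Id, ValB}, Acc) ->
            case maps:find(Id, Acc) of
                {ok, ValA} ->
                    %% Both sides changed this item: ask the callback.
                    maps:put(Id, Resolve(Type, Key, ValA, ValB), Acc);
                error ->
                    %% Only node B changed this item: keep its value.
                    maps:put(Id, ValB, Acc)
            end
        end,
        maps:from_list(LogA),
        LogB),
    lists:sort(maps:to_list(Merged)).
```

For example, a callback could add the two values for a counter type and take
the second node's value for everything else. The hard part, of course, is
deciding what such a callback should do for each real update type.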

> Thanks again,
> Brad
> 

- Sean
