Basic Mnesia Distribution Questions

Ulf Wiger etxuwig@REDACTED
Tue Jan 2 12:16:48 CET 2001


On Thu, 28 Dec 2000, Sean Hinde wrote:

>I'm not aware that this problem has been solved for any available
>replicated database system like mnesia but if anyone knows better
>maybe they can share..
>
>Solving this properly in the general case is utterly mind bending,
>but I have wondered whether one could come up with some scheme which
>logged updates/deletes locally and attempted to merge these by
>calling a user defined callback for each explicitly defined
>transaction/update type.. Thoughts?

I believe this is considered a pathological problem, i.e. there is 
no general solution -- only application-specific ones.

An example:

Basically all networked PIMs (Personal Information Managers) have 
to address this problem somehow, since we do not yet live in the
"always connected" world. If you have your agenda available on the
LAN, but want to be able to modify it while on the road, the PIM
will have to resynchronize the appointments. In the ones I've tried,
this procedure is semi-automatic at best: if an appointment has been 
modified both in the network copy and on your laptop, the PIM will
usually ask which one you want to keep; in all other cases,
synchronization can be automatic.

Obviously, this does not work in an unattended embedded system.
Here, the dilemma is that you have to restore the system
automatically, but doing so almost always means that some updates may
be lost. A common method is to look at the roles of each node
(basically, what applications are running on each) and make some sort
of guess at which node has the most "interesting" copy of the
database.
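As a rough sketch (the module name and the one-master policy are
made up for illustration, not a recipe), a control process could
subscribe to mnesia's system events and, when a partitioned network
is reported, reload the tables from the node it trusts most:

  -module(partition_guard).
  -export([start/1]).

  %% Illustrative only: always trust PreferredNode's copy.
  %% A real system would inspect node roles before choosing.
  start(PreferredNode) ->
      spawn(fun() -> init(PreferredNode) end).

  init(PreferredNode) ->
      {ok, _} = mnesia:subscribe(system),
      loop(PreferredNode).

  loop(PreferredNode) ->
      receive
          {mnesia_system_event,
           {inconsistent_database, _Context, _Node}} ->
              %% Declare the preferred node master and restart
              %% mnesia so tables are loaded from its copy.
              ok = mnesia:set_master_nodes([PreferredNode]),
              stopped = mnesia:stop(),
              ok = mnesia:start(),
              {ok, _} = mnesia:subscribe(system),
              loop(PreferredNode);
          _Other ->
              loop(PreferredNode)
      end.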

An additional problem is that the situation is easily compounded as
the nodes re-connect, so while some control component is trying to
figure out which node(s) to restart, new (possibly irreparable)
inconsistencies may appear. One solution to this problem is to
prevent the nodes from re-connecting automatically and to resolve the
inconsistency off-line. This can be done with the kernel environment
variable 'dist_auto_connect' (in OTP R7B), which lets you turn off the
automatic connection behaviour that is the default in distributed
Erlang.

Example:

  erl -kernel dist_auto_connect once

will allow Erlang nodes to connect automatically _the first time_,
which is normally when a node starts and pings a running node. If the
connection is lost and then restored, the nodes will not
automatically re-connect. Using a backdoor, e.g. a UDP ping, a control
component may detect the situation and decide what to do without
risking further damage to the system.
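A minimal sketch of such a backdoor, assuming the remote side runs a
trivial UDP echo service on a known port (the module name, port and
"ping"/"pong" protocol are only assumptions):

  -module(udp_probe).
  -export([ping/2]).

  %% Probe a peer over plain UDP while distributed Erlang is
  %% disconnected. 'alive' means the peer answered within 2 s.
  ping(Host, PortNo) ->
      {ok, Socket} = gen_udp:open(0, [binary, {active, false}]),
      ok = gen_udp:send(Socket, Host, PortNo, <<"ping">>),
      Result = case gen_udp:recv(Socket, 0, 2000) of
                   {ok, {_Addr, _Port, <<"pong">>}} -> alive;
                   _ -> no_answer
               end,
      gen_udp:close(Socket),
      Result.

Once the control component has resolved the conflict, it can
re-establish distribution explicitly, e.g. with
net_kernel:connect_node(OtherNode).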


/Uffe
-- 
Ulf Wiger                                    tfn: +46  8 719 81 95
Senior System Architect                      mob: +46 70 519 81 95
Strategic Product & System Management    ATM Multiservice Networks
Data Backbone & Optical Services Division      Ericsson Telecom AB



