[erlang-questions] Network partition scenario

Thu Sep 11 21:31:10 CEST 2008

I have 2 nodes, each running an instance of the same application, with
shared, distributed mnesia tables. This is for redundancy. If one node
has a ton of bricks fall on it, the show goes on. Everything works
well if one node dies.

Scenario: because of swapping, Node 2
temporarily stops responding. Node 1 decides that Node 2 has gone
down. Now, swapping stops. Node 2 is now responding, and serving
requests... but nodes() == [] on both. Both nodes are now serving
requests with no synchronization in mnesia.

My plan is to handle the
mnesia event message {inconsistent_database,
running_partitioned_network, _}. I will handle this, basically, by
shutting down the application on both nodes, clearing mnesia (which is
acceptable in this case), restarting mnesia, and then restarting my
application. I will not use mnesia:set_master_nodes(), as it
apparently causes the inconsistent_database message to be
suppressed.

However, this inconsistent_database message is only
generated when mnesia_monitor receives a nodeup message. That means
that, after the two nodes have severed their connection, the partition
is not detected until the two nodes reconnect.

My question is: how
do I get them to reconnect? Should I do this by simply calling
net_adm:ping/1 against the other node regularly? Or is there a better way?
Also, am I correct in assuming that restarting mnesia will cause them
to re-sync? My testing seems to point to yes.
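
To make the plan concrete, here is a rough, untested sketch of the approach described above: one process per node subscribes to mnesia system events, pings the peer at a fixed interval whenever it is missing from nodes(), and reacts to the inconsistent_database event. The module name partition_watch, the 5 second interval, and the recover/0 placeholder are assumptions made for illustration only. The actual stop/clear/restart sequence is left out because it is application-specific.

```erlang
-module(partition_watch).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Assumed ping interval; tune to taste.
-define(PING_INTERVAL_MS, 5000).

%% Peer is the name of the other node, e.g. 'node2@host' (hypothetical).
start_link(Peer) when is_atom(Peer) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Peer, []).

init(Peer) ->
    %% mnesia must already be running on this node for subscribe to succeed.
    {ok, _Node} = mnesia:subscribe(system),
    schedule_ping(),
    {ok, Peer}.

handle_call(_Request, _From, Peer) ->
    {reply, ok, Peer}.

handle_cast(_Msg, Peer) ->
    {noreply, Peer}.

%% Periodic reconnect attempt: only ping when the peer is not connected.
handle_info(ping_peer, Peer) ->
    case lists:member(Peer, nodes()) of
        true  -> ok;
        false -> _ = net_adm:ping(Peer)   % returns pong | pang
    end,
    schedule_ping(),
    {noreply, Peer};

%% Sent by mnesia once the nodes are connected again after a partition.
handle_info({mnesia_system_event,
             {inconsistent_database, running_partitioned_network, Node}},
            Peer) ->
    error_logger:warning_msg("Partitioned network detected with ~p~n", [Node]),
    recover(),
    {noreply, Peer};

handle_info(_Other, Peer) ->
    {noreply, Peer}.

schedule_ping() ->
    erlang:send_after(?PING_INTERVAL_MS, self(), ping_peer).

%% Placeholder (assumption): stop the application, clear mnesia,
%% restart mnesia, restart the application.
recover() ->
    ok.
```

It would be started on each node with something like partition_watch:start_link('node2@host'), where the node name is a placeholder. One thing I have not verified: the system-event subscription probably does not survive a mnesia restart, so recover/0 would likely need to call mnesia:subscribe(system) again once mnesia is back up.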