[erlang-questions] Network partition scenario

Jason Ganetsky <>
Fri Sep 19 17:33:26 CEST 2008


On Fri, Sep 19, 2008 at 12:34 AM, Scott Lystig Fritchie <> wrote:

> Jason Ganetsky <> wrote:
>
> jg> I will handle this, basically, by
> jg> shutting down the application on both nodes, clearing mnesia (which
> jg> is acceptable in this case), restarting mnesia, and then restarting
> jg> my application.
>
> Out of curiosity ... what does "clearing mnesia" mean?  Starting from
> scratch, deleting all Mnesia data?  Or something else?


Deleting all Mnesia data. Given the way my application uses Mnesia, that
works: it's something to avoid, but it's acceptable in the infrequent case
of a partition.


>
>
> jg> I will not use mnesia:set_master_nodes(), as it
> jg> apparently causes the inconsistent_database message to be
> jg> suppressed.
>
> While the network partition was in effect, transactions on both sides
> may have done globally-inconsistent things ... but one won't know that
> until the partition is healed.


My question was not about the consequences of network partitions, but about
how Mnesia detects and notifies.

Mnesia listens for nodeup messages. Whenever it receives one, it performs an
"inconsistency" check by asking the new node whether that node has marked
the querying node as mnesia_down. If it has, a running_partitioned_network
message is generated. Transactions do not factor into the inconsistency
check at all.
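A toy Python model of the check just described may make it concrete. It is a sketch that models only the bookkeeping described here (each node records which peers it has seen go mnesia_down, and on nodeup asks the peer whether it has marked the asker); the class and method names are hypothetical, this is not Mnesia's code, and real Mnesia reports the result as a system event rather than by appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one Mnesia node (hypothetical; not Mnesia's code)."""
    name: str
    # Names of peers this node has recorded as mnesia_down.
    marked_down: set = field(default_factory=set)
    # Locally generated notifications.
    events: list = field(default_factory=list)

    def mark_down(self, peer: "Node") -> None:
        # Called when this node loses contact with a peer.
        self.marked_down.add(peer.name)

    def has_marked_down(self, asker: "Node") -> bool:
        # Answer the peer's query: "did you mark me as mnesia_down?"
        return asker.name in self.marked_down

    def on_nodeup(self, peer: "Node") -> None:
        # The check looks only at the peer's answer; no transactions involved.
        if peer.has_marked_down(self):
            self.events.append(("running_partitioned_network", peer.name))


a, b = Node("a"), Node("b")
# Partition: each side sees the other go down.
a.mark_down(b)
b.mark_down(a)
# Partition heals: both sides get nodeup and run the check.
a.on_nodeup(b)
b.on_nodeup(a)
print(a.events)
print(b.events)
```

In this model both nodes end up with one running_partitioned_network entry, each naming the other side.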

Whenever a new node joins the ring, all nodes receive a nodeup message, so
potentially all nodes generate the running_partitioned_network message.
Having a non-empty master_nodes list causes a node to always respond with
"you are not mnesia_down" when queried. However, it still generates
running_partitioned_network messages locally (as expected, since the other
nodes may respond with "yes, you've been mnesia_down").
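The asymmetry above can be sketched as a small Python toy model. It is hypothetical throughout (the names are illustrative, not Mnesia internals) and models only the behavior described in this message: node a has a non-empty master_nodes list, node b does not, and both recorded the other as mnesia_down during the partition.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of one node; names are hypothetical, not Mnesia internals."""
    name: str
    master_nodes: list = field(default_factory=list)
    marked_down: set = field(default_factory=set)
    events: list = field(default_factory=list)

    def has_marked_down(self, asker: "Node") -> bool:
        # With a non-empty master_nodes list, always answer
        # "you are not mnesia_down", whatever has been recorded.
        if self.master_nodes:
            return False
        return asker.name in self.marked_down

    def on_nodeup(self, peer: "Node") -> None:
        # The local check does not depend on this node's own master_nodes.
        if peer.has_marked_down(self):
            self.events.append(("running_partitioned_network", peer.name))


a = Node("a", master_nodes=["a"])
b = Node("b")
# Both sides recorded the other as mnesia_down during the partition.
a.marked_down.add("b")
b.marked_down.add("a")
a.on_nodeup(b)  # b answers truthfully, so a still generates the event
b.on_nodeup(a)  # a always answers "not down", so b stays silent
print(a.events)
print(b.events)
```

In this model only a is notified; b never learns about the partition, which is the suppression effect that motivates keeping master_nodes empty.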

I discovered this by reading the Mnesia code, and by studying its behavior.

I've decided to have empty master_nodes lists on all my nodes, because I
want all nodes notified in the event of a network partition.


>
>
> jg> My question is: how do I get them to reconnect? Should I do this by
> jg> simply calling net_adm:ping() on the other node regularly? Or is
> jg> there a better way?  Also, am I correct in assuming that restarting
> jg> mnesia will cause them to re-sync?
>
> You'll need some excuse for one to communicate with the other.  If
> you're using default value of "-kernel dist_auto_connect" (not "once" or
> "false", see net_kernel(3)), net_adm:ping() is good enough.
>
> Upon restarting, the local Mnesia instance will need to contact other
> transaction managers to calculate the fate of any unresolved
> transactions.  That need will trigger re-connecting if dist_auto_connect
> is true.
>

Does it do anything with transactions? As far as I can tell, the restarted
node simply dumps its replica and goes with whatever the other node has.