[erlang-questions] Network partition scenario

Fri Sep 19 06:34:35 CEST 2008

Jason Ganetsky <jason.ganetsky@REDACTED> wrote:

jg> I will handle this, basically, by
jg> shutting down the application on both nodes, clearing mnesia (which
jg> is acceptable in this case), restarting mnesia, and then restarting
jg> my application.

Out of curiosity ... what does "clearing mnesia" mean?  Starting from
scratch, deleting all Mnesia data?  Or something else?

jg> I will not use mnesia:set_master_nodes(), as it
jg> apparently causes the inconsistent_database message to be
jg> suppressed.

While the network partition was in effect, transactions on both sides
may have done globally-inconsistent things ... but one won't know that
until the partition is healed.
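
For what it's worth, Mnesia reports this as a system event once the
nodes can see each other again.  A minimal sketch of a listener,
assuming Mnesia is already running on the local node; the module name
partition_watch and the OnPartition callback are made up for
illustration, not anything from Jason's application:

```erlang
-module(partition_watch).
-export([start/1, classify/1]).

%% Spawn a process that subscribes to Mnesia system events and calls
%% OnPartition(Node) each time an inconsistent_database event arrives.
%% OnPartition is a hypothetical callback supplied by the caller.
%% If Mnesia is not running, the subscribe match fails and the
%% spawned process exits.
start(OnPartition) when is_function(OnPartition, 1) ->
    spawn(fun() ->
                  {ok, _} = mnesia:subscribe(system),
                  loop(OnPartition)
          end).

loop(OnPartition) ->
    receive
        Msg ->
            case classify(Msg) of
                {partitioned, Node} -> OnPartition(Node);
                ignore -> ok
            end,
            loop(OnPartition)
    end.

%% Pure helper: pick out the events that indicate a partition.
classify({mnesia_system_event, {inconsistent_database, _Context, Node}}) ->
    {partitioned, Node};
classify(_Other) ->
    ignore.
```

The callback is where the "stop the application, clear, restart"
procedure would be kicked off; what it does is up to the application.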

jg> My question is: how do I get them to reconnect? Should I do this by
jg> simply calling net_adm:ping() on the other node regularly? Or is
jg> there a better way?  Also, am I correct in assuming that restarting
jg> mnesia will cause them to re-sync?

You'll need some excuse for one node to communicate with the other.  If
you're using the default value of "-kernel dist_auto_connect" (not "once"
or "false", see net_kernel(3)), net_adm:ping() is good enough.

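A rough sketch of the "ping regularly" approach, assuming the default
dist_auto_connect behaviour.  The module name reconnect and the
interval are placeholders, and a real application would probably want
this under a supervisor rather than a bare spawn:

```erlang
-module(reconnect).
-export([start/2, needs_ping/2]).

%% Every IntervalMs milliseconds, ping Node if it is not currently in
%% the list of connected nodes.
start(Node, IntervalMs)
  when is_atom(Node), is_integer(IntervalMs), IntervalMs > 0 ->
    spawn(fun() -> loop(Node, IntervalMs) end).

loop(Node, IntervalMs) ->
    case needs_ping(Node, nodes()) of
        true  -> _ = net_adm:ping(Node);   % returns pong or pang
        false -> ok
    end,
    timer:sleep(IntervalMs),
    loop(Node, IntervalMs).

%% Pure helper: true when Node is absent from the Connected list.
needs_ping(Node, Connected) ->
    not lists:member(Node, Connected).
```

Started with something like reconnect:start('b@otherhost', 5000) on
each side, where the node name is of course hypothetical.
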
Upon restarting, the local Mnesia instance will need to contact other
transaction managers to determine the fate of any unresolved
transactions.  That need will trigger re-connecting if dist_auto_connect
is left at its default setting.
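
The restart step itself might look roughly like the following.  This
is only a sketch: the module name mnesia_restart is invented, the
table list and timeout are whatever the application needs, and it does
nothing to reconcile data that diverged during the partition.

```erlang
-module(mnesia_restart).
-export([restart/2]).

%% Stop and start the local Mnesia instance, then wait until the given
%% tables are accessible.  Returns the result of
%% mnesia:wait_for_tables/2: ok, {timeout, BadTabs} or {error, Reason}.
restart(Tables, TimeoutMs) ->
    stopped = mnesia:stop(),
    ok = mnesia:start(),
    mnesia:wait_for_tables(Tables, TimeoutMs).
```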

-Scott