I reached the same point you have a few months ago and asked many people a lot of questions. I also felt there was a lack of discussion.<br><br>Our partitions were actually not caused by the network, but by a
lack of responsiveness on one node. It had too little memory and
would end up swapping. While it was swapping, the second node
declared the first dead; when the first node came back, its
connection to the second was broken, so it declared the second dead.<br><br>To solve the problem, we needed automatic healing. Essentially, you will have to come up with some mechanism to reconcile the differences between partitioned databases. Fortunately, in my application, there are ways to sensibly reconstruct the data from an external MySQL source. You will also have to minimize the external factors that may cause a partition. This means running both nodes on the same switch and making sure they are highly available (possibly dedicating the machine to mnesia). <br>
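A minimal sketch of the kind of reconciliation described above, assuming each replica's table can be read as a key/value dict and that the external MySQL database can be queried per key. The names `reconcile` and `fetch_authoritative` are hypothetical stand-ins, not Mnesia or MySQL APIs. The policy is also an assumption: keys on which both replicas agree are kept, and any key that differs or exists on only one side is treated as irreconcilable and rebuilt from the external source (or dropped if the source has no record).

```python
from typing import Callable, Dict, Hashable, Optional


def reconcile(
    side_a: Dict[Hashable, object],
    side_b: Dict[Hashable, object],
    fetch_authoritative: Callable[[Hashable], Optional[object]],
) -> Dict[Hashable, object]:
    """Merge two replicas that diverged during a partition (hypothetical helper).

    Keys on which both replicas agree are kept unchanged. Keys whose values
    differ, or that exist on only one side, are re-read from the
    authoritative external store; if that store returns None, the key is
    dropped.
    """
    merged: Dict[Hashable, object] = {}
    for key in side_a.keys() | side_b.keys():
        if key in side_a and key in side_b and side_a[key] == side_b[key]:
            merged[key] = side_a[key]  # no conflict: keep the shared value
            continue
        rebuilt = fetch_authoritative(key)  # conflict: ask the external source
        if rebuilt is not None:
            merged[key] = rebuilt
    return merged


if __name__ == "__main__":
    # Stand-in for the external MySQL data; a plain dict for illustration.
    external = {"alice": 120, "bob": 75}
    node_a = {"alice": 100, "bob": 75, "carol": 10}
    node_b = {"alice": 120, "bob": 75}
    print(reconcile(node_a, node_b, external.get))
```

Here `alice` is rebuilt from the external copy, `bob` is kept as-is, and `carol`, which exists on one side only and not in the external store, is dropped. Whether dropping one-sided records is acceptable depends entirely on the application.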
<br>However, I ended up needing to read the Mnesia source to understand how it detects partitions and how it behaves afterwards. Using set_master_nodes() has some undesirable side effects; for example, it suppresses the running_partitioned_network message that is used to detect partitions. I set it up so that both nodes in my pair would watch for partitions, both would discard data with irreconcilable differences, and both would restart Mnesia. For more details, feel free to e-mail me.<br>
<br>-Jason<br><br><div class="gmail_quote">On Sun, Nov 2, 2008 at 2:55 PM, Ulf Wiger <span dir="ltr"><<a href="mailto:ulf@wiger.net">ulf@wiger.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
AFAIK, no general algorithm exists for self-healing after network<br>
splits. MySQL Cluster (NDB), for example, solves it by requiring at least<br>
3 copies of the data, and one arbitrator. In the case of a network<br>
split, you may continue if you can speak to the arbitrator; otherwise<br>
you're shut down.<br>
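The arbitrator rule as stated above can be modeled in a few lines. This is a simplified sketch of that rule only, not MySQL Cluster's actual arbitration protocol: a partition that cannot reach the arbitrator shuts down, and the arbitrator lets at most one partition continue. `Arbitrator`, `decide`, and the node names are hypothetical, and the loop order stands in for whichever partition happens to ask first.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

Partition = FrozenSet[str]  # nodes that can still talk to each other


class Arbitrator:
    """Hypothetical arbitrator that lets at most one partition continue."""

    def __init__(self) -> None:
        self.winner: Optional[Partition] = None

    def request(self, partition: Partition) -> bool:
        if self.winner is None:
            self.winner = partition  # the first partition to ask is chosen
        return self.winner == partition


def decide(
    partitions: List[Partition],
    arbitrator: Arbitrator,
    reaches_arbitrator: Callable[[Partition], bool],
) -> Dict[Partition, bool]:
    """True means keep serving, False means shut down."""
    decisions: Dict[Partition, bool] = {}
    for part in partitions:
        # Without contact to the arbitrator a partition must stop; with
        # contact it continues only if the arbitrator picks it.
        decisions[part] = reaches_arbitrator(part) and arbitrator.request(part)
    return decisions


if __name__ == "__main__":
    left = frozenset({"n1", "n2"})
    right = frozenset({"n3"})
    # Both sides can reach the arbitrator: only one of them may continue.
    print(decide([left, right], Arbitrator(), lambda p: True))
    # Only the right side can reach it: the left side shuts down.
    print(decide([left, right], Arbitrator(), lambda p: "n3" in p))
```

The point of the model is that the two sides can never both keep writing, which is exactly the divergence Mnesia leaves to the application.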
<br>
Mnesia provides the tools for resolving the situation, and one way<br>
to protect yourself from accidental inconsistencies is to use<br>
the kernel parameter dist_auto_connect set to once, and keep a back door between<br>
the nodes (this has been discussed several times on this list.)<br>
Once you've determined that you have a split network, and which<br>
copies you want to continue with, you can restart the other nodes,<br>
possibly using mnesia:set_master_nodes/1 to make absolutely<br>
sure that they load their data from the right nodes.<br>
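The procedure in the paragraph above can be sketched as two steps: decide whether a lost peer indicates a split, then plan which nodes to restart and where they should load their data from. This sketch assumes a hypothetical back-door channel (for example a separate TCP health check) and a policy of keeping the largest island; `classify`, `choose_survivor`, and `plan_recovery` are illustrative names, not Mnesia functions. The master list computed for each restarted node is what you would pass to mnesia:set_master_nodes/1 on that node.

```python
from typing import Dict, List, Set


def classify(dist_link_up: bool, back_door_up: bool) -> str:
    """Interpret the two channels to a peer (simplified, hypothetical model)."""
    if dist_link_up:
        return "connected"
    if back_door_up:
        return "split"  # peer is alive but the Erlang distribution link is gone
    return "peer_unreachable"  # peer may be down, or both paths are cut


def choose_survivor(islands: List[Set[str]]) -> Set[str]:
    """Assumed policy: keep the largest island; ties go to the smallest node name.

    Islands are assumed to be non-empty and disjoint.
    """
    return min(islands, key=lambda island: (-len(island), min(island)))


def plan_recovery(islands: List[Set[str]]) -> Dict[str, List[str]]:
    """Map every node outside the surviving island to the master nodes
    it should load its tables from after restarting."""
    survivor = choose_survivor(islands)
    masters = sorted(survivor)
    plan: Dict[str, List[str]] = {}
    for island in islands:
        if island == survivor:
            continue
        for node in sorted(island):
            plan[node] = masters
    return plan


if __name__ == "__main__":
    print(classify(dist_link_up=False, back_door_up=True))
    print(plan_recovery([{"a@h1", "b@h1"}, {"c@h2"}]))
```

Choosing the survivor is the application-specific part; "largest island" is only one plausible rule, and some systems would rather keep the island holding a particular node.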
<br>
Setting this up is not terribly difficult. Interfacing to another DBMS<br>
is likely to be much more work, and you'd have to make really sure<br>
that they have a better strategy for coping with network splits than<br>
mnesia - I'm not at all sure that they do (but I'm willing to repent in<br>
the face of hard evidence).<br>
<br>
The lack of automatic handling of network splits has been mentioned<br>
a number of times as an argument against mnesia, but I really don't<br>
recall hearing much about how other DBMSs deal with it. There seems<br>
to be an assumption that since there isn't much discussion about<br>
network splits for other DBMSs, they must simply solve it transparently.<br>
I think this is a dangerous conclusion.<br>
<br>
BR,<br>
Ulf W<br>
<br>
2008/11/2 Joel Reymont <<a href="mailto:joelr1@gmail.com">joelr1@gmail.com</a>>:<br>
<div><div></div><div class="Wj3C7c">> I'm looking to launch a poker 'social network', the first and only one<br>
> where you can actually play poker. I'm hesitant to go full-way with<br>
> Mnesia, though, and wonder how others are handling this.<br>
><br>
> I googled and poked around but there seems to be an elephant in the<br>
> room and no one is talking about it. The elephant is that Mnesia does<br>
> not self-heal after network splits.<br>
><br>
> Could it be that this is a solved problem or has anyone avoided it<br>
> because their data model does not require self-healing? How do big<br>
> projects deal with it? Ericsson?<br>
><br>
> I would like to run a few Mnesia nodes for high availability, but I<br>
> positively don't want my databases to diverge and I don't want to deal<br>
> with reconciling the databases later.<br>
><br>
> Strictly speaking, I could keep mnesia as a transient data store and<br>
> keep my master database in a non-Erlang database. I just thought I'd<br>
> poll the community regardless.<br>
><br>
> Thanks, Joel<br>
><br>
> --<br>
> <a href="http://wagerlabs.com" target="_blank">wagerlabs.com</a><br>
><br>
> _______________________________________________<br>
> erlang-questions mailing list<br>
> <a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
> <a href="http://www.erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://www.erlang.org/mailman/listinfo/erlang-questions</a><br>
><br>
</div></div></blockquote></div><br>