[erlang-questions] Removing mnesia cluster node while mnesia is offline

Fri Jul 6 10:49:29 CEST 2012

Hi Francesco,

> OK, let's say that I have node A, B, and C, in a cluster. A goes down,
> and C gets removed from the cluster. So B now has [A, B] in the schema
> as cluster nodes. A's data still has [A, B, C] as cluster nodes, and
> if we start A again, that will be propagated to B. That's what I meant
> with "pollute".

This scenario is impossible. I will rewrite it as follows, depending on your definition of "remove". One correct interpretation would be:

1. A,B & C are in the cluster
2. A goes down
3. C is turned off and the schema is destroyed on C (instead of your term "removed")
4. So B has [A,B,C] in the schema (and so does A)
5. A is started again.
6. A copies the data from B because it knows it was down and B has the latest data 

Another correct sequence would be:

1. A,B & C are in the cluster
2. A goes down
3. Someone tries to remove C from the cluster via API call. Mnesia will refuse to do this, because A is down.
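
To make the second sequence concrete, here is a minimal sketch of a guard you could put in front of the removal. The module name cluster_guard and the functions down_nodes/2 and safe_remove/1 are made up for illustration; the only Mnesia calls used are mnesia:system_info/1 and mnesia:del_table_copy/2. It assumes Mnesia has already been stopped on the node being removed, and it only makes explicit the check that Mnesia would otherwise answer with an aborted schema change.

```erlang
-module(cluster_guard).
-export([down_nodes/2, safe_remove/1]).

%% Pure helper (hypothetical): schema members that are not running.
down_nodes(DbNodes, RunningNodes) ->
    DbNodes -- RunningNodes.

%% Hypothetical wrapper: only try to remove Node from the schema when
%% every other schema member is up. Mnesia must be stopped on Node.
safe_remove(Node) ->
    DbNodes = mnesia:system_info(db_nodes),
    Running = mnesia:system_info(running_db_nodes),
    case down_nodes(DbNodes -- [Node], Running) of
        [] ->
            %% Returns {atomic, ok} | {aborted, Reason}.
            mnesia:del_table_copy(schema, Node);
        Down ->
            {error, {nodes_down, Down}}
    end.
```

Run on B in the scenario above, safe_remove(C) would return {error, {nodes_down, [A]}} instead of attempting the schema change.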

> A's data still has [A, B, C] as cluster nodes, and
> if we start A again, that will be propagated to B. 

This is the bit that cannot happen unless you had partitioning (a netsplit). A knows that B was online when it went down, and B knows that A went down. When A starts up again, they communicate and figure this out. A then replaces all its copies of non-local (shared) data with the current data from B.
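
If you want to watch this happen when A comes back, something along these lines might help. It is only a sketch: the module name restart_check and the functions start_and_wait/1 and interpret/1 are hypothetical, and the timeout is whatever suits your setup. It starts Mnesia, waits for the local tables to finish loading, and then reports which nodes Mnesia considers running.

```erlang
-module(restart_check).
-export([start_and_wait/1, interpret/1]).

%% Hypothetical helper: start Mnesia on this node and wait (up to
%% Timeout milliseconds) for all locally held tables to be loaded.
start_and_wait(Timeout) ->
    ok = mnesia:start(),
    Tabs = mnesia:system_info(local_tables),
    case interpret(mnesia:wait_for_tables(Tabs, Timeout)) of
        loaded ->
            {ok, mnesia:system_info(running_db_nodes)};
        Other ->
            {error, Other}
    end.

%% Map the documented return values of mnesia:wait_for_tables/2
%% onto labels of our own choosing.
interpret(ok) -> loaded;
interpret({timeout, Tabs}) -> {still_loading, Tabs};
interpret({error, Reason}) -> {failed, Reason}.
```

A result of {error, {still_loading, Tabs}} on A would suggest it is still waiting for a node that holds a newer copy of those tables.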

Rudolph van Graan

On 3 Jul 2012, at 13:12, Francesco Mazzoli wrote:

> Hi Rudolph,
> At Tue, 03 Jul 2012 12:31:11 +0100,
> Rudolph van Graan wrote:
>> You have to set things up so that you have other replicas of those
>> tables on other nodes.
> 
> Well, some data is inherently "local" and not relevant to the other
> nodes, that have similar local tables.
> 
>> This is not how mnesia works. The list of nodes in the distributed
>> database is static, i.e. all the nodes have the same list. This is
>> why all of them needs to be online when you add or remove nodes.
> 
> I understand this, but I still want to know if it's possible to
> recover from that situation more or less gracefully.
> 
>> The only problem that you need to solve is dealing with partitioning
>> (i.e. splits) and you have two sets of data on two different
>> segments. It is not possible for a node to "pollute" other nodes.
> 
> OK, let's say that I have node A, B, and C, in a cluster. A goes down,
> and C gets removed from the cluster. So B now has [A, B] in the schema
> as cluster nodes. A's data still has [A, B, C] as cluster nodes, and
> if we start A again, that will be propagated to B. That's what I meant
> with "pollute".
> 
>> So "... offline node still thinks it is clustered..." - it doesn't
>> just think so, it has been configured so when you added it into the
>> cluster. It will stay part of the cluster until you remove it.
> 
> ...but to remove it I have to bring it online, and if the other online
> nodes have a different configuration, things get ugly.
> 
> --
> Francesco * Often in error, never in doubt
