[erlang-questions] mnesia recover from netsplit, can't delete node from schema

Fri Jun 29 16:41:02 CEST 2012

On Jun 28, 2012, at 2:40 PM, Daniel Dormont wrote:

> Here is the scenario that happened to me, as best I can tell. I had two nodes in a cluster; let's call them A and B. B became unavailable for a while and got rebooted. When I tried to start it again, things seemed to work, except that certain tables no longer seemed to exist. As far as I can tell, these tables used to be enabled only on B and not A, and are now in some sort of weird hybrid unavailable state.
> 
> A is still running fine in production even with these tables missing, but I can't seem to get a clean start of my application (Ejabberd) on B. So I figured I would just start a fresh node on B, start Mnesia, add extra_db_nodes pointing to A and go from there. The problem is that A still thinks these tables exist only on B (they are listed as remote on A). Fortunately, Ejabberd is smart enough to create any tables it needs on startup, so I was thinking a clean start on B would do this. So I went into A and ran
> 
> mnesia:del_table_copy(schema, B).
> 
> thinking this would make the remote tables sort of go away. But instead it fails with
> 
> {aborted,{no_exists,vcard_search}}
> 
> And trying to delete the table directly yields the same result.
> 
> Is there a way I can force Mnesia on A to completely forget about a set of remote tables (and, for that matter, the node that was supposed to store them) before I bring a new node online?

You might want to take a look at the documentation for mnesia:set_master_nodes/1,2 and maybe mnesia:force_load_table/1.

Just make sure you understand exactly how these work before using either in production. Used incorrectly, they can leave the database in an inconsistent state.
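
For reference, here is a minimal sketch of how those calls might be invoked. The module name netsplit_sketch and the helpers inspect/1 and force_local/1 are hypothetical, made up for illustration; only the mnesia:* calls are real API. It assumes Mnesia is already started on the node that holds the on-disk copy of the table, and it is not a tested recovery procedure for this particular cluster.

```erlang
-module(netsplit_sketch).
-export([inspect/1, force_local/1]).

%% Report what the local node currently believes about the cluster and Tab.
%% Exits if Tab is not present in the local schema.
inspect(Tab) ->
    [{db_nodes,         mnesia:system_info(db_nodes)},
     {running_db_nodes, mnesia:system_info(running_db_nodes)},
     {where_to_read,    mnesia:table_info(Tab, where_to_read)},
     {master_nodes,     mnesia:table_info(Tab, master_nodes)}].

%% Declare the local node the master for Tab, then force-load the local copy.
%% Assumption: the local replica is the one you want to keep. Any newer data
%% held only on other nodes is ignored.
force_local(Tab) ->
    case mnesia:set_master_nodes(Tab, [node()]) of
        ok ->
            case mnesia:force_load_table(Tab) of
                yes   -> mnesia:wait_for_tables([Tab], 5000);
                Other -> {error, {force_load_failed, Other}}
            end;
        {error, Reason} ->
            {error, {set_master_nodes_failed, Reason}}
    end.
```

From the shell you would call something like netsplit_sketch:inspect(vcard_search) first, and only then netsplit_sketch:force_local(vcard_search). Setting the master nodes for a table makes Mnesia bypass its normal load algorithm at startup and load that table from one of the listed nodes, which is exactly why it can paper over an inconsistency rather than repair it.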

Hope that helps,

-Rick