[erlang-questions] Detecting 'inconsistent_database' and other Mnesia events

Ulf Wiger ulf@REDACTED
Thu Jan 11 16:05:36 CET 2007


On 2007-01-11 07:36:30, Scott Lystig Fritchie
<fritchie@REDACTED> wrote:

> Good evening.  As I've been refactoring a Mnesia-based application, it
> struck me that mnesia:subscribe/1 cannot be called until Mnesia has
> been started.
>
> As a Mnesia event subscriber, if I want extremely *prompt* and
> *reliable* notification of important events such as the
> 'inconsistent_database' warning ... hm, it seems I'm stuck with a race
> condition, possibly missing the event between starting Mnesia and
> becoming an event subscriber.

We approached it differently. There are other things to worry
about besides inconsistencies. Sometimes, the database may
not come up at all. A simple example of what can cause this:

1) node A goes down, node B logs this
2) node B crashes and burns, never to recover
3) node A tries to start, finds B missing

At this point A concludes that it was not alive when B
died, and thus B is likely to have more current data.
A then decides to wait for B...

In a non-stop system, it is arguably better to restart
with the best data available than to sit idly for hours
waiting for a repair man to come out and replace a
busted disk (or whatever).

We chose to run with -kernel dist_auto_connect once and a
backdoor ping, in order to preempt the situations that can
cause mnesia inconsistencies (and other mayhem). Then we run
an application _before_ mnesia which handshakes with its peers
on other mnesia nodes, and performs a dependency analysis
on the loaders (the info returned from wait_for_tables).
If it finds that the loaders are deadlocked, or if nodes
are missing (e.g. due to the situation described above),
a mnesia:force_load_table/1 will eventually be triggered.
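
The final step of that approach (wait for the tables, then force a
load of whatever is still stuck) could be sketched roughly as below.
This is only a minimal illustration and not the application described
here: the module name load_guard and both function names are
hypothetical, and the peer handshake and the loader dependency
analysis are left out entirely. It relies only on
mnesia:wait_for_tables/2 and mnesia:force_load_table/1.

```erlang
-module(load_guard).
-export([ensure_loaded/2, stuck_tables/1]).

%% Pure helper (hypothetical): interpret the result of
%% mnesia:wait_for_tables/2 as a list of tables that did not load.
stuck_tables(ok) -> [];
stuck_tables({timeout, Tabs}) -> Tabs;
stuck_tables({error, Reason}) -> erlang:error({wait_for_tables, Reason}).

%% Wait up to TimeoutMs for Tabs, then force-load any table that is
%% still not available. Returns [{Tab, yes | {error, Reason}}].
%% Assumption: by the time this is called, the application has already
%% decided that waiting longer for the missing nodes is pointless.
ensure_loaded(Tabs, TimeoutMs) ->
    Stuck = stuck_tables(mnesia:wait_for_tables(Tabs, TimeoutMs)),
    [{Tab, mnesia:force_load_table(Tab)} || Tab <- Stuck].
```

Forcing a load accepts the local copy as the best data available, so
any updates that exist only on the missing node are given up. Whether
that trade-off is acceptable is up to the application.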

The same application also sets mnesia master nodes in order
to recover from a partitioned network.
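
As a rough sketch of the master-node idea, the process below
subscribes to Mnesia system events and, on an
'inconsistent_database' event, sets master nodes on any node that is
not itself a master. The module name partition_guard, the function
names and the "defer unless I am a master" policy are assumptions
made for illustration. Only mnesia:subscribe/1 and
mnesia:set_master_nodes/1 are real Mnesia calls. Note that this does
not close the startup race raised in the original question, since the
subscription can still only be made once Mnesia is running.

```erlang
-module(partition_guard).
-export([start/1, should_defer/2]).

%% Pure policy helper (hypothetical): a node defers to the masters
%% unless it is one of them.
should_defer(Node, Masters) ->
    not lists:member(Node, Masters).

%% Spawn a subscriber for Mnesia system events. Mnesia must already be
%% running; otherwise mnesia:subscribe/1 returns an error and the
%% spawned process crashes on the match below.
start(Masters) ->
    spawn(fun() ->
                  {ok, _} = mnesia:subscribe(system),
                  loop(Masters)
          end).

loop(Masters) ->
    receive
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            error_logger:warning_msg("mnesia: ~p involving ~p~n",
                                     [Context, Node]),
            case should_defer(node(), Masters) of
                true ->
                    %% Master nodes are consulted the next time tables
                    %% are loaded, so restarting Mnesia afterwards is
                    %% left to the application.
                    _ = mnesia:set_master_nodes(Masters),
                    ok;
                false ->
                    ok
            end,
            loop(Masters);
        _Other ->
            loop(Masters)
    end.
```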

The problem with this is obviously that it's not universally
applicable. What to do as a result of loader deadlock or a
partitioned network is decidedly application-specific.
It's possible that a generic decision support framework
could be made out of what we've done, but you shouldn't
hold your breath waiting for us to do that (you know - if
it works, don't touch it, and all that.)

Personally, I would like to see an add-on to mnesia
offer this type of functionality. Whenever Dan gets loads
of free time on his hands perhaps...?

BR,
Ulf W
-- 
Ulf Wiger


