[erlang-questions] RFC: mnesia majority checking

Ulf Wiger ulf.wiger@REDACTED
Fri Dec 10 09:08:24 CET 2010


On 10 Dec 2010, at 03:44, Alain O'Dea wrote:

> The guys at Basho probably have some excellent material on partition
> tolerance related to Riak.  It is worth a chat with Justin Sheehy or Andy Gross to see what insight they have.

You're absolutely right, and I know most of the Basho guys are 
on this list, and I'm never one to pass up an opportunity to chat with
them. I'm grateful to Dizzy (Dave Smith) for being a sounding 
board helping me sort out some thoughts about how to handle 
read transactions. Also, Uwe Dauernheim at Klarna, who has been 
doing thesis work with Scalaris, has taken part in many helpful 
discussions. My colleagues Tino Breddin and Hans Nilsson are also
excellent discussion partners.

The thing that is special about mnesia is its insistence on transaction
consistency. It is not an Eventual Consistency system - it's ACID, and
I am not looking to subvert its consistency properties, but rather to
strengthen them. This includes consistency across tables.

On the CAP scale, this change would amount to allowing you to 
sacrifice Availability for the sake of Consistency and Partition tolerance.

An example of where this might be useful is when multiple agents are 
drawing from a global resource pool, and you'd rather deny service 
than consume the same resource twice.
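
To make the majority rule concrete, here is a minimal sketch of the
arithmetic, not the code in the patch. The module and function names
are hypothetical. It treats a majority as strictly more than half of
the nodes holding a replica, so an even split leaves neither side
with a majority.

```erlang
-module(majority_sketch).
-export([has_majority/2]).

%% Hypothetical helper, for illustration only.
%% Reachable: nodes we can currently talk to.
%% Replicas:  nodes that hold a copy of the table.
%% Returns true only when strictly more than half of the replicas
%% are reachable. Reachable nodes without a replica do not count.
-spec has_majority([node()], [node()]) -> boolean().
has_majority(Reachable, Replicas) ->
    Present = [N || N <- Replicas, lists:member(N, Reachable)],
    2 * length(Present) > length(Replicas).
```

With three replicas, two reachable nodes suffice; with four replicas,
two do not, so both halves of an even split would deny service.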

In order to apply quorum logic and fencing, it is important that 
mnesia itself is aware of it, so that it can be made to respect 
these vital preconditions.

One thing I'd especially like feedback on is whether the 'majority' 
flag is a reasonable (and sufficiently powerful) extension.
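
As a rough sketch of how the flag might be used for the resource pool
case, assuming a mnesia build that includes the proposed
{majority, boolean()} table option: the table name, record fields and
function names below are hypothetical, and ram_copies is chosen only
to keep the example short. The read uses mnesia:wread/1, which takes
a write lock and is therefore subject to the majority check.

```erlang
-module(pool_sketch).
-export([create_pool/1, checkout/1]).

%% Hypothetical record: one row per resource in the global pool.
-record(pool, {id, owner}).

%% Create the pool table with the majority flag set.
%% Nodes: the nodes that should hold a replica (assumption: ram_copies).
create_pool(Nodes) ->
    mnesia:create_table(pool,
                        [{attributes, record_info(fields, pool)},
                         {ram_copies, Nodes},
                         {majority, true}]).

%% Try to claim resource Id for the local node.
%% Returns {atomic, ok} on success, or {aborted, Reason} if the
%% resource is taken, missing, or the transaction cannot commit.
checkout(Id) ->
    mnesia:transaction(
      fun() ->
              %% wread/1 grabs a write lock on the object.
              case mnesia:wread({pool, Id}) of
                  [#pool{owner = undefined} = R] ->
                      mnesia:write(R#pool{owner = node()}),
                      ok;
                  [_Taken] ->
                      mnesia:abort(taken);
                  [] ->
                      mnesia:abort(not_found)
              end
      end).
```

A caller on the minority side of a split would see {aborted, Reason}
from checkout/1 and can fall back to whatever it already holds
locally, rather than risk handing out the same resource twice.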

BR,
Ulf W

> 
> On Thu, Dec 9, 2010 at 1:55 PM, Ulf Wiger <ulf.wiger@REDACTED> wrote:
> 
> I added majority checking in the mnesia_locker as well.
> The main reason for doing so (apart from aborting earlier),
> was to enable majority checking on reads.
> 
> The way it works now is that majority checking is done on
> reads that use a write lock (e.g. mnesia:wread/1).
> A normal read, with a read lock, will succeed even in a
> minority. This is probably a pretty good thing.
> 
> https://github.com/uwiger/otp/commit/650f8e30d205bc1130f37c819f920f901358b937
> 
> Comments still most welcome. Monologues are fun too, but
> I can follow Dan North's advice and get a rubber duck for that.
> 
> If you are unsure whether this is at all needed, please chime in.
> It is most definitely not a stupid question.
> 
> BR,
> Ulf W
> 
> On 9 Dec 2010, at 15:26, Ulf Wiger wrote:
> 
> >
> > git fetch git://github.com/uwiger/otp mnesia-majority
> >
> > https://github.com/uwiger/otp/commit/d97ae7d4329d9342e576f3cdd893de6865449e42
> >
> > This is a first stab at a function that I believe could be useful in
> > high-availability applications using mnesia.
> >
> > At this stage, I'd love to have some comments, and suggestions,
> > if someone thinks of a better way to do it.
> >
> > From the commit message:
> >
> > "Add {majority, boolean()} per-table option.
> >
> > With {majority, true} set for a table, write transactions will
> > abort if they cannot commit to a majority of the nodes that
> > have a copy of the table. Currently, the implementation hooks
> > into the prepare_commit, and forces an asymmetric transaction
> > if the commit set affects any table with the majority flag set.
> > In the commit itself, the transaction will abort if it cannot
> > satisfy the majority requirement for all tables involved in the
> > transaction.
> >
> > A future optimization might be to abort already when a write
> > lock is attempted on such a table (/-object) and the lock cannot
> > be set on enough nodes.
> >
> > This functionality introduces the possibility to automatically
> > "fence off" a table in the presence of failures.
> >
> > This is a first implementation. Only basic tests have been
> > performed."
> >
> > One particular use of this functionality would be to have a "global
> > resource pool" in one table with {majority, true}, and periodically
> > check out resources into a local buffer. If there is a failure condition,
> > you can use the local buffer, but not check out more resources, unless
> > you happen to still be in contact with more than half of the replicas.
> >
> > This should allow for a well-defined merge after a network split.
> >
> > BR,
> > Ulf W
> >
> > Ulf Wiger, CTO, Erlang Solutions, Ltd.
> > http://erlang-solutions.com
> >
> >
> >
> 
> Ulf Wiger, CTO, Erlang Solutions, Ltd.
> http://erlang-solutions.com
> 
> 
> 
> 
> 
> 

Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com