[erlang-questions] RFC: mnesia majority checking
Ulf Wiger
ulf.wiger@REDACTED
Fri Dec 10 09:08:24 CET 2010
On 10 Dec 2010, at 03:44, Alain O'Dea wrote:
> The guys at Basho probably have some excellent material on partition
> tolerance related to Riak. It is worth a chat with Justin Sheehy or Andy Gross to see what insight they have.
You're absolutely right, and I know most of the Basho guys are
on this list, and I'm never one to pass up an opportunity to chat with
them. I'm grateful to Dizzy (Dave Smith) for being a sounding
board helping me sort out some thoughts about how to handle
read transactions. Also, Uwe Dauernheim at Klarna, who has been
doing thesis work with Scalaris, has taken part in many helpful
discussions. My colleagues Tino Breddin and Hans Nilsson are also
excellent discussion partners.
The thing that is special about mnesia is its insistence on transaction
consistency. It is not an Eventual Consistency system - it's ACID, and
I am not looking to subvert its consistency properties, but rather to strengthen
them. This includes consistency across tables.
On the CAP scale, this change would amount to allowing you to
sacrifice Availability for the sake of Consistency and Partition tolerance.
An example of where this might be useful is when multiple agents are
drawing from a global resource pool, and you'd rather deny service
than consume the same resource twice.
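The rule behind that trade-off can be sketched in a few lines. This is only an illustration of the arithmetic, not the code in the patch; the module name majority_sketch and the function has_majority/2 are made up for the example.

```erlang
-module(majority_sketch).
-export([has_majority/2]).

%% Hypothetical helper, not part of mnesia.
%% Replicas:  all nodes that hold a copy of the table.
%% Reachable: the nodes we can currently talk to (ourselves included).
%% Returns true only if strictly more than half of the replicas
%% are reachable.
has_majority(Reachable, Replicas) ->
    Alive = [N || N <- Replicas, lists:member(N, Reachable)],
    2 * length(Alive) > length(Replicas).
```

With three replicas, a node that can see two of them may go ahead, while a node that only sees itself must refuse. With four replicas split two and two, neither side has a majority, so both sides deny service rather than hand out the same resource twice.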
In order to apply quorum logic and fencing, it is important to make
mnesia aware of them, so that it can be made to respect vital preconditions.
One thing I'd especially like feedback on is if the 'majority' flag is a
reasonable (and sufficiently powerful) extension.
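For concreteness, here is a rough sketch of how the flag might be used for the resource pool case, assuming the option is spelled {majority, true} as in the commit message quoted below. The module name pool_sketch, the pool record and its fields, and the function names are all invented for the example.

```erlang
-module(pool_sketch).
-export([create/1, checkout/1, peek/1]).

%% Hypothetical record: one row per pool, holding its free resources.
-record(pool, {id, free}).

%% Create a replicated table with the majority flag set.
create(Nodes) ->
    mnesia:create_table(pool,
                        [{ram_copies, Nodes},
                         {majority, true},
                         {attributes, record_info(fields, pool)}]).

%% Check out one resource. mnesia:wread/1 takes a write lock, so with
%% the proposed change the transaction is expected to abort when the
%% node is in a minority.
checkout(Id) ->
    mnesia:transaction(
      fun() ->
              case mnesia:wread({pool, Id}) of
                  [#pool{free = [R | Rest]} = P] ->
                      ok = mnesia:write(P#pool{free = Rest}),
                      {ok, R};
                  _ ->
                      empty
              end
      end).

%% A plain read only takes a read lock, so it is expected to succeed
%% even in a minority.
peek(Id) ->
    mnesia:transaction(fun() -> mnesia:read({pool, Id}) end).
```

A caller would match on {atomic, {ok, Resource}} for a successful checkout and treat {aborted, Reason} as "deny service for now".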
BR,
Ulf W
>
> On Thu, Dec 9, 2010 at 1:55 PM, Ulf Wiger <ulf.wiger@REDACTED> wrote:
>
> I added majority checking in the mnesia_locker as well.
> The main reason for doing so (apart from aborting earlier),
> was to enable majority checking on reads.
>
> The way it works now is that majority checking is done on
> reads that use a write lock (e.g. mnesia:wread/1).
> A normal read, with a read lock, will succeed even in a
> minority. This is probably a pretty good thing.
>
> https://github.com/uwiger/otp/commit/650f8e30d205bc1130f37c819f920f901358b937
>
> Comments still most welcome. Monologues are fun too, but
> I can follow Dan North's advice and get a rubber duck for that.
>
> If you are unsure whether this is at all needed, please chime in.
> It is most definitely not a stupid question.
>
> BR,
> Ulf W
>
> On 9 Dec 2010, at 15:26, Ulf Wiger wrote:
>
> >
> > git fetch git://github.com/uwiger/otp mnesia-majority
> >
> > https://github.com/uwiger/otp/commit/d97ae7d4329d9342e576f3cdd893de6865449e42
> >
> > This is a first stab at a function that I believe could be useful in
> > high-availability applications using mnesia.
> >
> > At this stage, I'd love to have some comments, and suggestions,
> > if someone thinks of a better way to do it.
> >
> > From the commit message:
> >
> > "Add {majority, boolean()} per-table option.
> >
> > With {majority, true} set for a table, write transactions will
> > abort if they cannot commit to a majority of the nodes that
> > have a copy of the table. Currently, the implementation hooks
> > into the prepare_commit, and forces an asymmetric transaction
> > if the commit set affects any table with the majority flag set.
> > In the commit itself, the transaction will abort if it cannot
> > satisfy the majority requirement for all tables involved in the
> > transaction.
> >
> > A future optimization might be to abort already when a write
> > lock is attempted on such a table (/-object) and the lock cannot
> > be set on enough nodes.
> >
> > This functionality introduces the possibility to automatically
> > "fence off" a table in the presence of failures.
> >
> > This is a first implementation. Only basic tests have been
> > performed."
> >
> > One particular use of this functionality would be to have a "global
> > resource pool" in one table with {majority, true}, and periodically
> > check out resources into a local buffer. If there is a failure condition,
> > you can use the local buffer, but not check out more resources, unless
> > you happen to still be in contact with more than half of the replicas.
> >
> > This should allow for a well-defined merge after a network split.
> >
> > BR,
> > Ulf W
> >
> > Ulf Wiger, CTO, Erlang Solutions, Ltd.
> > http://erlang-solutions.com
Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com