[erlang-questions] mnesia dirty writes & race conditions
Fred Hebert
mononcqc@REDACTED
Mon May 4 17:30:42 CEST 2015
On 05/04, Jesper Louis Andersen wrote:
>But chances are you don't need linearizability for the operation in the
>first place. And then, you can avoid having to coordinate as you may be
>able to put yourself in AP instead. AP is typically much faster due to the
>lack of coordination, but do see the work of e.g., Neha Narula (et.al.) for
>counterexamples to this.
This is an interesting point made in "Highly Available Transactions:
Virtues and Limitations" by Peter Bailis et al.
(http://www.bailis.org/papers/hat-vldb2014.pdf) (see section 3 on page
3):
> Accordingly, to increase concurrency, database systems offer a range
> of ACID properties weaker than serializability: the host of so-called
> weak isolation models describe varying restrictions on the space of
> schedules that are allowable by the system. None of these weak
> isolation models guarantees serializability, but, as we see below,
> their benefits are often considered to outweigh costs of possible
> consistency anomalies that might arise from their use.
Specifically, table 2 (http://i.imgur.com/7Lw9lBd.png) shows that
databases such as MySQL, Postgres, and Oracle may all support
serializability, but by default offer much weaker guarantees
(repeatable read or read committed), both of which are achievable as
highly available transactions.
Repeatable Read (RR) is defined as follows, which I believe is pretty
much the guarantee that MVCC (multi-version concurrency control) gives you:
> the ANSI standardized implementation-agnostic definition
> is achievable and directly captures the spirit of the term: if a
> transaction reads the same data more than once, it sees the same value
> each time (preventing “Fuzzy Read”).
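
To make the "Fuzzy Read" anomaly concrete, here is a minimal Python
sketch. It is a toy model, not how any real database implements RR: the
class names (Store, LatestCommittedTxn, RepeatableReadTxn) are made up
for illustration, and the repeatable variant simply remembers the first
value it saw for each key.

```python
class Store:
    """Toy store holding only committed values (hypothetical model)."""

    def __init__(self, data):
        self.data = dict(data)


class LatestCommittedTxn:
    """Always returns the latest committed value: fuzzy reads are possible."""

    def __init__(self, store):
        self.store = store

    def read(self, key):
        return self.store.data[key]


class RepeatableReadTxn:
    """Remembers the first value read per key and keeps returning it."""

    def __init__(self, store):
        self.store = store
        self.seen = {}

    def read(self, key):
        if key not in self.seen:
            self.seen[key] = self.store.data[key]
        return self.seen[key]


def read_twice(txn_cls):
    """Read "x" twice while another transaction commits in between."""
    store = Store({"x": 1})
    txn = txn_cls(store)
    first = txn.read("x")
    store.data["x"] = 2  # stands in for a concurrent transaction's commit
    second = txn.read("x")
    return first, second
```

With this model, read_twice(LatestCommittedTxn) sees two different
values for "x", while read_twice(RepeatableReadTxn) sees the same value
both times.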
Read Committed (RC) is defined as:
> Under Read Committed, transactions should not access uncommitted or
> intermediate versions of data items. This prohibits both “Dirty
> Writes” [...] and also “Dirty Reads”
And that's about it. This tells you multiple transactions could happen
at the same time and result in a non-linearizable history. Two RC
transactions could both operate at once, and through some interleavings
of read and write locks across transactions, give you results that would
not make sense without the specific concurrent interleaving they have
seen. They would not be linearizable or serializable.
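
As a rough illustration of such an interleaving, the Python sketch below
runs two increments of the same counter. It assumes a toy store of my
own invention (ReadCommittedStore and Txn are not from the paper or from
any database): writes are buffered until commit, so nobody ever reads or
overwrites uncommitted data, and yet one update is lost.

```python
class ReadCommittedStore:
    """Toy single-node store: reads only ever see committed values."""

    def __init__(self):
        self.committed = {}

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.writes = {}  # buffered until commit: no dirty reads or writes

    def read(self, key):
        # Return our own pending write if any, else the committed value.
        if key in self.writes:
            return self.writes[key]
        return self.store.committed.get(key, 0)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.store.committed.update(self.writes)


def interleaved_increments():
    """Two transactions both read before either one commits."""
    store = ReadCommittedStore()
    store.committed["counter"] = 0
    t1, t2 = store.begin(), store.begin()
    a = t1.read("counter")  # t1 sees 0
    b = t2.read("counter")  # t2 also sees 0
    t1.write("counter", a + 1)
    t2.write("counter", b + 1)
    t1.commit()
    t2.commit()  # overwrites t1's increment
    return store.committed["counter"]


def serial_increments():
    """The same two increments, run one after the other."""
    store = ReadCommittedStore()
    store.committed["counter"] = 0
    for _ in range(2):
        t = store.begin()
        t.write("counter", t.read("counter") + 1)
        t.commit()
    return store.committed["counter"]
```

Either serial order of the two increments leaves the counter at 2, but
the interleaved run leaves it at 1, so that history is not serializable
even though every read was of committed data.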
An important note here is that a highly available transaction (HAT) is
defined as a transaction that eventually commits if it can contact at
least one replica for each of the data items it attempts to touch. This
is slightly different from the usual "can I write to this row on any
given node", but it does mean that even with multiple levels of failure
(even a majority of replicas being unreachable) some transactions could
still work under RR or RC.
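
The availability condition can be sketched as a simple predicate. This
is only my reading of the definition above, in Python, with a
hypothetical replica placement and a majority-quorum check added for
contrast; the function and node names are assumptions, not anything
taken from the paper.

```python
def hat_can_commit(items, replicas, reachable):
    """HAT condition: at least one reachable replica per item touched."""
    return all(
        any(node in reachable for node in replicas[item])
        for item in items
    )


def majority_can_commit(items, replicas, reachable):
    """For contrast: a strict majority of each item's replicas is needed."""
    return all(
        sum(node in reachable for node in replicas[item])
        > len(replicas[item]) // 2
        for item in items
    )


# Hypothetical placement: two items, three replicas each.
REPLICAS = {
    "x": ["n1", "n2", "n3"],
    "y": ["n2", "n3", "n4"],
}

# Only n2 is reachable: a majority of replicas is gone for both items.
ONLY_N2 = {"n2"}
hat_ok = hat_can_commit(["x", "y"], REPLICAS, ONLY_N2)
majority_ok = majority_can_commit(["x", "y"], REPLICAS, ONLY_N2)
```

With only n2 reachable, the HAT condition still holds for a transaction
touching both x and y, while the majority check fails. If only n1 were
reachable, the same transaction would fail the HAT condition too, since
no replica of y could be contacted.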
> As shown in Table 2, only three out of 18 databases provided
> serializability by default, and eight did not provide serializability
> as an option at all. [...] Given that these weak transactional models
> are frequently used, our inability to provide serializability in
> arbitrary HATs appears non-fatal for practical applications.
It is not explained outright, but I'm guessing the reason why many of
these transactions are *not* made highly available by common RDBMSs is
that the weaker isolation levels are seen more as optimizations for
speed, that high availability wouldn't fit their model very well in the
large, or that they want the ability to add serializability as a safety
guarantee without tearing down your whole infrastructure.
Many DBs' default transaction mechanisms have semantics that could lend
themselves to higher availability, but their implementations just don't
appear to support it.