[erlang-questions] Mnesia transaction restarts

Ulf Wiger ulf@REDACTED
Mon Oct 27 10:43:08 CET 2014


On 27 Oct 2014, at 01:17, Bernard Duggan <bduggan@REDACTED> wrote:

> But since it *is* happening there must be some property of our system that I don't fully understand. Time to go digging again :)

If the writes are replicated, you might want to look for overload conditions on remote mnesia_tm processes (e.g. due to mnesia_overload (message_queue_len)).
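One rough way to look for this is to poll the message queue length of mnesia_tm on each node. The following is only a sketch: the module name, function names and the threshold are made up for illustration, and it assumes that rpc to the nodes in question works.

```erlang
-module(tm_queue_check).
-export([queue_len/1, long_queues/2]).

%% Message queue length of the mnesia_tm process on Node,
%% or 'undefined' if it cannot be determined (mnesia not running,
%% node unreachable, ...).
queue_len(Node) ->
    case rpc:call(Node, erlang, whereis, [mnesia_tm]) of
        Pid when is_pid(Pid) ->
            %% process_info/2 only accepts local pids, so run it on Node.
            case rpc:call(Node, erlang, process_info,
                          [Pid, message_queue_len]) of
                {message_queue_len, Len} -> Len;
                _ -> undefined
            end;
        _ ->
            undefined
    end.

%% Nodes (with queue length) whose mnesia_tm queue exceeds Threshold.
%% Threshold is an arbitrary, hypothetical limit chosen by the caller.
long_queues(Nodes, Threshold) ->
    [{N, Len} || N <- Nodes,
                 Len <- [queue_len(N)],
                 is_integer(Len),
                 Len > Threshold].
```

For example, tm_queue_check:long_queues(mnesia:system_info(running_db_nodes), 1000) would list the nodes whose mnesia_tm queue is above 1000 at that moment. A process that has called mnesia:subscribe(system) should also receive {mnesia_system_event, {mnesia_overload, Details}} messages when Mnesia itself detects overload.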

Dan will have to correct me if I’m wrong here...

If a younger (WaitForTid < OurTid) transaction gets held up on commit, older transactions can get stuck in a restart condition. Since the mnesia_tm process relies on (unoptimizable) selective receive, it’s vulnerable to long message queues (which can happen especially if you have lots of replicated dirty writes).

Note that when comparing tids from different nodes, the words ‘younger’ and ‘older’ don’t necessarily correspond to a true global ordering. The comparison is essentially {Counter, Pid}, and the Counter is node-local, incremented for each transaction start/restart. The important thing for the deadlock prevention is that the comparison rules are globally consistent (that is, produce the same result regardless of *where* the comparison takes place).
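To illustrate the point about ordering, here is a small sketch that compares {Counter, Owner} tuples using the standard Erlang term order. It is not Mnesia's actual code: the module and function names are hypothetical, and atoms stand in for the pids so that the example is reproducible.

```erlang
-module(tid_order_sketch).
-export([older/2, demo/0]).

%% Hypothetical stand-in for a transaction id: {Counter, Owner}.
%% In Mnesia the second element would be the pid of the transaction
%% process; an atom is used here only to keep the example deterministic.
older(TidA, TidB) ->
    %% Standard term order on tuples of equal size: Counter is compared
    %% first, Owner only breaks ties.
    TidA < TidB.

demo() ->
    %% Suppose A was started later in wall-clock time than B, but on a
    %% node whose local counter happens to be lower.
    A = {7, proc_on_node_a},
    B = {9, proc_on_node_b},
    %% The result depends only on the two terms, so every node that
    %% evaluates it reaches the same verdict.
    {older(A, B), older(B, A)}.
```

Here tid_order_sketch:demo() returns {true, false}: A counts as ‘older’ even though it started later, which is harmless as long as all nodes agree on the answer.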

The mnesia_locker process has no blocking conditions that I know of. A lock request for a replicated read or write will loop through the ‘where_to_read’/‘where_to_write’ list and ask each locker in turn. The list is sorted, but the local node (if present) is always first. Locks are released when instructed by the mnesia_tm process (on commit/abort).

If you have side-effects inside transactions (message send/receive), basically all bets are off, and really weird things can happen, not least transaction processes waiting endlessly for messages that were already sent+received, but then discarded due to transaction restart.
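A common way to stay out of that trouble is to keep the transaction fun free of side-effects and to do the message passing only after mnesia:transaction/1 has returned. A minimal sketch, assuming a table whose records have the shape {Tab, Key, Value}; the module name, function name and message format are made up for illustration:

```erlang
-module(tx_side_effects).
-export([update_and_notify/4]).

%% Writes {Tab, Key, Value} in a transaction and notifies NotifyPid
%% only after the transaction has committed. The fun itself sends and
%% receives nothing, so it is safe for Mnesia to restart it.
update_and_notify(Tab, Key, Value, NotifyPid) ->
    Fun = fun() ->
                  %% Read with a write lock, since we are about to write.
                  Old = mnesia:read(Tab, Key, write),
                  ok = mnesia:write({Tab, Key, Value}),
                  Old
          end,
    case mnesia:transaction(Fun) of
        {atomic, Old} ->
            %% Side-effect happens exactly once, outside the transaction.
            NotifyPid ! {updated, Tab, Key, Old, Value},
            ok;
        {aborted, Reason} ->
            {error, Reason}
    end.
```

With a table created as mnesia:create_table(kv, [{attributes, [key, value]}]), a call such as tx_side_effects:update_and_notify(kv, a, 1, self()) sends a single {updated, kv, a, Old, 1} message, no matter how many times the fun was restarted internally.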

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com


