[erlang-questions] auto-syncing mnesia after a network split

Tue Dec 2 21:45:18 CET 2008

On Tuesday, December 02, 2008, Rick Pettit wrote:

> The ideas are interesting, though I pray my bank never adopts such
> software.
> 
> I don't think bank software can continue to allow transactions like
> deposit and withdrawal during a network partition--I just don't see how
> that can be made to work while maintaining a consistent view of the
> various accounts across disconnected nodes (e.g. how can a bank ATM allow
> me to withdraw funds if it cannot reach its peer node(s) at my bank to
> determine the availability of such funds?).

I was thinking the opposite.  I would presume banks would have to continue
to function through network problems, so these must be solved problems,
backed by good solid theory.

(Now, I doubt banks just record account balances, since they also need
transaction details, so the master balance of an account is not the
'account_balances' table but the sum of all transactions pertaining to that
account.  (In practice, I imagine they reconcile the transactions into a
balance summary periodically.)  Therefore, all transactions are inserts, and
balances are updated asynchronously.)

> With issues
> like telecom "glare" I couldn't be 100% accurate all the time anyway.

What is telecom "glare"?

Cheers,

David

> -----Original Message-----
> From: erlang-questions-bounces@REDACTED [mailto:erlang-questions-
> bounces@REDACTED] On Behalf Of 
> Sent: 14:30
> To: Joel Reymont
> Cc: Erlang Questions
> Subject: Re: [erlang-questions] auto-syncing mnesia after a network split
> 
> On Tue, December 2, 2008 2:03 pm, Joel Reymont wrote:
> > Alex,
> >
> > On Dec 2, 2008, at 5:18 PM, Alex wrote:
> >
> >> what happens when you have multiple updates to both sides of the
> >> split?  if you just pick the highest vnum, you lose all the
> >> transactions from the other side of the split when it rejoins.
> >
> >
> > You can pick up new (inserted) records by doing a diff of primary keys
> > for each table.
> >
> > You cannot do anything about deleted records, I think, so you'll just
> > have to delete those again somehow. You could assume that the table
> > replica with the latest timestamp is the right one and just delete the
> > extra records from the other table.
> >
> > Imagine a bank account that's distributed across the split nodes,
> > where a customer deposits money a 2 times and the deposits are split
> > across the nodes. You'll pick up the latest deposit on one node and
> > miss the other deposit.
> >
> > I think you can overcome this programmatically, with a timestamp _and_
> > a version number. You can have a version table per node with three
> > columns: table name, vnum and timestamp. The rest of the tables would
> > have just the vnum in their records.
> >
> > When updating table T, you will first update the version table by
> > storing the current time and bumping the vnum for the key T. You will
> > then store the vnum in the record of table T that you are updating.
> >
> > You will be able to find the split time by looking at the version
> > tables and figuring out when the vnums started to diverge. You can
> > then invoke a merge function that figures out, for example, how to
> > merge a bunch of bank deposit transactions into a single balance.
> >
> > You will know the vnum at split time and will only need to consider
> > the transactions that happened after. Shouldn't be a lot of
> > transactions for a short split time.
> >
> > What do you think?
> 
> The ideas are interesting, though I pray my bank never adopts such
> software.
> 
> I don't think bank software can continue to allow transactions like
> deposit and withdrawal during a network partition--I just don't see how
> that can be made to work while maintaining a consistent view of the
> various accounts across disconnected nodes (e.g. how can a bank ATM allow
> me to withdraw funds if it cannot reach its peer node(s) at my bank to
> determine the availability of such funds?).
> 
> Most systems I work with implement a recovery procedure similar to what
> Ulf has posted in the past on this list. This works in my _special case_
> because I am tracking real-time telephony stats used to route calls (vs.
> manage bank account information).
> 
> Because the systems I am referring to require high-availability over 100%
> data consistency, this is perfectly ok (and works quite well). With issues
> like telecom "glare" I couldn't be 100% accurate all the time anyway.
> 
> So, to recover from a partition it is enough to pick any functioning node
> as the new "master" and have others restart and/or force load tables from
> it. The entire time clients keep pushing new stats into the system, so
> everything "converges on reality" in the end following a recovery attempt
> anyway.
> 
> This system works extremely well, but again I wouldn't dream of using it
> to implement ATM software for managing bank accounts.
> 
> -Rick
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions