mnesia replication (Are there checksums?)

Thu Sep 1 08:36:53 CEST 2005

I am amazed we never came across this bug (ok, feature :-) ) before. I
would have expected an alarm to be generated as soon as the databases
became inconsistent. I guess a way to work around the problem is to hash
the dirty writes across the nodes based on the key.
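
One way to read "hash the dirty writes across the nodes based on the
key" is to send every dirty write for a given key through a single owner
node. A minimal sketch, assuming a fixed node list that every caller
passes in the same order; the module name dirty_route and the functions
owner/2 and write/2 are hypothetical and not part of mnesia:

```erlang
-module(dirty_route).
-export([owner/2, write/2]).

%% Map a key to one node from a non-empty list. erlang:phash2/2 returns
%% a value in 0..Range-1 and lists:nth/2 is 1-based, hence the +1.
owner(Key, Nodes) when is_list(Nodes), Nodes =/= [] ->
    lists:nth(erlang:phash2(Key, length(Nodes)) + 1, Nodes).

%% Run the dirty write on the owner node of the record's key.
%% mnesia uses the first record attribute as the key, which is the
%% second element of the record tuple.
write(Record, Nodes) ->
    Key = element(2, Record),
    rpc:call(owner(Key, Nodes), mnesia, dirty_write, [Record]).
```

This only sketches the routing idea. It does not by itself cover the
case where the owner node is unreachable, in which case rpc:call/4
returns {badrpc, Reason} and the caller has to decide what to do.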

How hard would it be to add a checksum to each table? It should not 
generate any major overheads... The subject had been discussed, but 
probably before you took over the reins.
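
A per-table checksum could look roughly like the following. This is only
a sketch of the idea, not something mnesia provides. The module
table_checksum and its functions are hypothetical. It reads the local
ets copy of the table, so it assumes a ram_copies or disc_copies replica
on every node checked and that the module is loaded there. It also scans
the whole table on each call instead of maintaining the checksum
incrementally:

```erlang
-module(table_checksum).
-export([of_records/1, local/1, compare/2]).

%% Fold step: add the hash of one record to the accumulator.
add(Rec, Acc) ->
    Acc + erlang:phash2(Rec).

%% Order-independent checksum of a list of records: the sum of the
%% per-record hashes does not depend on traversal order.
of_records(Records) ->
    lists:foldl(fun add/2, 0, Records).

%% Checksum of the local replica of Tab, read directly from ets.
local(Tab) ->
    ets:foldl(fun add/2, 0, Tab).

%% Ask each node for its local checksum and return {Node, Checksum}
%% pairs. Differing values suggest the replicas differ (or that writes
%% were still in flight while the tables were scanned).
compare(Tab, Nodes) ->
    [{N, rpc:call(N, ?MODULE, local, [Tab])} || N <- Nodes].
```

Equal checksums do not prove that the replicas are identical, since
hashes can collide; unequal ones are a cheap hint that an alarm may be
warranted.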

all my best,
Francesco
--
http://www.erlang-consulting.com

Hakan Mattsson wrote:
> No, there are no table checksums. Mnesia relies on
> other recovery mechanisms.
> 
> The behaviour that you call a serious bug is
> deliberate. Normally all database accesses should be
> performed within transactions. If the performance is
> good enough you should not use dirty access at
> all. The only reason for using dirty access is to
> gain better performance. But that does not come for
> free, as you need to deal with almost all
> concurrency issues yourself. One of these issues is
> serialization of updates. If this is unexpected, the
> documentation should be blamed (or possibly the
> reader of the documentation ;-).
> 
> /Håkan
> 
> On Wed, 31 Aug 2005, Francesco Cesarini (Erlang Training & Consulting) wrote:
> 
> FC> I would class this as a serious bug! I have a vague recollection that
> FC> there was a checksum being computed for every table, but have looked
> FC> everywhere and cannot find any reference for it. Maybe it was just a
> FC> discussion I had with someone 10 years ago, or something... Did it ever
> FC> happen?
> FC> 
> FC> Francesco
> FC> --
> FC> http://www.erlang-consulting.com
> FC> 
> FC> 
> FC> Dan Gudmundsson wrote:
> FC> > chandru writes:
> FC> >  > Hi,
> FC> >  >
> FC> >  > On 30/08/05, Serge Aleynikov <serge@REDACTED> wrote:
> FC> >  > > Hello,
> FC> >  > >
> FC> >  > > Could someone comment on the effect of short network outages
> FC> >  > > ( < 10-15 s) on mnesia replication and how to prevent the
> FC> >  > > inconsistency demonstrated in the example below?   I intentionally
> FC> >  > > did not alter the net_ticktime kernel parameter so that it would
> FC> >  > > be greater than the duration of the brief network outage.
> FC> >  >
> FC> >  > You can't really prevent this inconsistency if you are using dirty
> FC> >  > operations. Have you tried the same test using transactions instead
> FC> >  > of dirty operations?
> FC> > 
> FC> > Since dirty operations don't grab a lock, you should be able to see
> FC> > the same problem with a working network ..
> FC> > 
> FC> > Dirty is dirty, be aware of that.
> FC> > 
> FC> > /Dan