mnesia and binary/large files

Thu Oct 14 10:17:19 CEST 1999

On Wed, 13 Oct 1999, Klacke wrote:

klacke> An obvious mnesia enhancement would be the following:
klacke> Two nodes A + B, one table T.
klacke> A dies, X and Y are inserted in T (on B only), A recovers.
klacke> Now instead of copying the entire table T, we only need to
klacke> redo the insertions of X and Y at A.
klacke> 
klacke> Once I started to implement this but got severe headaches. It's
klacke> a lot harder than it sounds.

Perhaps there are some volunteers looking for a headache? ;-)

There are several approaches that could be explored in order to speed
up the table loading mechanism:

 -  one is to introduce the notion of version identifiers on records in
    ets and dets. This would enable the table load mechanism to rely on an
    algorithm that looks for records whose version identifiers differ
    between the trusted nodes and the node starting up. To go a step
    further, we could use a common version identifier generator for all
    records in a table. This would enable us to use the version identifier
    as a kind of timestamp for the entire table. If both nodes agree that
    the table timestamp is the same, there is no need to iterate over the
    table and compare the version identifier of each record.

 -  another is to introduce the notion of durable checkpoints in Mnesia,
    that is, checkpoints that survive even if all nodes go down
    simultaneously. A durable checkpoint should be activated while all
    replicas are active, and the checkpoint should then be moved forward
    now and then, in the same manner as checkpoints used for the
    purpose of incremental backups. At startup the table load algorithm
    could compare the checkpoint contents of the nodes and only transfer
    the lost updates. Using checkpoints is not free, but in systems
    already using incremental backups it would be almost free.

 -  ...
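
To make the version identifier idea concrete, here is a minimal sketch
in Python rather than Erlang. It is not how Mnesia works today. The
Replica class, the catch_up function and the shared counter are all
hypothetical names invented for illustration. The sketch assumes that
the stale node made no writes of its own while it was away, and it
ignores deletes, which would need tombstones or something similar.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one node's copy of table T."""
    records: dict = field(default_factory=dict)  # key -> (version, value)
    table_version: int = 0                       # highest version stored so far

    def write(self, key, value, version):
        self.records[key] = (version, value)
        self.table_version = max(self.table_version, version)


def catch_up(stale, trusted):
    """Copy to `stale` only the records newer than its table version."""
    if stale.table_version == trusted.table_version:
        return []  # table timestamps agree, so no per-record scan is needed
    since = stale.table_version
    copied = []
    for key, (version, value) in trusted.records.items():
        if version > since:
            stale.write(key, value, version)
            copied.append(key)
    return sorted(copied)


# One version generator shared by all records of the table (an assumption).
next_version = itertools.count(1)

a, b = Replica(), Replica()
for key in ("P", "Q"):                 # both replicas are up
    v = next(next_version)
    a.write(key, key.lower(), v)
    b.write(key, key.lower(), v)

for key in ("X", "Y"):                 # A is down, only B sees these
    b.write(key, key.lower(), next(next_version))

copied = catch_up(a, b)                # A recovers
print(copied)
```

Only X and Y cross the wire here. A second call to catch_up returns
at once, because the two table versions are then equal.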

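The durable checkpoint idea can be sketched in the same spirit. Again
this is Python, not Erlang, and none of it is an existing Mnesia API.
The Node class and transfer_lost_updates are hypothetical. The set of
keys touched since the checkpoint is kept in memory here, whereas a
durable checkpoint would have to keep it on disk. The sketch assumes
the checkpoint is only moved forward while all replicas are active
and in sync.

```python
class Node:
    """Hypothetical replica that tracks keys changed since the checkpoint."""

    def __init__(self):
        self.table = {}
        self.dirty = set()  # keys touched since the checkpoint last moved

    def write(self, key, value):
        self.table[key] = value
        self.dirty.add(key)

    def delete(self, key):
        self.table.pop(key, None)
        self.dirty.add(key)

    def advance_checkpoint(self):
        # Assumed to be called only while every replica is up and in sync.
        self.dirty.clear()


def transfer_lost_updates(stale, trusted):
    """Bring `stale` up to date using only the keys `trusted` has touched."""
    for key in trusted.dirty:
        if key in trusted.table:
            stale.table[key] = trusted.table[key]
        else:
            stale.table.pop(key, None)  # the key was deleted on `trusted`
    return sorted(trusted.dirty)


a, b = Node(), Node()
for key in ("P", "Q", "R"):   # both replicas are up
    a.write(key, 1)
    b.write(key, 1)
a.advance_checkpoint()
b.advance_checkpoint()

b.write("X", 2)               # A is down from here on
b.write("Y", 2)
b.delete("P")

moved = transfer_lost_updates(a, b)   # A recovers
print(moved)
```

Unlike the version identifier sketch, this one also carries deletes,
since a deleted key is still remembered as touched.
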
The real headache comes when you start to think about backwards
compatibility and smooth on-line upgrade...

klacke> PS. Now don't go and get all worked up about this, do some
klacke> measurements instead and see how long it takes to repair,
klacke> as well as to load over the network, tables of different sizes.
klacke> Usually it's ok.

If it is not ok, you should try the fragmented tables feature of
Mnesia. Fragmenting a table makes each dets file smaller and therefore
shortens its repair time. Fragmented tables may also be used to
distribute large amounts of data over several nodes, and thereby
reduce the need for copying huge tables between the nodes.
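
As a rough illustration of why fragmentation helps, the following
Python sketch spreads the keys of one table over a number of
fragments and the fragments over a pool of nodes. It does not
reproduce Mnesia's own hashing scheme. The names fragment_of and
place, and the node names, are hypothetical.

```python
import zlib


def fragment_of(key, n_fragments):
    """Map a key to a fragment number with a stable hash (CRC32 here)."""
    return zlib.crc32(repr(key).encode()) % n_fragments


def place(keys, n_fragments, nodes):
    """Assign each fragment to a node round-robin and group keys by node."""
    layout = {node: [] for node in nodes}
    for key in keys:
        frag = fragment_of(key, n_fragments)
        layout[nodes[frag % len(nodes)]].append(key)
    return layout


keys = list(range(1000))
nodes = ["a@host1", "b@host2"]
layout = place(keys, 8, nodes)
for node, held in layout.items():
    print(node, len(held))
```

Each node then holds only its own share of the records, so a repair
or a reload touches a fraction of the table instead of all of it.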

/Håkan