Mnesia, disk logging, and synchronous disk logging

Scott Lystig Fritchie fritchie@REDACTED
Wed Jan 25 22:54:15 CET 2006


>>>>> "hm" == Hakan Mattsson <hakan@REDACTED> writes:

hm> In Mnesia the coordinator does always wait synchronously for 2PC
hm> (and 3PC) votes from all participants, regardless of the
hm> transaction being "synchronous" or not.

That makes sense ... the coordinator can do Very Bad Things if it
doesn't gather all votes.

hm> I agree that such a feature can be useful.  At least if there
hm> are no write caches enabled in the disk hardware. Otherwise you
hm> could lose some data anyway in case of a power failure.

Even if your disk subsystem(*) has an NVRAM write-back cache, there is
a risk of data loss unless you explicitly use the fsync(2) system call.

With Mnesia using the disk_log module, which in turn usually uses
write(2) only, you cannot be certain that the OS has copied
write(2)'s data to the disk device.  In most cases, the kernel can
(and will) wait for many seconds before flushing that data to the disk
device.
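
To make the write(2) vs. fsync(2) distinction concrete, here is a
minimal C sketch of a "durable append".  The function name
append_durably and the file layout are my own invention for
illustration; this is not how disk_log or Mnesia is implemented, and
it ignores details such as EINTR retries and syncing the parent
directory after creating a new file.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): append one record to
 * the file at `path` and report success only after fsync(2) succeeds.
 * Returns 0 on success, -1 on any error. */
int append_durably(const char *path, const char *record)
{
    assert(path != NULL && record != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(record);
    size_t done = 0;
    while (done < len) {
        /* write(2) may accept only part of the buffer, so loop.  After
         * this call the data is in the kernel's cache; nothing is
         * guaranteed to be on the disk device yet. */
        ssize_t n = write(fd, record + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* fsync(2) asks the kernel to flush the file's dirty data to the
     * device and waits until the device reports completion. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync(2) call, a successful return would only mean that
the kernel has accepted the data.  On the Erlang side, file:sync/1 and
disk_log:sync/1 are the calls that request the same kind of flush.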

SLF> But I can't find a Mnesia transaction knob/button that I can
SLF> twist/press to request that level of safety.  Is there such a
SLF> thing?

hm> No currently there are no such thing in Mnesia.

That's what I'd thought.

Assuming that I wanted to try to add that to Mnesia ... I think I'd
need to add extra info to the commit record that's sent to each
participant.  Something that said: this log record is important enough
to use fsync after writing.  Hm.

I suppose a poor man's safety net would be to run a shell script like
this on each Mnesia node with disc_copies or disc_only_copies:

    # Ask the kernel to flush all dirty buffers about once per second.
    while true; do
        sync
        sleep 1
    done

Easy to do, doesn't require code changes, and would limit worst-case
data loss to roughly 1-2 seconds.  (Assuming that disk_log and the
file Port that disk_log uses do not do any buffering.)  On the other
hand, performance may suck.

Too bad disk drives are so darn slow.

-Scott

(*) Even if the disk logical device is an NVRAM/solid-state disk drive.


