Emulator stopping during mnesia writes

Wed May 10 17:46:56 CEST 2000

>>>>> "sh" == Sean Hinde <Sean.Hinde@REDACTED> writes:

sh> Working under my commercial support agreement with the guys in
sh> Sweden we have established that as the emulator is single threaded
sh> with regard to disk operations, a disk operation which is slow
sh> will stop the emulator.

The output of "sar" or "iostat" would help confirm this.  If, say, a
1-minute snapshot of activity from "sar -d 5 12" shows that the
disk/volume that the Mnesia log is written to is busier than 70%
("%busy"), the average I/O queue length is above a handful
("avqueue"), and/or the average seek time is above 20 ms ("avseek"),
you've got a disk/volume that's too busy.  (I'm pulling those numbers
out of my memory ... I highly recommend Adrian Cockcroft's book on
Solaris performance tuning (2nd edition), if you don't already
have it.)
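
As a rough sketch of that rule of thumb (in Python; the function name,
the argument names, and the reading of "a handful" as 5 are my own
assumptions for illustration, and the limits are the from-memory
figures above, not authoritative values), a check over one averaged
sample could look something like this:

```python
# Limits taken from the rule of thumb above.  QUEUE_LIMIT = 5 is an
# assumed reading of "a handful"; adjust to taste.
BUSY_LIMIT_PCT = 70.0
QUEUE_LIMIT = 5.0
SEEK_LIMIT_MS = 20.0


def disk_warnings(busy_pct, avg_queue, avg_seek_ms):
    """Return a list of reasons why a disk/volume looks too busy.

    An empty list means none of the three limits was exceeded.
    """
    reasons = []
    if busy_pct > BUSY_LIMIT_PCT:
        reasons.append("busy %.1f%% exceeds %.1f%%" % (busy_pct, BUSY_LIMIT_PCT))
    if avg_queue > QUEUE_LIMIT:
        reasons.append("queue %.1f exceeds %.1f" % (avg_queue, QUEUE_LIMIT))
    if avg_seek_ms > SEEK_LIMIT_MS:
        reasons.append("seek %.1f ms exceeds %.1f ms" % (avg_seek_ms, SEEK_LIMIT_MS))
    return reasons
```

The three numbers have to be read off the "sar" averages by hand; the
sketch deliberately does not try to parse "sar" output, since the
column layout differs between systems.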

Over-busy disk spindles are the bane of Usenet News, email, and
database servers universally.  In my INN hacking days, I saw (using
"truss") hopelessly overloaded INN servers take over a second
just to perform a single open("/var/spool/news/alt/whatever/989832",
O_CREAT|O_EXCL|O_WRONLY) system call.  Problems with too many files in
a single directory on an FFS-based file system (as Sun's UFS is) were
only exacerbated by having the disks in the file system (using the
Solstice volume manager) at over 95% busy on average.
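
For anyone without "truss" handy, the same kind of measurement can be
approximated from user space.  This is only a sketch in Python: the
helper name is hypothetical, and it times the call as seen by the
process rather than tracing the system call itself:

```python
import os
import tempfile
import time


def time_exclusive_create(path):
    """Time one open(path, O_CREAT|O_EXCL|O_WRONLY) call, in seconds."""
    start = time.perf_counter()
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    elapsed = time.perf_counter() - start
    os.close(fd)
    return elapsed


if __name__ == "__main__":
    # A scratch directory stands in for the spool directory here.
    with tempfile.TemporaryDirectory() as spool:
        path = os.path.join(spool, "989832")
        print("open() took %.6f s" % time_exclusive_create(path))
```

On an idle disk this should come back in well under a millisecond;
run against a directory on the suspect volume, times approaching a
second would point at the same problem.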

If indeed the disk or disks storing the file system with the Mnesia
logs are too busy, the solutions are few: software-based striping
across multiple disk drives, hardware-based striping across multiple
disk drives, solid state disk drives, or algorithmic changes to reduce
I/O workload.  The last may be more difficult to do than the first
three.  :-)

-Scott