[erlang-bugs] mnesia can corrupt tables if the VM runs out of file descriptors
Mikael Pettersson
mikpelinux@REDACTED
Sat May 30 18:33:01 CEST 2015
An embedded and charset-unspecified text was scrubbed...
Name: mnesia_corrupts_data.erl
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20150530/1654b5ae/attachment.ksh>
If the Erlang VM is close to its file descriptor limit and mnesia tries
to open a disc_copies table's .DCL file, the open fails with {error, emfile}.
mnesia_log:open_log/6 incorrectly interprets that error as a corrupt file
and then DELETES the perfectly valid .DCL file.
This is reproducible with (at least) OTP 18.0-rc2 and R15B03.
(This is not a hypothetical problem, it hit us and corrupted three tables,
though we managed to recover through luck and manual emergency procedures.)
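To make the distinction concrete, here is a minimal sketch of how the
result of disk_log:open/1 could be classified so that resource exhaustion
is never treated as on-disk corruption. It is not mnesia's code and not a
proposed patch: the module name open_error_sketch, the functions
classify/1 and may_delete/1, and the choice of which POSIX codes count as
transient are all assumptions made for illustration. The error shapes
({file_error, File, Reason}, {not_a_log_file, File}, {invalid_header, H})
are the ones documented for disk_log.

```erlang
-module(open_error_sketch).
-export([classify/1, may_delete/1]).

%% Hypothetical helper, NOT part of mnesia. It classifies the return
%% value of disk_log:open/1 into:
%%   ok        - the log was opened (possibly after a repair)
%%   transient - the open failed for a reason unrelated to file contents
%%   corrupt   - disk_log says the file contents are not a valid log
%%   unknown   - anything else; the caller should not guess
-spec classify(term()) -> ok | transient | corrupt | unknown.
classify({ok, _Log}) ->
    ok;
classify({repaired, _Log, {recovered, _Rec}, {badbytes, _Bad}}) ->
    ok;
classify({error, {file_error, _File, Posix}}) ->
    %% A file_error says nothing about the file's contents.
    case lists:member(Posix, transient_posix()) of
        true  -> transient;
        false -> unknown
    end;
classify({error, {not_a_log_file, _File}}) ->
    corrupt;
classify({error, {invalid_header, _Header}}) ->
    corrupt;
classify({error, _Other}) ->
    unknown.

%% Only a file that disk_log itself reports as invalid is a candidate
%% for deletion; emfile and friends never are.
-spec may_delete(term()) -> boolean().
may_delete(OpenResult) ->
    classify(OpenResult) =:= corrupt.

%% Assumed (illustrative) list of POSIX errors caused by the environment
%% rather than by the file being opened.
transient_posix() ->
    [emfile, enfile, enomem, enospc, eacces].
```

With this classification, the {file_error, ".../foo.DCL", emfile} term that
appears in the transcript below maps to transient, so may_delete/1 returns
false and the .DCL file would be left alone.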
I'm attaching a standalone module which reproduces the corruption for me
with OTP 18.0-rc2 on a Fedora 20 Linux / x86_64 desktop system. Here's a
transcript from a run of that module (foo.DCL is the interesting file):
Script started on Sat 30 May 2015 05:47:41 PM CEST
_1_/tmp/otp/bin/erlc mnesia_corrupts_data.erl
_2_/tmp/otp/bin/erl
Erlang/OTP 18 [RELEASE CANDIDATE 2] [erts-7.0] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.0 (abort with ^G)
1> mnesia_corrupts_data:doit().
DCL created after 1024 records written
=INFO REPORT==== 30-May-2015::17:48:23 ===
application: mnesia
exited: stopped
type: temporary
Initial DB:
total 152
-rw-r--r-- 1 mikpe mikpe 2715 May 30 17:48 LATEST.LOG
-rw-r--r-- 1 mikpe mikpe 104101 May 30 17:48 PREVIOUS.LOG
-rw-r--r-- 1 mikpe mikpe 8 May 30 17:48 foo.DCD
-rw-r--r-- 1 mikpe mikpe 31039 May 30 17:48 foo.DCL
-rw-r--r-- 1 mikpe mikpe 6750 May 30 17:48 schema.DAT
managed to open 1014 files
Mnesia(nonode@REDACTED): Data may be missing, Corrupt logfile deleted: "/tmp/Mnesia.nonode@REDACTED/foo.DCL", {file_error,
"/tmp/Mnesia.nonode@REDACTED/foo.DCL",
emfile}
=ERROR REPORT==== 30-May-2015::17:48:25 ===
Mnesia(nonode@REDACTED): ** ERROR ** (could not write core file: emfile)
** FATAL ** Cannot open log file "/tmp/Mnesia.nonode@REDACTED/foo.DCL": {file_error,
"/tmp/Mnesia.nonode@REDACTED/foo.DCL",
emfile}
=ERROR REPORT==== 30-May-2015::17:48:35 ===
** Generic server mnesia_monitor terminating
** Last message in was {'EXIT',<0.95.0>,killed}
** When Server state == {state,<0.95.0>,[],[],true,[],undefined,[],[]}
** Reason for termination ==
** killed
=ERROR REPORT==== 30-May-2015::17:48:35 ===
** Generic server mnesia_recover terminating
** Last message in was {'EXIT',<0.95.0>,killed}
** When Server state == {state,<0.95.0>,undefined,undefined,undefined,0,false,
true,[]}
** Reason for termination ==
** killed
=INFO REPORT==== 30-May-2015::17:48:35 ===
application: mnesia
exited: {killed,{mnesia_sup,start,[normal,[]]}}
type: temporary
** exception exit: {badmatch,{error,{killed,{mnesia_sup,start,[normal,[]]}}}}
in function mnesia_corrupts_data:check_db/1 (mnesia_corrupts_data.erl, line 39)
2>
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution
^C
_3_ls -l Mnesia.nonode@REDACTED/
total 120
-rw-r--r-- 1 mikpe mikpe 2715 May 30 17:48 LATEST.LOG
-rw-r--r-- 1 mikpe mikpe 104101 May 30 17:48 PREVIOUS.LOG
-rw-r--r-- 1 mikpe mikpe 8 May 30 17:48 foo.DCD
-rw-r--r-- 1 mikpe mikpe 6750 May 30 17:48 schema.DAT
_4_
/Mikael