[erlang-bugs] mnesia can corrupt tables if the VM runs out of file descriptors

Mikael Pettersson mikpelinux@REDACTED
Sat May 30 18:33:01 CEST 2015


Attachment: mnesia_corrupts_data.erl
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20150530/1654b5ae/attachment.ksh>

If the Erlang VM is close to its file descriptor limit when mnesia tries
to open a disc_copies table's .DCL file, the open fails with {error, emfile}.
mnesia_log:open_log/6 incorrectly interprets that failure as a corrupt file
and then DELETES the perfectly valid .DCL file.
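
To make the distinction concrete, here is a minimal sketch of how an open
failure could be classified before anything is deleted. This is NOT mnesia's
actual code and not a proposed patch: the module name, both function names,
and the choice of which errno atoms count as "transient" are assumptions made
for illustration. The only shape taken from real output is the
{file_error, Path, emfile} term seen in the report further down.

```erlang
%% Hypothetical sketch, not part of mnesia.
-module(open_error_sketch).
-export([classify/1, on_open_failure/2]).

%% Classify the reason of a failed log open.
%% Resource-exhaustion and permission errors say nothing about the
%% contents of the file, so they must never be treated as corruption.
classify({file_error, _Path, Reason}) -> classify(Reason);
classify({error, Reason})             -> classify(Reason);
classify(emfile) -> transient;   % per-process fd limit reached
classify(enfile) -> transient;   % system-wide file table full
classify(enomem) -> transient;
classify(eacces) -> transient;
classify(_Other) -> suspect.     % may or may not be real corruption

%% Decide what to do with the file. Returns a decision only and has no
%% side effects; even a suspect file is set aside rather than deleted.
on_open_failure(File, Reason) ->
    case classify(Reason) of
        transient -> {keep, File, Reason};
        suspect   -> {quarantine, File, File ++ ".corrupt"}
    end.
```

With this sketch, on_open_failure("foo.DCL", {file_error, "foo.DCL", emfile})
yields a keep decision, so the valid file would stay on disk and the caller
could report the emfile error instead.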

This is reproducible with (at least) OTP 18.0-rc2 and R15B03.

(This is not a hypothetical problem: it hit us and corrupted three tables,
though we managed to recover through luck and manual emergency procedures.)

I'm attaching a standalone module which reproduces the corruption for me
with OTP 18.0-rc2 on a Fedora 20 Linux / x86_64 desktop system.  Here's a
transcript from a run of that module (foo.DCL is the interesting file):

Script started on Sat 30 May 2015 05:47:41 PM CEST
_1_/tmp/otp/bin/erlc mnesia_corrupts_data.erl
_2_/tmp/otp/bin/erl
Erlang/OTP 18 [RELEASE CANDIDATE 2] [erts-7.0] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.0  (abort with ^G)
1> mnesia_corrupts_data:doit().
DCL created after 1024 records written

=INFO REPORT==== 30-May-2015::17:48:23 ===
    application: mnesia
    exited: stopped
    type: temporary
Initial DB:
total 152
-rw-r--r-- 1 mikpe mikpe   2715 May 30 17:48 LATEST.LOG
-rw-r--r-- 1 mikpe mikpe 104101 May 30 17:48 PREVIOUS.LOG
-rw-r--r-- 1 mikpe mikpe      8 May 30 17:48 foo.DCD
-rw-r--r-- 1 mikpe mikpe  31039 May 30 17:48 foo.DCL
-rw-r--r-- 1 mikpe mikpe   6750 May 30 17:48 schema.DAT
managed to open 1014 files
Mnesia(nonode@REDACTED): Data may be missing, Corrupt logfile deleted: "/tmp/Mnesia.nonode@REDACTED/foo.DCL", {file_error,
                                                                                                           "/tmp/Mnesia.nonode@REDACTED/foo.DCL",
                                                                                                           emfile} 

=ERROR REPORT==== 30-May-2015::17:48:25 ===
Mnesia(nonode@REDACTED): ** ERROR ** (could not write core file: emfile)
 ** FATAL ** Cannot open log file "/tmp/Mnesia.nonode@REDACTED/foo.DCL": {file_error,
                                                                        "/tmp/Mnesia.nonode@REDACTED/foo.DCL",
                                                                        emfile}

=ERROR REPORT==== 30-May-2015::17:48:35 ===
** Generic server mnesia_monitor terminating 
** Last message in was {'EXIT',<0.95.0>,killed}
** When Server state == {state,<0.95.0>,[],[],true,[],undefined,[],[]}
** Reason for termination == 
** killed

=ERROR REPORT==== 30-May-2015::17:48:35 ===
** Generic server mnesia_recover terminating 
** Last message in was {'EXIT',<0.95.0>,killed}
** When Server state == {state,<0.95.0>,undefined,undefined,undefined,0,false,
                               true,[]}
** Reason for termination == 
** killed

=INFO REPORT==== 30-May-2015::17:48:35 ===
    application: mnesia
    exited: {killed,{mnesia_sup,start,[normal,[]]}}
    type: temporary
** exception exit: {badmatch,{error,{killed,{mnesia_sup,start,[normal,[]]}}}}
     in function  mnesia_corrupts_data:check_db/1 (mnesia_corrupts_data.erl, line 39)
2> 
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
^C
_3_ls -l Mnesia.nonode@REDACTED/
total 120
-rw-r--r-- 1 mikpe mikpe   2715 May 30 17:48 LATEST.LOG
-rw-r--r-- 1 mikpe mikpe 104101 May 30 17:48 PREVIOUS.LOG
-rw-r--r-- 1 mikpe mikpe      8 May 30 17:48 foo.DCD
-rw-r--r-- 1 mikpe mikpe   6750 May 30 17:48 schema.DAT
_4_
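
For readers without the attachment at hand, the "managed to open N files" step
can be approximated with a small helper like the one below. It is a sketch
under assumptions, not the code from mnesia_corrupts_data.erl: the module and
function names are made up, and the Max cap is added only so that it can be
tried safely. It uses nothing beyond file:open/2 and file:close/1.

```erlang
%% Hypothetical sketch, not the attached reproduction module.
-module(fd_exhaust_sketch).
-export([open_until_error/2, close_all/1]).

%% Open Path read-only over and over until file:open/2 fails or Max
%% handles are held. Returns {Handles, StopReason}, where StopReason is
%% limit_reached or the error atom from file:open/2 (emfile once the
%% process runs out of descriptors).
open_until_error(Path, Max) when is_integer(Max), Max >= 0 ->
    open_until_error(Path, Max, []).

open_until_error(_Path, 0, Acc) ->
    {Acc, limit_reached};
open_until_error(Path, N, Acc) ->
    case file:open(Path, [read, raw]) of
        {ok, Fd}        -> open_until_error(Path, N - 1, [Fd | Acc]);
        {error, Reason} -> {Acc, Reason}
    end.

%% Release every handle collected above. Raw files must be closed by
%% the process that opened them.
close_all(Handles) ->
    lists:foreach(fun file:close/1, Handles).
```

Calling open_until_error/2 with a Max above the descriptor limit and keeping
the handles open while mnesia starts should put the VM in the state described
above; close_all/1 releases them afterwards.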

/Mikael


