[erlang-questions] Mnesia could not write core file: system_limit

Slobodan Miskovic <>
Wed Dec 9 09:24:17 CET 2009


That is a possibility; the node has been running for about 40 days, but I've had longer runs with no problem on the same system under similar loads.

How would I check the FD usage of a running system? Is there a way to get notified if the node is approaching the limit, in which case I could take corrective action (such as restarting the node)?
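On Linux, one way to check this from outside the VM is via /proc. A minimal sketch, assuming a Linux box; BEAM_PID is a placeholder here (it defaults to the shell's own PID so the sketch runs standalone, but you'd point it at the emulator's PID, e.g. from pgrep beam):

```shell
# Count the open file descriptors of a process via Linux /proc.
# BEAM_PID is a placeholder; default to this shell's own PID so the
# sketch runs standalone.
BEAM_PID=${BEAM_PID:-$$}
FD_COUNT=$(ls "/proc/$BEAM_PID/fd" | wc -l)
FD_LIMIT=$(awk '/Max open files/ {print $4}' "/proc/$BEAM_PID/limits")
echo "open FDs: $FD_COUNT, soft limit: $FD_LIMIT"

# Crude notification: warn once usage passes 80% of the soft limit;
# a cron job could run this and trigger corrective action instead.
if [ "$FD_LIMIT" != "unlimited" ] && [ "$FD_COUNT" -ge $((FD_LIMIT * 80 / 100)) ]; then
    echo "WARNING: $BEAM_PID is approaching its FD limit" >&2
fi
```

Run periodically (cron or a watchdog script), this would give the "approaching the limit" warning before the node hits system_limit.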

Do open FDs get closed if a process dies? Perhaps that's where the leak is coming from.
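To see where the descriptors are actually going, and whether any linger after their owning process has died, listing what each one resolves to can help. Again a Linux-only sketch with BEAM_PID as a placeholder:

```shell
# List what each FD of a process points at; many entries resolving to
# the same file, or piles of sockets and pipes, hint at leaked handles.
BEAM_PID=${BEAM_PID:-$$}    # placeholder; point this at the beam PID
for fd in "/proc/$BEAM_PID/fd"/*; do
    printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
done
```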

The biggest question that remains is why Mnesia lost data. I would understand losing whatever was being written when system_limit was reached, but I lost a lot of "old" data as well: data that had been there for days, even weeks.

Thanks
Slobo

On 2009-12-08, at 11:58 PM, Valentin Micic wrote:

> Just a wild guess... could it be that you're running out of available file
> descriptors? The error indicates that you cannot open a log file, which may
> be caused by this.
> 
> V/
> 
> -----Original Message-----
> From:  [mailto:] On
> Behalf Of Slobodan Miskovic
> Sent: 09 December 2009 02:54 AM
> To: Erlang-Questions Questions
> Subject: [erlang-questions] Mnesia could not write core file: system_limit
> 
> Hi all, one of our nodes just popped up the following error:
> 
> Mnesia(): ** ERROR ** (could not write core file:
> system_limit)
> ** FATAL ** Cannot open log file "/var/mnesia/invoice_item.DCL":
> {file_error, "/var/mnesia/invoice_item.DCL", system_limit}
> 
> I naively tried mnesia:start() only to have the same error repeat but
> for a different table.
> 
> After restarting the whole node it now seems OK (the error hasn't
> popped up again) but...
> 
> Problem is that it seems data is now missing from at least one of those
> tables. I have made a backup of the whole mnesia after the second
> failure, but I don't suppose that data is in there somewhere.
> 
> 1. What caused this? There is plenty of space left on the device, the
> process was running as root, and ulimit reports unlimited.
> 
> 2. Is the data really gone? I have enough information elsewhere to
> reconstruct the data, but would like to know where it has gone.
> 
> 3. How can I prevent this from happening again? Periodic backups would
> not help, as I cannot afford to lose the data since the last backup even
> if I were to do 1-hour intervals. Would running another node on another
> system be assurance enough?
> 
> I find it very strange that some data would get dropped (about 10k
> records out of an 85k-record table). Is this a sign of a Mnesia bug, or
> is it something I should anticipate and work around?
> 
> Some potentially useful system info:
> - Erlang R12B4 
> - about 200 tables, 50% sets and 50% bags, with both ram and disc
> copies for each table
> - the node takes about 1.2GB of RAM when data is loaded (I understand
> there is a 2GB per-table limit, or am I misguided?)
> - du -sh /var/mnesia/
> 343M    /var/mnesia/
> 
> - Filesystem            Size  Used Avail Use% Mounted on
> /dev/md/1             233G   49G  184G  22% /
> 
> - Slackware Linux 2.6.24.5-smp #2 SMP Wed Apr 30 13:41:38 CDT 2008 i686
> Intel(R) Pentium(R) Dual  CPU  E2200  @ 2.20GHz GenuineIntel GNU/Linux
> 
> - the node did not generate an erl_crash.dump, as I brought it down via
> shell q(). The only trace I have is the error above and subsequent
> error reports of calls to mnesia failing with {aborted, {node_not_running...
> 
> 
> Any help and pointers are highly appreciated.
> 
> Thanks!
> Slobo
