Mnesia dB size limits

Rudolph van Graan rvg@REDACTED
Thu Aug 31 08:33:33 CEST 2006

Hi Valentin & Damir,

Maybe the problem is that it is not actually clear what the answers  
are. Even after reading each of the questions and all the answers over  
the last four years, I still don't know the exact answer myself.

I can confirm that large tables (disc_copies) in a node that crashed  
can take a *very* long time to rebuild even if there are no DETS  
issues, especially if there are additional index fields on the table.  
(Mnesia creates a new ETS table for each index and needs to load the  
data into them.) Ulf and I have addressed this in RDBMS by adding  
indexes that are now first-class tables in Mnesia. So be careful with  
indexes. (BTW, if you get an MS SQL Server with a lot of data into  
"recovery", it will also take a very long time to recover - sometimes  
hours at a time.)
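To make the cost concrete, here is a minimal sketch (the record and field names are hypothetical, not from our system) of how extra indexes are declared; every entry in the index list is one more ETS table that Mnesia must rebuild from scratch when the node comes back up:

```erlang
%% Hypothetical event table; illustrative only.
-record(event, {id, timestamp, type, payload}).

create_event_table() ->
    mnesia:create_table(event,
        [{disc_copies, [node()]},
         {attributes, record_info(fields, event)},
         %% Each indexed field below adds a separate ETS table that
         %% must be repopulated after a crash - rebuild time grows
         %% with table size times number of indexes.
         {index, [timestamp, type]}]).
```

The same index can also be added to an existing table with mnesia:add_table_index/2, which triggers the same full scan to populate it.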

I've seen Mnesia crash a couple of times with funnies like this  
(usually on tables of 1 GB or larger):

=ERROR REPORT==== 29-Dec-2005::16:12:01 ===
Mnesia(tutuka@REDACTED): ** ERROR ** (core dumped to file: "d:/sys/ 
  ** FATAL ** {error,{"Cannot open dets table",

We have also made it a "rule" in our design team never to use Mnesia  
for archiving data (i.e. as a warehouse), because by default Mnesia  
keeps all the data in a table in *RAM* unless you use  
disc_only_copies, so there is an upper limit on the size of the  
table. To work around this, we usually feed archive data (i.e. event  
records, logs etc.) into an SQL database via some mechanism.
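The trade-off shows up in the table definition. A sketch with hypothetical table names: a disc_copies replica is held fully in RAM (with a disk log), while a disc_only_copies replica lives in a DETS file, avoiding the RAM ceiling but inheriting DETS's speed and file-size limitations:

```erlang
%% Illustrative records, not from any real schema.
-record(call_record, {id, msisdn, duration}).
-record(archive_entry, {id, logged_at, data}).

create_tables() ->
    %% RAM-resident replica: fast reads, but the whole table
    %% must fit in memory on this node.
    {atomic, ok} = mnesia:create_table(call_record,
        [{disc_copies, [node()]},
         {attributes, record_info(fields, call_record)}]),
    %% Disk-only replica: no RAM ceiling, but backed by DETS,
    %% which is slower and has its own file-size limits.
    {atomic, ok} = mnesia:create_table(archive_entry,
        [{disc_only_copies, [node()]},
         {attributes, record_info(fields, archive_entry)}]),
    ok.
```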

It does become interesting when you need the data "online" all the  
time, i.e. requiring random-access read/write to records. That  
problem you solve by partitioning via whatever mechanism you choose.  
Mnesia is brilliant for this type of system, as reads have virtually  
no cost. In the old days (other languages) I used to be very careful  
about doing any SQL access, but I've learned that you can disregard  
that concern with Mnesia altogether.
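One built-in way to do that partitioning is Mnesia's own table fragmentation. A minimal sketch, assuming a hypothetical log_entry table: the table is split into fragments, and all access goes through the mnesia_frag activity module so keys are routed to the right fragment:

```erlang
%% Illustrative record; names are hypothetical.
-record(log_entry, {id, timestamp, payload}).

create_fragmented_table() ->
    mnesia:create_table(log_entry,
        [{frag_properties, [{n_fragments, 16},
                            {node_pool, [node()]},
                            {n_disc_copies, 1}]},
         {attributes, record_info(fields, log_entry)}]).

%% Reads and writes must use mnesia_frag as the activity module,
%% otherwise they only ever see the first fragment.
write_entry(Entry) ->
    mnesia:activity(transaction,
                    fun() -> mnesia:write(Entry) end,
                    [], mnesia_frag).
```

Fragments can also be added later with mnesia:change_table_frag/2, which rehashes records into the new fragment.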

Valentin - maybe you can share some of your experience with large  
Mnesia databases with us? What are the things we need to be careful of?



Pattern Matched Technologies

On 30 Aug 2006, at 11:34 PM, Valentin Micic wrote:

> How many times are the same people going to give the same  
> "answers". This is getting absurd.
> V.
> ----- Original Message ----- From: "Yariv Sadan" <yarivvv@REDACTED>
> To: "Joel Reymont" <joelr1@REDACTED>
> Cc: "Philip Robinson" <philar@REDACTED>; <damir@REDACTED>;  
> <erlang-questions@REDACTED>
> Sent: Wednesday, August 30, 2006 7:41 PM
> Subject: Re: Mnesia dB size limits
>> AFAIK the problems with Mnesia aren't that you can't insert a lot of
>> data into it, but the following:
>> 1) in the event of a crash, a very large dets table takes a long  
>> time to repair.
>> 2) there are no join optimizations (yet), so some queries can take a
>> long time to process
>> 3) as the dets freelist gets fragmented it consumes more and more  
>> memory.
>> None of these items means you can't put a lot of data in Mnesia --
>> they just mean that you may run into problems if you do.
>> With ets tables, issues 1) and 3) go away.
>> Yariv
>> On 8/30/06, Joel Reymont <joelr1@REDACTED> wrote:
>>> I have a suspicion that when people mention that Mnesia is great  
>>> with
>>> huge data sets they unintentionally forget to mention important
>>> details. My understanding is that call record databases are insert/
>>> retrieve only or for the most part.
>>> When people outside of the telco world ask about Mnesia they usually
>>> have MySQL, etc. in mind and wonder about insert/delete/update
>>> performance. My understanding at the moment is that Mnesia is not  
>>> the
>>> best database for insert/update/delete, much less with huge  
>>> databases.
>>> Please correct me if I'm wrong!
>>> On Aug 30, 2006, at 4:10 PM, Philip Robinson wrote:
>>> > I was once writing a program that needed to retrieve events
>>> > within a date/time range from an mnesia table, and found that it
>>> > did not seem to be hitting the index.  It would scan the entire
>>> > 1-million-plus records every query... dead slow.
>>> >
>>> > When I wanted to retrieve a specific event there was no noticeable
>>> > delay, but most of my queries were for a date/time range.
>>> >
>>> > I think the mnesia issues being mentioned on this list were to do
>>> > with database recovery across nodes after a node failure...?
>>> --

More information about the erlang-questions mailing list