[erlang-questions] Efficient Insertions in Mnesia tables

Olivier BOUDEVILLE olivier.boudeville@REDACTED
Mon Nov 8 17:30:43 CET 2010


Hello Matt and Rudolph,

Thanks for your advice. Indeed a min_no_slots option could be neat for some
kinds of use.

I was hoping that I could use Mnesia to store my records much like my
basic file I/O implementation does (just appending data to the
disc_only_copies table), but I had not anticipated that the insertion
time would not be O(1) and could grow that much. As the table is
disc_only_copies, why would a large number of buckets be kept in RAM?
Unless a default index is used in such a case?

This is a bit of a surprise to me, as I imagined that telco logs, for
example, could be using Mnesia for similar cases: dumping a large number
of entries to a file-based backend, with no caching in RAM in case the
data does not fit, seemed a quite common use case to me.

Anyway, as Rudolph hinted, I guess I was trying to use the wrong tool for
the task; I will stick to writing to files.
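
For reference, a minimal sketch of what such append-only writing can look
like in plain Erlang. The module, function and file names are made up for
this example, and it is not the actual implementation discussed in this
thread; it only assumes that records are plain Erlang terms:

```erlang
-module(append_sketch).
-export([write_records/2]).

%% Appends each term of Records to the file at Path, one term per line,
%% in a textual form that file:consult/1 can read back.
%% Nothing is indexed or rewritten: each write only extends the file.
write_records(Path, Records) ->
    %% delayed_write lets the file driver buffer small writes.
    {ok, Fd} = file:open(Path, [append, raw, binary, delayed_write]),
    lists:foreach(fun(R) ->
                          ok = file:write(Fd, io_lib:format("~w.~n", [R]))
                  end,
                  Records),
    ok = file:close(Fd).
```

Since the file is opened in append mode, the cost of a write does not
depend on how much data the file already holds; the price is that there
is no keyed lookup.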

Thanks again,
Best regards,

Olivier Boudeville.
---------------------------
Olivier Boudeville

EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
Département SINETICS, groupe ASICS (I2A), bureau B-226
Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47 65 27 13



From: mevans@REDACTED
Date: 08/11/2010 16:16
To: rvg@REDACTED, olivier.boudeville@REDACTED
Cc: erlang-questions@REDACTED
Subject: RE: [erlang-questions] Efficient Insertions in Mnesia tables

I'm wondering if you could edit the mnesia library, find out where the 
DETS file is created and add the option:

{min_no_slots,SomeLargeNumber}

http://www.erlang.org/doc/man/dets.html#open_file-2
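
To illustrate the option itself, here is a minimal sketch of opening a
DETS table directly (outside Mnesia) with min_no_slots set. The module
name, table name, file name and slot count are made-up values for this
example; how to pass the option through Mnesia is not shown, since that
would require the library change mentioned above:

```erlang
-module(dets_slots_sketch).
-export([run/0]).

%% Opens (creating it if needed) a standalone DETS table with a large
%% initial number of slots, inserts one object, reads it back and
%% closes the table.
run() ->
    {ok, Tab} = dets:open_file(sketch_table,
                               [{file, "sketch_table.dets"},
                                {type, set},
                                %% Hypothetical estimate of the number of
                                %% keys; only used when the table is created.
                                {min_no_slots, 100000}]),
    ok = dets:insert(Tab, {some_key, some_value}),
    [{some_key, some_value}] = dets:lookup(Tab, some_key),
    ok = dets:close(Tab).
```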


Matt

-----Original Message-----
From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On 
Behalf Of Rudolph van Graan
Sent: Sunday, November 07, 2010 6:27 AM
To: Olivier BOUDEVILLE
Cc: erlang-questions@REDACTED
Subject: Re: [erlang-questions] Efficient Insertions in Mnesia tables

Hi,

This sounds like a disk subsystem issue. DETS (which backs
disc_only_copies Mnesia tables) uses buckets to store objects and will
allocate (and reallocate) objects within buckets as you add more objects
to it. If a bucket does not have space for a new object, the bucket must
be split. This means the DETS file grows and some of the data is moved.
Depending on your operating system, file system and the record size on
the file system, this will result in a lot of I/O. In my opinion, what
you see is to be expected: DETS selects a bucket based on a hash of the
object's key, so a specific insert can hit any bucket essentially at
random. DETS is not a good choice if you want to constantly append to a
table, but it works reasonably well if you have a finite set of keys.
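
The "essentially at random" point can be seen with a small sketch. Here
erlang:phash2/2 is used purely as a stand-in hash (an assumption for the
sake of the example, not a statement about what DETS does internally),
and the module and function names are made up:

```erlang
-module(bucket_sketch).
-export([slots/2]).

%% Maps each key to a slot index in 0..NoSlots-1 using a generic hash.
%% Consecutive keys generally end up in unrelated slots, so appending
%% records with increasing keys still touches slots all over the table.
slots(Keys, NoSlots) ->
    [erlang:phash2(Key, NoSlots) || Key <- Keys].
```

For instance, bucket_sketch:slots(lists:seq(1, 10), 256) shows where ten
consecutive integer keys would land in a table with 256 slots.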

Rudolph van Graan
www.patternmatched.com


On Nov 5, 2010, at 10:15 AM, Olivier BOUDEVILLE wrote:

> Hi,
> 
> We are trying to write (with mnesia:dirty_write) records (e.g. 60,000
> of them) into a disc_only_copies Mnesia table (type: set, not
> fragmented, not replicated), and we observe that the insertion time
> increases as the table fills up. This is not really a surprise, but it
> is something we need to avoid. What we would like is to have constant
> (and preferably low) insertion times, like we had when writing directly
> to a file.
> 
> We tried to get as close as possible with the following settings and
> use:
>
>     % We want tables to be dumped less frequently from memory to disc,
>     % in order to buffer writes (default value is 4):
>     ok = application:set_env( mnesia, dc_dump_limit, 1 ),
>
>     % Greatly increases (default value is 100) the maximum number of
>     % writes to the transaction log before a new dump is performed:
>     ok = application:set_env( mnesia, dump_log_write_threshold, 50000 ),
> 
> Over time we see the CPU load decrease steadily; the computer seems to
> spend most of its time fighting for locks.
>
> We happen to be in a pretty favorable situation (only writes, no
> concurrent access to a given table). We chose disc_only_copies as there
> might be a large number of such tables, and if they filled up over time
> they could exhaust the RAM.
> 
> Is there anything we missed that would allow us (roughly) constant 
> insertion times with Mnesia?
> 
> Thanks in advance for any hint,
> Best regards,
> 
> Olivier.

This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except with formal approval.

If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.

E-mail communication cannot be guaranteed to be timely, secure, error-free or virus-free.

