[erlang-questions] Efficient Insertions in Mnesia tables

Evans, Matthew mevans@REDACTED
Mon Nov 8 16:15:49 CET 2010


I'm wondering if you could edit the mnesia library, find where it creates the DETS file, and add the option:

{min_no_slots,SomeLargeNumber}

http://www.erlang.org/doc/man/dets.html#open_file-2
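
For reference, here is a minimal sketch of what that option looks like on a standalone DETS table. It is not the Mnesia patch itself, just an illustration of the dets:open_file/2 argument. The module name, file name and slot count are made up for the example. According to the dets documentation, min_no_slots should be the estimated number of distinct keys, and its default (and minimum) is 256.

```erlang
-module(dets_slots_demo).
-export([run/0]).

%% Opens a fresh DETS table with a large initial slot count, inserts a
%% few objects, and returns one lookup result plus the slot information.
run() ->
    File = "dets_slots_demo.dets",          % hypothetical file name
    _ = file:delete(File),                  % start from an empty file
    {ok, Tab} = dets:open_file(dets_slots_demo,
                               [{file, File},
                                {type, set},
                                %% assumed estimate of the number of keys
                                {min_no_slots, 100000}]),
    ok = dets:insert(Tab, [{N, N * N} || N <- lists:seq(1, 1000)]),
    Found = dets:lookup(Tab, 10),
    %% {Min, Used, Max} slot counts as reported by DETS
    Slots = dets:info(Tab, no_slots),
    ok = dets:close(Tab),
    ok = file:delete(File),
    {ok, Found, Slots}.
```

The option only takes effect when the file is created, so it would not help a table that already exists on disk.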


Matt

-----Original Message-----
From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On Behalf Of Rudolph van Graan
Sent: Sunday, November 07, 2010 6:27 AM
To: Olivier BOUDEVILLE
Cc: erlang-questions@REDACTED
Subject: Re: [erlang-questions] Efficient Insertions in Mnesia tables

Hi,

This sounds like a disk subsystem issue. DETS (the storage behind disc_only_copies Mnesia tables) stores objects in buckets and allocates (and reallocates) space within those buckets as you add objects. If a bucket has no room for a new object, the bucket must be split, which means the DETS file grows and some of the data is moved. Depending on your operating system, your file system, and the record size on that file system, this results in a lot of I/O. In my opinion, what you see is to be expected: DETS selects a bucket from a hash of the object's key, so a given insert can hit any bucket, essentially at random. DETS is not a good choice if you want to append to a table constantly, but it works reasonably well if you have a finite set of keys.
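
To illustrate the "essentially at random" point, here is a small sketch that maps keys to slot indices with a hash. erlang:phash2/2 is used only as a stand-in; it is an assumption for the example, not necessarily the function DETS applies internally, and the module and function names are made up.

```erlang
-module(slot_scatter_demo).
-export([slots/2]).

%% Pairs each key with a slot index in 0..NoSlots-1, computed from a
%% hash of the key. phash2 is a stand-in for the table's own hash.
slots(Keys, NoSlots) when is_integer(NoSlots), NoSlots > 0 ->
    [{Key, erlang:phash2(Key, NoSlots)} || Key <- Keys].
```

Calling slot_scatter_demo:slots(lists:seq(1, 8), 256) shows that consecutive keys do not generally land in neighbouring slots, so a run of sequential inserts touches positions spread across the whole file rather than appending at the end.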

Rudolph van Graan
www.patternmatched.com


On Nov 5, 2010, at 10:15 AM, Olivier BOUDEVILLE wrote:

> Hi,
> 
> We are trying to write records (e.g. 60 000 of them) with
> mnesia:dirty_write into a disc_only_copies Mnesia table (type: set, not
> fragmented, not replicated), and we observe that the insertion time
> increases as the table fills up. This is not really a surprise, but it is
> something we need to avoid. What we would like is constant (and preferably
> low) insertion times, like we had when writing directly to a file.
> 
> We tried to get as close as possible with the following settings and use:
> 
>     % We want tables to be dumped less frequently from memory to disc,
>     % in order to buffer writes (default value is 4):
>     ok = application:set_env( mnesia, dc_dump_limit, 1 ),
>
>     % Greatly increases (default value is 100) the maximum number of
>     % writes to the transaction log before a new dump is performed:
>     ok = application:set_env( mnesia, dump_log_write_threshold, 50000 ),
> 
> Over time we see the CPU load decrease steadily; the computer seems to
> spend most of its time fighting for locks.
> 
> We happen to be in a pretty favorable situation (only writes, no 
> concurrent access to a given table). We chose disc_only_copies as there 
> might be a large number of such tables and if they filled over time they 
> could exhaust the RAM.
> 
> Is there anything we missed that would allow us (roughly) constant 
> insertion times with Mnesia?
> 
> Thanks in advance for any hint,
> Best regards,
> 
> Olivier.
> ---------------------------
> Olivier Boudeville
> 
> EDF R&D : 1, avenue du Général de Gaulle, 92140 Clamart, France
> Département SINETICS, groupe ASICS (I2A), bureau B-226
> Office : +33 1 47 65 59 58 / Mobile : +33 6 16 83 37 22 / Fax : +33 1 47 
> 65 27 13
> 
> 
> 


