mnesia: inserting a large number of records

Vance Shipley vances@REDACTED
Wed Jul 27 10:24:03 CEST 2005


On Fri, Jul 22, 2005 at 10:29:36AM -0300, Sebastian Bello wrote:
}  
} a programm reads records from a text file and inserts them in
} a mnesia table. We are performing this insertions within a 
} transaction so in case of an error the whole file can be 
} reprocessed. The file holds approx. 5.000-10.000 records. 
} It seems the transaction time is not linear; I'm wondering if 
} there is a faster way to perform the insertions, maybe using 
} a table lock, I don't know. Any suggestions?

Sebastian,

I had a similar challenge where we wanted to import large text 
files into a distributed mnesia database while it was in production.
In our case we mostly needed to replace the existing copy so I 
came up with the following scheme:

   - create a new ram based table (e.g. foo_import)
   - use a write lock transaction fun with mnesia:ets/1 to
     insert records
   - use mnesia:change_table_copy_type/3 to change it to a
     disc based table on the local node only
   - activate a checkpoint on this table
   - backup this checkpoint using a custom mnesia_backup
     behaviour callback module to change the records on
     the fly to use the real table name (e.g. #foo_import{}
     to #foo{}).
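
The steps above can be sketched roughly as follows.  This is only an
illustration, not our production code: the module name import_sketch,
the two-field #foo_import{} record, the checkpoint name foo_import_cp
and the BackupMod argument are all made up for the example.  It also
assumes mnesia is already running with a disc based schema on the
local node, since change_table_copy_type/3 needs that.

```erlang
-module(import_sketch).
-export([import/3]).

%% Hypothetical record; the real field list is not given in the post.
-record(foo_import, {key, value}).

%% Rows:       list of {Key, Value} tuples already parsed from the file.
%% BackupFile: path of the binary backup file to write.
%% BackupMod:  mnesia_backup callback module used to write the backup
%%             (a custom one that renames records, or mnesia_backup).
import(Rows, BackupFile, BackupMod) ->
    %% 1. Create a new RAM based table on the local node.
    {atomic, ok} = mnesia:create_table(foo_import,
        [{ram_copies, [node()]},
         {attributes, record_info(fields, foo_import)}]),
    %% 2. Insert in ets context: no transaction, no lock manager.
    ok = mnesia:ets(fun() ->
        lists:foreach(
            fun({K, V}) ->
                ok = mnesia:write(#foo_import{key = K, value = V})
            end, Rows)
    end),
    %% 3. Change it to a disc based table on this node only.
    {atomic, ok} =
        mnesia:change_table_copy_type(foo_import, node(), disc_copies),
    %% 4. Activate a checkpoint covering just this table.
    {ok, Name, _Nodes} =
        mnesia:activate_checkpoint([{name, foo_import_cp},
                                    {max, [foo_import]}]),
    %% 5. Back up the checkpoint through the callback module,
    %%    then release the checkpoint whatever the result was.
    Result = mnesia:backup_checkpoint(Name, BackupFile, BackupMod),
    ok = mnesia:deactivate_checkpoint(Name),
    Result.
```

Passing mnesia_backup as BackupMod gives a plain backup with no
renaming, which is a handy way to try the flow before writing a
custom module.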

The idea is that you create the table in an ets context without
lock overheads so that it is a fast operation (i.e. the user doesn't
wait long) and then write it out to a binary backup file on disk.
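
A callback module for the renaming step might look something like the
sketch below.  The module name foo_rename_backup is invented, and the
shapes of the backup items ({schema, Tab, CreateList} for table
definitions, plain tuples tagged with the table name for records) are
my assumption based on what mnesia:traverse_backup/4 funs are handed,
so check them against your OTP release.  The real work of writing the
file is delegated to the stock mnesia_backup module.

```erlang
-module(foo_rename_backup).
-export([open_write/1, write/2, commit_write/1, abort_write/1]).
-export([rename/1]).

%% Write side of the mnesia_backup callback interface.  Everything
%% is delegated to the default module; only the items are rewritten.
open_write(Opaque) ->
    mnesia_backup:open_write(Opaque).

write(State, Items) ->
    mnesia_backup:write(State, [rename(I) || I <- Items]).

commit_write(State) ->
    mnesia_backup:commit_write(State).

abort_write(State) ->
    mnesia_backup:abort_write(State).

%% Table definition of the import table: rename it to the real table.
rename({schema, foo_import, CreateList}) ->
    {schema, foo, [rename_prop(P) || P <- CreateList]};
%% Any tuple tagged foo_import (a record): retag it as foo.
rename(Item) when is_tuple(Item), element(1, Item) =:= foo_import ->
    setelement(1, Item, foo);
%% Everything else (log header, other schema items) passes through.
rename(Other) ->
    Other.

rename_prop({name, foo_import})        -> {name, foo};
rename_prop({record_name, foo_import}) -> {record_name, foo};
rename_prop(Prop)                      -> Prop.
```

Because the file itself is still written by mnesia_backup, it can be
read back later with the default module and no special options.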

Now the user may use mnesia:restore/2 to replace the working 
table with the backup.  You can do this while the system is running
and transactions will block while it replaces the table.  In our
experience that takes a couple of seconds at worst.  As I said, we
just replace the current table, but you could just as easily insert
the records into the existing table using the keep_tables option.
I haven't tried that variant so I can't say how it performs.  For
our purposes this scheme cut the time it took to perform the import
from many minutes, if not hours, to maybe twenty seconds.  After
that, as I said, the table can be replaced in a couple of seconds.
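
For illustration, the two restore variants could be written as below.
The module and function names are made up, foo stands for the real
table, and using clear_tables for the "replace" case is my choice for
the example (recreate_tables is the other candidate).  The default_op
of skip_tables is there so that nothing else in the backup file is
touched.

```erlang
-module(restore_sketch).
-export([replace/1, merge/1]).

%% Replace the contents of foo with what is in the backup file.
%% Returns {atomic, RestoredTabs} or {aborted, Reason}.
replace(BackupFile) ->
    mnesia:restore(BackupFile, [{clear_tables, [foo]},
                                {default_op, skip_tables}]).

%% Keep the existing records in foo and add those from the backup.
%% This is the keep_tables variant, which I have not measured.
merge(BackupFile) ->
    mnesia:restore(BackupFile, [{keep_tables, [foo]},
                                {default_op, skip_tables}]).
```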

	-Vance
  


