mnesia: inserting a large number of records

Hakan Mattsson hakan@REDACTED
Mon Aug 1 15:50:48 CEST 2005


Sebastian,
try mnesia:write_lock_table/1 first. It is simple
to use and reduces the locking overhead substantially.
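
The pattern is simply one explicit table lock taken at the top of
the transaction fun, instead of one implicit record lock per write.
A minimal sketch (my_tab and Records are placeholders here;
mnesia:write/1 assumes the record name matches the table name):

    import_all(Records) ->
        F = fun() ->
                    %% a single table lock replaces thousands of
                    %% per-record write locks
                    mnesia:write_lock_table(my_tab),
                    lists:foreach(fun(Rec) -> mnesia:write(Rec) end,
                                  Records)
            end,
        {atomic, ok} = mnesia:transaction(F).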

I am attaching a simple test program, where the difference
turned out to be almost a factor of 50 with 10,000 records of
~4000 bytes each. But you should really measure on your own
hardware with real data.

Vance, did you try table locks? How much did you gain
with your solution?

/Håkan

On Thu, 28 Jul 2005, Sebastian Bello wrote:

SB> Date: Thu, 28 Jul 2005 17:51:29 -0300
SB> From: Sebastian Bello <sebastian@REDACTED>
SB> To: erlang-questions@REDACTED
SB> Subject: Re: mnesia: inserting a large number of records
SB> 
SB> Vance,
SB> 
SB> thank you very much for your response and your detailed description. The
SB> solution looks a bit complicated to me, but I'll try some tests.
SB> Thanks!
SB>  Sebastian-
SB> 
SB> ----- Original Message -----
SB> From: "Vance Shipley" <vances@REDACTED>
SB> To: "Sebastian Bello" <sebastian@REDACTED>
SB> Cc: <erlang-questions@REDACTED>
SB> Sent: Wednesday, July 27, 2005 5:24 AM
SB> Subject: Re: mnesia: inserting a large number of records
SB> 
SB> 
SB> > On Fri, Jul 22, 2005 at 10:29:36AM -0300, Sebastian Bello wrote:
SB> > }
SB> > } a program reads records from a text file and inserts them
SB> > } into a mnesia table. We are performing these insertions
SB> > } within a transaction so that, in case of an error, the whole
SB> > } file can be reprocessed. The file holds approx. 5,000-10,000
SB> > } records. It seems the transaction time is not linear; I'm
SB> > } wondering if there is a faster way to perform the insertions,
SB> > } maybe using a table lock? Any suggestions?
SB> >
SB> > Sebastian,
SB> >
SB> > I had a similar challenge where we wanted to import large text
SB> > files into a distributed mnesia database while it was in production.
SB> > In our case we mostly needed to replace the existing copy so I
SB> > came up with the following scheme:
SB> >
SB> >    - create a new ram based table (e.g. foo_import)
SB> >    - use a write lock transaction fun with mnesia:ets/1 to
SB> >      insert records
SB> >    - use mnesia:change_table_copy_type/3 to change it to a
SB> >      disc based table on the local node only
SB> >    - activate a checkpoint on this table
SB> >    - backup this checkpoint using a custom mnesia_backup
SB> >      behaviour callback module to change the records on
SB> >      the fly to use the real table name (e.g. #foo_import{}
SB> >      to #foo{}).
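
A rough sketch of these steps (foo_import, foo and the backup file
name are placeholders, and the record-renaming mnesia_backup callback
module is omitted since it depends on the record layout):

    import_to_backup(Records) ->
        %% 1. ram based table on the local node only
        {atomic, ok} =
            mnesia:create_table(foo_import, [{ram_copies, [node()]}]),
        %% 2. load the records in a raw ets context: no locks and no
        %%    transaction log, so this is only safe on a local ram
        %%    table that no one else is using yet
        ok = mnesia:ets(fun() ->
                                lists:foreach(
                                  fun(R) ->
                                          mnesia:write(foo_import, R, write)
                                  end,
                                  Records)
                        end),
        %% 3. make the table disc resident on this node
        {atomic, ok} =
            mnesia:change_table_copy_type(foo_import, node(), disc_copies),
        %% 4. checkpoint the table and back it up; a custom backup
        %%    module (optional third argument to backup_checkpoint/3)
        %%    can rewrite #foo_import{} records to #foo{} on the fly
        {ok, Name, _Nodes} =
            mnesia:activate_checkpoint([{min, [foo_import]}]),
        ok = mnesia:backup_checkpoint(Name, "foo.bup"),
        mnesia:deactivate_checkpoint(Name).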
SB> >
SB> > The idea is that you create the table in an ets context without
SB> > lock overheads so that it is a fast operation (i.e. the user doesn't
SB> > wait long) and then write it out to a binary backup file on disk.
SB> >
SB> > Now the user may use mnesia:restore/2 to replace the working
SB> > table with the backup.  You can do this while the system is
SB> > running; transactions will block while it replaces the table,
SB> > in our experience a couple of seconds at worst.  As I said, we
SB> > just replace the current table, but you could just as easily
SB> > insert the records into the existing table using the keep_tables
SB> > option.  I haven't tried this scheme, so I can't say how it
SB> > performs.  For our purposes we changed the time it took to
SB> > perform the import from many minutes, if not hours, to maybe
SB> > twenty seconds.  After that, as I said, the table can be
SB> > replaced in a couple of seconds.
SB> >
SB> > -Vance
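
The replacement itself is then a single call. A sketch, assuming the
backup above was written to "foo.bup" with the records rewritten to
the real table name foo:

    %% recreate_tables deletes and recreates the live table from the
    %% backup; use keep_tables instead to add the records to the
    %% existing table. Transactions on foo block while this runs.
    {atomic, [foo]} = mnesia:restore("foo.bup",
                                     [{recreate_tables, [foo]}]).
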
-------------- next part --------------
-module(bulk).

-compile(export_all).

-record(t, {key, val}).

go() ->
    Tab = t,
    Storage = disc_copies,
    Nodes = [node() | nodes()],
    N = 10000,
    Bulk = term_to_binary(lists:seq(1, 1000)),
    init(Tab, Storage, Nodes),
    io:format("Import ~p records a ~p bytes into ~p ~p\n",
	      [N, size(Bulk), length(Nodes), Storage]),
    TabTime = run(Tab, N, Bulk, table),
    RecTime = run(Tab, N, Bulk, record),
    io:format("Table lock: ~p seconds\n", [TabTime / 1000000]),
    io:format("Record lock: ~p seconds\n", [RecTime / 1000000]),
    io:format("~p times improvement\n", [RecTime / TabTime]).
    
run(Tab, N, Bulk, LockType) ->
    Fun = fun() ->
                  case LockType of
                      table ->
                          %% one explicit lock on the whole table
                          mnesia:write_lock_table(Tab);
                      record ->
                          %% fall back to one implicit write lock
                          %% per record
                          ignore
                  end,
                  import(N, Bulk),
                  ok
          end,
    {atomic,ok} = mnesia:clear_table(Tab),
    {Time, {atomic,ok}} = timer:tc(mnesia, sync_transaction, [Fun]),
    Time.
      
init(Tab, Storage, Nodes) ->
    %% NOTE: this deletes any existing mnesia schema (and data)
    %% on all the nodes
    rpc:multicall(Nodes, mnesia, stop, []),
    mnesia:delete_schema(Nodes),
    ok = mnesia:create_schema(Nodes),
    rpc:multicall(Nodes, mnesia, start, []),
    TabDef = [{Storage, Nodes}, {record_name, Tab}],
    {atomic,ok} = mnesia:create_table(Tab, TabDef).

import(0, _Bulk) ->
    ok;
import(N, Bulk) ->
    mnesia:write(t, #t{key = N, val = Bulk}, write),
    import(N - 1, Bulk).
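
To try it on your own hardware (note that the module deletes the
mnesia schema on all connected nodes, so only run it on scratch
nodes):

    %% erl -sname bench
    1> c(bulk).
    2> bulk:go().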
