mnesia: inserting a large number of records
Hakan Mattsson
hakan@REDACTED
Mon Aug 1 15:50:48 CEST 2005
Sebastian,
try mnesia:write_lock_table/1 first. It is simple
to use and reduces the locking overhead substantially.
I am attaching a simple test program, where the difference
turned out to be almost a factor of 50 with 10000 records of
~4000 bytes each. But you should really measure on your own
hardware with real data.
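
Outside the benchmark, the pattern looks roughly like the sketch
below. It assumes the records have already been parsed from the file
into tuples that match the table definition. The module name
bulk_sketch and the function import_all/2 are made up for
illustration; only mnesia:write_lock_table/1, mnesia:write/3 and
mnesia:transaction/1 are actual Mnesia calls.

```erlang
-module(bulk_sketch).
-export([import_all/2]).

%% Hypothetical helper: insert all Records into Tab in one transaction.
%% Taking the table write lock first means Mnesia acquires a single
%% lock instead of one lock per written record.
import_all(Tab, Records) ->
    Fun = fun() ->
              mnesia:write_lock_table(Tab),
              lists:foreach(fun(Rec) ->
                                mnesia:write(Tab, Rec, write)
                            end,
                            Records)
          end,
    %% Returns {atomic, ok} on success, {aborted, Reason} otherwise.
    mnesia:transaction(Fun).
```

If any write aborts, the whole transaction is rolled back, so the
file can simply be reprocessed from the start.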
Vance, did you try table locks? How much did you gain
with your solution?
/Håkan
On Thu, 28 Jul 2005, Sebastian Bello wrote:
SB> Date: Thu, 28 Jul 2005 17:51:29 -0300
SB> From: Sebastian Bello <sebastian@REDACTED>
SB> To: erlang-questions@REDACTED
SB> Subject: Re: mnesia: inserting a large number of records
SB>
SB> Vance,
SB>
SB> thank you very much for your response and your detailed description. The
SB> solution looks a bit complicated to me, but I'll try some tests.
SB> Thanks!
SB> Sebastian-
SB>
SB> ----- Original Message -----
SB> From: "Vance Shipley" <vances@REDACTED>
SB> To: "Sebastian Bello" <sebastian@REDACTED>
SB> Cc: <erlang-questions@REDACTED>
SB> Sent: Wednesday, July 27, 2005 5:24 AM
SB> Subject: Re: mnesia: inserting a large number of records
SB>
SB>
SB> > On Fri, Jul 22, 2005 at 10:29:36AM -0300, Sebastian Bello wrote:
SB> > }
SB> > } a program reads records from a text file and inserts them in
SB> > } a mnesia table. We are performing these insertions within a
SB> > } transaction so in case of an error the whole file can be
SB> > } reprocessed. The file holds approx. 5.000-10.000 records.
SB> > } It seems the transaction time is not linear; I'm wondering if
SB> > } there is a faster way to perform the insertions, maybe using
SB> > } a table lock, I don't know. Any suggestions?
SB> >
SB> > Sebastian,
SB> >
SB> > I had a similar challenge where we wanted to import large text
SB> > files into a distributed mnesia database while it was in production.
SB> > In our case we mostly needed to replace the existing copy so I
SB> > came up with the following scheme:
SB> >
SB> > - create a new ram based table (e.g. foo_import)
SB> > - use a write lock transaction fun with mnesia:ets/1 to
SB> > insert records
SB> > - use mnesia:change_table_copy_type/3 to change it to a
SB> > disc based table on the local node only
SB> > - activate a checkpoint on this table
SB> > - backup this checkpoint using a custom mnesia_backup
SB> > behaviour callback module to change the records on
SB> > the fly to use the real table name (e.g. #foo_import{}
SB> > to #foo{}).
SB> >
SB> > The idea is that you create the table in an ets context without
SB> > lock overheads so that it is a fast operation (i.e. the user doesn't
SB> > wait long) and then write it out to a binary backup file on disk.
SB> >
SB> > Now the user may use mnesia:restore/2 to replace the working
SB> > table with the backup. You can do this while the system is running
SB> > and transactions will block while it replaces the table. In our
SB> > experience a couple seconds at worst. As I said we just replace
SB> > the current table but you could just as easily insert the records
SB> > into the existing table using the keep_tables option. I haven't
SB> > tried this scheme so I can't say how it performs. For our purposes
SB> > we changed the time it took to perform the import from many minutes,
SB> > if not hours, to maybe twenty seconds. After that, as I said, the
SB> > table can be replaced in a couple seconds.
SB> >
SB> > -Vance
-------------- next part --------------
-module(bulk).
-compile(export_all).

-record(t, {key, val}).

%% Run the same import twice, once with a table lock and once with
%% per-record locks, and print both timings.
go() ->
    Tab = t,
    Storage = disc_copies,
    Nodes = [node() | nodes()],
    N = 10000,
    Bulk = term_to_binary(lists:seq(1, 1000)),
    init(Tab, Storage, Nodes),
    io:format("Import ~p records of ~p bytes each into ~p ~p\n",
              [N, byte_size(Bulk), length(Nodes), Storage]),
    TabTime = run(Tab, N, Bulk, table),
    RecTime = run(Tab, N, Bulk, record),
    %% timer:tc/3 reports microseconds.
    io:format("Table lock: ~p seconds\n", [TabTime / 1000000]),
    io:format("Record lock: ~p seconds\n", [RecTime / 1000000]),
    io:format("~p times improvement\n", [RecTime / TabTime]).
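
%% Clear the table, then time one synchronous transaction that
%% writes N records. LockType selects table or per-record locking.
run(Tab, N, Bulk, LockType) ->
    Fun = fun() ->
              case LockType of
                  table ->
                      %% One write lock for the whole table.
                      mnesia:write_lock_table(Tab);
                  record ->
                      %% Let each mnesia:write/3 take its own lock.
                      ignore
              end,
              import(N, Bulk),
              ok
          end,
    {atomic,ok} = mnesia:clear_table(Tab),
    {Time, {atomic,ok}} = timer:tc(mnesia, sync_transaction, [Fun]),
    Time.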

%% Start from scratch. Note that this deletes any existing
%% Mnesia schema on Nodes.
init(Tab, Storage, Nodes) ->
    rpc:multicall(Nodes, mnesia, stop, []),
    mnesia:delete_schema(Nodes),
    ok = mnesia:create_schema(Nodes),
    rpc:multicall(Nodes, mnesia, start, []),
    TabDef = [{Storage, Nodes}, {record_name, Tab}],
    {atomic,ok} = mnesia:create_table(Tab, TabDef).

%% Write N records with keys N..1, each carrying the same payload.
import(0, _Bulk) ->
    ok;
import(N, Bulk) ->
    mnesia:write(t, #t{key = N, val = Bulk}, write),
    import(N - 1, Bulk).