[erlang-questions] Index Overhead In Mnesia
Ben Hood
0x6e6562@REDACTED
Tue Jun 10 13:51:16 CEST 2008
Hi,
I'm looking into the rate of inserting rows in mnesia.
Having written the attached test (which can be parameterized to insert
an arbitrary number of rows in arbitrary chunk sizes), I've found so
far that the highest throughput seems to be somewhere between
50 and 200 rows per transaction.
What surprised me a bit is the magnitude of the effect that index
maintenance has on the rate of insertion.
If I place secondary indexes on two non-key attributes, the throughput
drops off considerably.
For example, inserting 10000 rows in batches of 1000 whilst
maintaining 2 non-key indexes produces the following rates of
insertion per batch:
rate:insert(10000,1000).
Batch rate = 10688
Batch rate = 7182
Batch rate = 5001
Batch rate = 4072
Batch rate = 3300
Batch rate = 2866
Batch rate = 2377
Batch rate = 2166
Batch rate = 1807
Batch rate = 1303
The batch rate is the number of inserts per second within each batch.
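For reference, the rate calculation used by the test can be pulled out on its own. This is only a sketch: the module name rate_calc and the function batch_rate/2 are made up for illustration, and the guard against a zero elapsed time is an addition that the attached test does not have.

```erlang
-module(rate_calc).
-export([batch_rate/2]).

%% Rows per second for a batch of BatchSize rows that took Micros
%% microseconds, as reported by timer:tc. Same formula as the attached
%% test: round(BS / Time * 1000000).
batch_rate(BatchSize, Micros) when Micros > 0 ->
    round(BatchSize / Micros * 1000000).
```

For example, rate_calc:batch_rate(1000, 100000) gives 10000, i.e. a batch of 1000 rows that takes a tenth of a second corresponds to 10000 inserts per second.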
This tallies up with the idea that at the beginning the index overhead
is tiny, but grows on each insertion, which is normal.
I just didn't think that the throughput would drop off so sharply.
Does anybody know if I'm doing something completely wrong or if there
is a much better way to use mnesia with large tables?
Thanks,
Ben
-module(rate).
-compile(export_all).

-record(a, {id,first,second}).

%% Create the schema and table a, then index the two non-key attributes.
init() ->
    mnesia:create_schema([node()]),
    mnesia:start(),
    mnesia:delete_table(a),
    mnesia:create_table(a,
                        [{attributes, record_info(fields, a)}]),
    mnesia:add_table_index(a,first),
    mnesia:add_table_index(a,second),
    ok.

%% Insert N rows, BatchSize rows per transaction, printing each batch rate.
insert(N,BatchSize) ->
    mnesia:clear_table(a),
    batch(N, BatchSize).

%% N =< 0 also terminates when N is not a multiple of BS.
batch(N,_) when N =< 0 -> ok;
batch(N,BS) ->
    F = fun() -> write(#a{first = BS,second = BS},BS) end,
    {Time,_} = timer:tc(mnesia,transaction,[F]),
    io:format("Batch rate = ~p~n",[round(BS / Time * 1000000)]),
    batch(N - BS, BS).

%% Write N copies of X, each with a unique id taken from now().
write(_,0) -> ok;
write(X,N) ->
    mnesia:write(X#a{id = now()}),
    write(X,N-1).
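
One variable that may be worth isolating: in the test above every row stores the same value (BS) in both indexed attributes, so all rows end up under a single key in each index. The sketch below is a hypothetical variant, not part of the original attachment, that can run either with shared indexed values or with a distinct value per row, so the two cases can be compared. The module name rate_distinct, the table name b, the Mode argument and the use of integer ids instead of now() are all assumptions made for illustration. It also skips create_schema, so the table lives in RAM only.

```erlang
-module(rate_distinct).
-export([init/0, insert/3]).

-record(b, {id, first, second}).

%% Create a RAM-only table b with secondary indexes on both non-key fields.
%% The result of delete_table is ignored because the table may not exist yet.
init() ->
    mnesia:start(),
    mnesia:delete_table(b),
    {atomic, ok} = mnesia:create_table(b, [{attributes, record_info(fields, b)},
                                           {index, [first, second]}]),
    ok.

%% Insert N rows in transactions of up to BatchSize rows.
%% Mode is the atom same (every row shares one indexed value, as in the
%% original test) or distinct (every row gets its own indexed value).
%% Returns the list of per-batch rates in insertion order.
insert(N, BatchSize, Mode) when N > 0, BatchSize > 0 ->
    {atomic, ok} = mnesia:clear_table(b),
    batch(1, N, BatchSize, Mode, []).

batch(Next, N, _BS, _Mode, Acc) when Next > N ->
    lists:reverse(Acc);
batch(Next, N, BS, Mode, Acc) ->
    Last = min(Next + BS - 1, N),
    F = fun() ->
                lists:foreach(fun(Id) -> mnesia:write(row(Id, BS, Mode)) end,
                              lists:seq(Next, Last))
        end,
    {Time, {atomic, ok}} = timer:tc(mnesia, transaction, [F]),
    %% max/2 avoids a division by zero on a very fast batch.
    Rate = round((Last - Next + 1) / max(Time, 1) * 1000000),
    io:format("~p batch rate = ~p~n", [Mode, Rate]),
    batch(Last + 1, N, BS, Mode, [Rate | Acc]).

%% Build one row; only the indexed values differ between the two modes.
row(Id, BS, same)      -> #b{id = Id, first = BS, second = BS};
row(Id, _BS, distinct) -> #b{id = Id, first = Id, second = Id}.
```

After rate_distinct:init(), running rate_distinct:insert(10000, 1000, same) and then rate_distinct:insert(10000, 1000, distinct) prints the per-batch rates for both cases. If the two runs behave very differently, the shape of the test data rather than index maintenance in general would be the first thing to look at.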