[erlang-questions] Index Overhead In Mnesia

Ben Hood
Tue Jun 10 13:51:16 CEST 2008


I'm looking into the rate of inserting rows in mnesia.

Having written the attached test (which can be parameterized to insert
an arbitrary number of rows in arbitrary chunk sizes), I've found so
far that the highest throughput seems to be somewhere between 50 and
200 rows per transaction.

What surprised me a bit is the magnitude of the effect that index  
maintenance has on the rate of insertion.

If I place secondary indexes on two non-key attributes, the throughput  
drops off considerably.
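
To compare the two cases directly, the table can be created with or
without the secondary indexes. This is only a minimal sketch, assuming
mnesia is already started on the node; the module name index_toggle and
the function create/1 are hypothetical and not part of the attached test.
The {index, ...} option of mnesia:create_table/2 takes a list of attribute
names to index.

```erlang
-module(index_toggle).
-export([create/1]).

-record(a, {id, first, second}).

%% Hypothetical helper: create table a, indexing the given non-key
%% attributes. create([]) gives the unindexed baseline;
%% create([first, second]) gives the two-index case described above.
%% Assumes mnesia is already running.
create(IndexedAttrs) ->
    mnesia:create_table(a, [{attributes, record_info(fields, a)},
                            {index, IndexedAttrs}]).
```

Running the same insert test against both variants isolates the cost of
index maintenance from the cost of the transaction itself.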

For example, inserting 10000 rows in batches of 1000 whilst  
maintaining 2 non-key indexes produces the following rates of  
insertion per batch:

Batch rate = 10688
Batch rate = 7182
Batch rate = 5001
Batch rate = 4072
Batch rate = 3300
Batch rate = 2866
Batch rate = 2377
Batch rate = 2166
Batch rate = 1807
Batch rate = 1303

The Batch rate is the number of inserts per second in each batch.
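
As a small sketch of how that figure is derived: timer:tc reports elapsed
time in microseconds, so the rate is the batch size divided by the elapsed
time, scaled by one million. The module name batch_rate and the function
rate/2 are hypothetical names used here for illustration.

```erlang
-module(batch_rate).
-export([rate/2]).

%% Rows per second for a batch of Count rows that took Micros
%% microseconds (the unit reported by timer:tc).
rate(Count, Micros) when Micros > 0 ->
    round(Count / Micros * 1000000).
```

For example, batch_rate:rate(1000, 500000) gives 2000: a batch of 1000
rows that takes half a second is 2000 inserts per second.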

This tallies up with the idea that at the beginning the index overhead  
is tiny, but grows on each insertion, which is normal.

I just didn't think that the throughput would drop off so sharply.

Does anybody know if I'm doing something completely wrong or if there  
is a much better way to use mnesia with large tables?





-record(a, {id,first,second}).

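init() ->
     %% The opening of this call was lost in the archived attachment.
     %% create_table and the two index calls are reconstructed from the
     %% description above and are an assumption.
     mnesia:create_table(a,
                         [{attributes, record_info(fields, a)}]),
     mnesia:add_table_index(a, first),
     mnesia:add_table_index(a, second).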

insert(N,BatchSize) ->
     batch(N, BatchSize).

batch(0,_) -> ok;
batch(N,BS) ->
     F = fun() -> write(#a{first = BS,second = BS},BS) end,
     {Time,_} = timer:tc(mnesia,transaction,[F]),
     io:format("Batch rate = ~p~n",[round(BS / Time * 1000000)]),
     batch(N - BS, BS).

write(_,0) -> ok;
write(X,N) ->
     mnesia:write(X#a{id = now()}),
     %% Recurse until N rows have been written in this transaction.
     write(X, N - 1).
