[erlang-patches] Speed up index creation for Mnesia set tables

Nick Marino nmarino@REDACTED
Fri Apr 12 22:34:34 CEST 2013


Hello,

I've developed a small, simple patch against the mnesia_index code that
can greatly speed up index creation under certain circumstances.
Specifically, this improves index creation on columns with a lot of
duplicate values in Mnesia tables of type 'set'.

Because Mnesia currently uses ETS bag tables to store its indexes, the
insert performance drops drastically when you have lots of duplicate
values in an indexed column, since for bag tables ETS has to check to
make sure it's not inserting any duplicate elements. However, for Mnesia
set tables we can use duplicate_bag tables instead: Mnesia will never
insert any duplicate values into the index anyway, since every insert
into an index is preceded by an explicit call to del_ixes for any
entries in the index that it's replacing (for reference, take a look at
the add_index2 function in the mnesia_index module).
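
To illustrate the difference outside of Mnesia, here is a minimal
standalone sketch using plain ETS tables. It is not part of the patch:
the module name ix_insert_demo, the function names and the table name
ix_demo are made up for this example, and the {IndexedValue, Key} tuples
only mimic the shape of the entries stored in the index.

```erlang
-module(ix_insert_demo).
-export([count_after_double_insert/1, time_inserts/2]).

%% Insert the same {IndexedValue, Key} object twice and return how many
%% copies the table holds afterwards. A bag table silently drops the
%% second copy; a duplicate_bag table keeps both.
count_after_double_insert(Type) when Type =:= bag; Type =:= duplicate_bag ->
    Tab = ets:new(ix_demo, [Type]),
    true = ets:insert(Tab, {red, 1}),
    true = ets:insert(Tab, {red, 1}),
    Count = length(ets:lookup(Tab, red)),
    ets:delete(Tab),
    Count.

%% Time N inserts that all share one indexed value, i.e. the case with
%% lots of duplicate values in the indexed column. Returns the elapsed
%% time in microseconds.
time_inserts(Type, N) when Type =:= bag; Type =:= duplicate_bag ->
    Tab = ets:new(ix_demo, [Type]),
    Insert = fun(Key) -> ets:insert(Tab, {red, Key}) end,
    {Micros, ok} = timer:tc(fun() -> lists:foreach(Insert, lists:seq(1, N)) end),
    N = ets:info(Tab, size),   % every object was distinct, so all N are stored
    ets:delete(Tab),
    Micros.
```

Calling ix_insert_demo:time_inserts(bag, N) and
ix_insert_demo:time_inserts(duplicate_bag, N) with a growing N should
show the gap widening, though the exact numbers will depend on the
machine and the OTP release.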

For bag tables, or for cases without lots of duplicate values in an
indexed column, this change won't make any appreciable difference. But
in a real-world test for an application that hits this particular
situation, I've seen Mnesia startup times drop from 30+ minutes to 15
seconds, so it can provide some major improvements in the right
scenario. The one downside I've found is that deletion of indexes can be
slightly slower, since ETS will no longer short-circuit out of
ets:match_delete when it finds a match while iterating over the list of
values for a specific key. But this is only a small, linear slowdown
compared to the large, superlinear speedup we get on index creation;
and as with the speedup, it only kicks in when you have lots of
duplicate values in an indexed column, so this seems like an obvious net
win.
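
For reference, the deletion side can be sketched the same way. Again
this is only an illustration with plain ETS, not code from the patch;
the module name ix_delete_demo and the function name are hypothetical.
It removes a single {IndexedValue, Key} entry with ets:match_delete/2
from a key that holds N entries. The result is the same for both table
types; the only difference is how far ETS has to walk the list of
values for that key.

```erlang
-module(ix_delete_demo).
-export([remaining_after_delete/2]).

%% Store N entries under one indexed value, delete the entry for Key 1
%% with ets:match_delete/2, and return how many entries are left.
%% Expects N >= 1.
remaining_after_delete(Type, N) when Type =:= bag; Type =:= duplicate_bag ->
    Tab = ets:new(ix_demo, [Type]),
    lists:foreach(fun(Key) -> ets:insert(Tab, {red, Key}) end, lists:seq(1, N)),
    %% In a duplicate_bag, identical copies of {red, 1} could exist, so
    %% ETS cannot stop at the first match as it can in a bag.
    true = ets:match_delete(Tab, {red, 1}),
    Left = length(ets:lookup(Tab, red)),
    ets:delete(Tab),
    Left.
```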

It might be possible to extend this optimization to Mnesia bag tables,
but we'd need to make some additional changes to ensure we don't insert
duplicate values into the index. It's not clear to me whether that would
be doable without degrading performance in other less predictable ways.

Anyway, the changeset can be found here:

git fetch git://github.com/nickelization/otp.git mnesia_idx_insert_speedup

https://github.com/nickelization/otp/compare/erlang:maint...mnesia_idx_insert_speedup
https://github.com/nickelization/otp/compare/erlang:maint...mnesia_idx_insert_speedup.patch

This is my first ever contribution to Erlang/OTP, so I tried to follow
the patch submission guidelines carefully, but please let me know if
I've done anything incorrectly, or if you need anything else :-)

Thanks,
Nick
