[erlang-patches] Speed up index creation for Mnesia set tables
Mon Apr 15 09:52:28 CEST 2013
On 04/12/2013 10:34 PM, Nick Marino wrote:
> I've developed a small, simple patch against the mnesia_index code that
> can greatly speed up index creation under certain circumstances.
> Specifically, this improves index creation on columns with a lot of
> duplicate values in Mnesia tables of type 'set'.
> Because Mnesia currently uses ETS bag tables to store its indexes, the
> insert performance drops drastically when you have lots of duplicate
> values in an indexed column, since for bag tables ETS has to check to
> make sure it's not inserting any duplicate elements. However, for Mnesia
> set tables we can use duplicate_bag tables instead: it will never insert
> any duplicate values into the index anyway, since every insert into an
> index is preceded by an explicit call to del_ixes for any entries in the
> index that it's replacing (for reference, take a look at the add_index2
> function in the mnesia_index module).
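To make the cost argument concrete, here is a toy Python model of one index bucket, meaning all index entries that share a single indexed value. It is not ETS's actual C implementation, and the class names and the `write_set_record` helper are hypothetical. In the model, a bag insert compares the new entry against every entry already in the bucket, so loading n records that share one value costs n(n-1)/2 comparisons. A duplicate_bag insert just appends. The helper deletes a key's old index entry before inserting the new one, mirroring the del_ixes-before-insert order described above. Because of that ordering, the duplicate_bag index for a set table ends up with the same contents as the bag index.

```python
from collections import defaultdict


class BagIndex:
    """Toy model of an ETS bag used as an index: value -> list of primary keys.
    Inserting an object that is already present is a no-op, which requires
    scanning the bucket for that value (linear in the bucket length)."""

    def __init__(self):
        self.buckets = defaultdict(list)
        self.comparisons = 0  # how many existing entries inserts compared against

    def insert(self, value, pkey):
        bucket = self.buckets[value]
        for existing in bucket:
            self.comparisons += 1
            if existing == pkey:
                return  # bag semantics: identical object already stored
        bucket.append(pkey)

    def delete(self, value, pkey):
        # Remove the {value, pkey} entry (all copies, if any).
        self.buckets[value] = [k for k in self.buckets[value] if k != pkey]


class DuplicateBagIndex(BagIndex):
    """Toy model of an ETS duplicate_bag: insert appends without any check."""

    def insert(self, value, pkey):
        self.buckets[value].append(pkey)


def write_set_record(table, index, pkey, value):
    """Hypothetical helper modeling a write to a set table with one indexed
    column. The old index entry for pkey is deleted first (like del_ixes),
    so a set table never produces the same {value, pkey} entry twice."""
    if pkey in table:
        index.delete(table[pkey], pkey)
    table[pkey] = value
    index.insert(value, pkey)
```

With 1000 records all sharing one indexed value, the bag model performs 499500 comparisons during the initial load while the duplicate_bag model performs none. After arbitrary rewrites, both models still hold identical bucket contents.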
> For bag tables, or for cases without lots of duplicate values in an
> indexed column, this change won't make any appreciable difference. But
> in a real world test for an application that hits this particular
> situation, I've seen Mnesia startup times drop from 30+ minutes to 15
> seconds, so it can provide some major improvements in the right
> scenario. The one downside I've found is that deletion of indexes can be
> slightly slower, since ETS will no longer short-circuit out of
> ets:match_delete when it finds a match while iterating over the list of
> values for a specific key. But this is only a small, linear slowdown
> compared to the large, superlinear speedup we get on index creation;
> and as with the speedup, it only kicks in when you have lots of
> duplicate values in an indexed column, so this seems like an obvious net
> win.
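The deletion trade-off can be modeled the same way. This is again a simplified, hypothetical Python sketch and not how ETS implements `ets:match_delete`. A bag bucket holds at most one copy of any object, so a delete may stop at the first match. A duplicate_bag bucket may hold copies, so every element must be checked. The extra cost is at most one full pass over the bucket, which matches the "small, linear slowdown" described above.

```python
def delete_first_match(bucket, obj):
    """Bag-style delete (toy model): at most one copy of obj can exist,
    so stop at the first match. Returns the number of comparisons made."""
    for i, existing in enumerate(bucket):
        if existing == obj:
            del bucket[i]
            return i + 1
    return len(bucket)


def delete_all_matches(bucket, obj):
    """Duplicate-bag-style delete (toy model): copies may exist, so every
    element is compared. Returns the number of comparisons made."""
    comparisons = len(bucket)
    bucket[:] = [x for x in bucket if x != obj]
    return comparisons


if __name__ == "__main__":
    bag_bucket = list(range(1000))
    dup_bucket = list(range(1000))
    print(delete_first_match(bag_bucket, 10))  # 11
    print(delete_all_matches(dup_bucket, 10))  # 1000
    print(bag_bucket == dup_bucket)  # True
```

Both functions leave the bucket in the same state. Only the number of elements examined differs, and that difference is bounded by the bucket length.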
> It might be possible to extend this optimization to Mnesia bag tables,
> but we'd need to make some additional changes to ensure we don't insert
> duplicate values into the index. It's not clear to me whether that would
> be doable without degrading performance in other less predictable ways.
> Anyway, the changeset can be found here:
> git fetch git://github.com/nickelization/otp.git
> This is my first ever contribution to Erlang/OTP, so I tried to follow
> the patch submission guidelines carefully, but please let me know if
> I've done anything incorrectly, or if you need anything else :-)
Thanks for your contribution, a review process has started.
BR Fredrik Gustafsson
Erlang OTP Team