[erlang-patches] Speed up index creation for Mnesia set tables

Fredrik fredrik@REDACTED
Thu Apr 18 14:27:06 CEST 2013


On 04/12/2013 10:34 PM, Nick Marino wrote:
> Hello,
>
> I've developed a small, simple patch against the mnesia_index code that
> can greatly speed up index creation under certain circumstances.
> Specifically, this improves index creation on columns with a lot of
> duplicate values in Mnesia tables of type 'set'.
>
> Because Mnesia currently uses ETS bag tables to store its indexes, the
> insert performance drops drastically when you have lots of duplicate
> values in an indexed column, since for bag tables ETS has to check to
> make sure it's not inserting any duplicate elements. However, for Mnesia
> set tables we can use duplicate_bag tables instead: it will never insert
> any duplicate values into the index anyway, since every insert into an
> index is preceded by an explicit call to del_ixes for any entries in the
> index that it's replacing (for reference, take a look at the add_index2
> function in the mnesia_index module).
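
For readers less familiar with the two ETS table types discussed above, here is a minimal sketch of the difference. It is not taken from the patch: the module name idx_demo, the table names, and the {Value, Key} entry shape are assumptions chosen for illustration. It only shows that a bag table discards an identical object on insert, while a duplicate_bag table keeps it without checking.

```erlang
-module(idx_demo).
-export([run/0]).

%% Insert the same {Value, Key} entry twice into a bag table and a
%% duplicate_bag table, then count the copies each one holds.
%% The names and the entry shape are hypothetical, for illustration only.
run() ->
    Bag = ets:new(bag_idx, [bag]),
    Dup = ets:new(dup_idx, [duplicate_bag]),
    Entry = {blue, 1},
    lists:foreach(fun(Tab) ->
                          true = ets:insert(Tab, Entry),
                          true = ets:insert(Tab, Entry)
                  end, [Bag, Dup]),
    Counts = {length(ets:lookup(Bag, blue)), length(ets:lookup(Dup, blue))},
    ets:delete(Bag),
    ets:delete(Dup),
    %% The bag keeps one copy, the duplicate_bag keeps both.
    Counts.
```

Calling idx_demo:run() returns {1, 2}. The duplicate check that a bag performs on every insert is the cost the patch aims to avoid, which is presumably only safe if the caller guarantees it never inserts the same entry twice.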
>
> For bag tables, or for cases without lots of duplicate values in an
> indexed column, this change won't make any appreciable difference. But
> in a real-world test for an application that hits this particular
> situation, I've seen Mnesia startup times drop from 30+ minutes to 15
> seconds, so it can provide some major improvements in the right
> scenario. The one downside I've found is that deletion of indexes can be
> slightly slower, since ETS will no longer short-circuit out of
> ets:match_delete when it finds a match while iterating over the list of
> values for a specific key. But this is only a small, linear slowdown
> compared to the large, superlinear speedup we get on index creation;
> and as with the speedup, it only kicks in when you have lots of
> duplicate values in an indexed column, so this seems like an obvious net
> win.
>
> It might be possible to extend this optimization to Mnesia bag tables,
> but we'd need to make some additional changes to ensure we don't insert
> duplicate values into the index. It's not clear to me whether that would
> be doable without degrading performance in other less predictable ways.
>
> Anyway, the changeset can be found here:
>
> git fetch git://github.com/nickelization/otp.git mnesia_idx_insert_speedup
>
> https://github.com/nickelization/otp/compare/erlang:maint...mnesia_idx_insert_speedup 
>
> https://github.com/nickelization/otp/compare/erlang:maint...mnesia_idx_insert_speedup.patch 
>
>
> This is my first ever contribution to Erlang/OTP, so I tried to follow
> the patch submission guidelines carefully, but please let me know if
> I've done anything incorrectly, or if you need anything else :-)
>
> Thanks,
> Nick
>
Hello,
Your patch seems to fail the sorted_ets test case in the
mnesia_evil_coverage suite.
Please run the suite and fix the problem.

-- 

BR Fredrik Gustafsson
Erlang OTP Team



