[erlang-patches] Speed up index creation for Mnesia set tables

Wed Apr 24 20:15:58 CEST 2013

Hi Fredrik,

Thanks for your response. I looked into this, and it turned out I simply
failed to add a case clause for Mnesia tables of type ordered_set. I had
forgotten that ordered_set was a valid table type in Mnesia, and had
only been testing my changes with set and bag tables.
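
To illustrate the kind of omission described above, here is a minimal
sketch. It is not the code from the patch: the module and function names
are hypothetical, and treating ordered_set the same way as set is an
assumption based on both table types having unique keys.

```erlang
-module(index_type_sketch).
-export([index_ets_type/1]).

%% Hypothetical helper (not part of mnesia_index): choose the ETS table
%% type for an index from the Mnesia table type.
index_ets_type(TabType) ->
    case TabType of
        set         -> duplicate_bag;
        %% Without this clause, ordered_set tables would crash with a
        %% case_clause error. Mapping it to duplicate_bag is an assumption.
        ordered_set -> duplicate_bag;
        bag         -> bag
    end.
```

With the ordered_set clause removed, index_ets_type(ordered_set) raises
{case_clause, ordered_set}, which is the sort of failure that only shows
up when a test exercises that table type.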

I've added a new commit to my GitHub repository to fix this, and the
unit test you cited now runs fine. As before, you can view all my
changes here:

git fetch git://github.com/nickelization/otp.git mnesia_idx_insert_speedup
https://github.com/nickelization/otp/compare/erlang:maint...mnesia_idx_insert_speedup

https://github.com/nickelization/otp/compare/erlang:maint...mnesia_idx_insert_speedup.patch

Or, if you'd prefer to just view the new patch I added to address the
problems with ordered_set tables, you can view it here:

https://github.com/nickelization/otp/commit/fb8cbfa5db65f4bb9dcb133de9414c78aa49d9e7
https://github.com/nickelization/otp/commit/fb8cbfa5db65f4bb9dcb133de9414c78aa49d9e7.patch

Thanks again,
Nick

On 04/18/2013 08:27 AM, Fredrik wrote:
> On 04/12/2013 10:34 PM, Nick Marino wrote:
>> Hello,
>>
>> I've developed a small, simple patch against the mnesia_index code that
>> can greatly speed up index creation under certain circumstances.
>> Specifically, this improves index creation on columns with a lot of
>> duplicate values in Mnesia tables of type 'set'.
>>
>> Because Mnesia currently uses ETS bag tables to store its indexes, the
>> insert performance drops drastically when you have lots of duplicate
>> values in an indexed column, since for bag tables ETS has to check to
>> make sure it's not inserting any duplicate elements. However, for Mnesia
>> set tables we can use duplicate_bag tables instead: it will never insert
>> any duplicate values into the index anyway, since every insert into an
>> index is preceded by an explicit call to del_ixes for any entries in the
>> index that it's replacing (for reference, take a look at the add_index2
>> function in the mnesia_index module).
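
The ETS behaviour described in the quoted paragraph can be sketched as
follows. This is only an illustration: the module name, table names and
the replace_entry/2 helper are hypothetical, and replace_entry/2 is a
stand-in for the delete-then-insert sequence, not the actual del_ixes
code.

```erlang
-module(ix_bag_demo).
-export([run/0]).

%% Hypothetical stand-in for "delete the old index entry, then insert":
%% remove every stored copy of Entry before inserting it once.
replace_entry(Tab, Entry) ->
    true = ets:match_delete(Tab, Entry),
    true = ets:insert(Tab, Entry).

run() ->
    Bag = ets:new(ix_bag, [bag]),
    Dup = ets:new(ix_dup, [duplicate_bag]),
    Entry = {red, 1},                      %% {IndexedValue, PrimaryKey}
    %% Insert the identical entry twice into each table.
    true = ets:insert(Bag, Entry),
    true = ets:insert(Bag, Entry),
    true = ets:insert(Dup, Entry),
    true = ets:insert(Dup, Entry),
    BagCount = length(ets:lookup(Bag, red)),   %% bag drops the duplicate
    DupCount = length(ets:lookup(Dup, red)),   %% duplicate_bag keeps both
    %% Delete-then-insert leaves a single copy even in a duplicate_bag.
    replace_entry(Dup, Entry),
    GuardedCount = length(ets:lookup(Dup, red)),
    true = ets:delete(Bag),
    true = ets:delete(Dup),
    {BagCount, DupCount, GuardedCount}.
```

Calling ix_bag_demo:run() returns {1, 2, 1}: the bag table rejects the
identical object, the duplicate_bag stores it twice, and deleting before
inserting keeps the duplicate_bag free of duplicates without relying on
the per-insert check.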
>>
>> For bag tables, or for cases without lots of duplicate values in an
>> indexed column, this change won't make any appreciable difference. But
>> in a real-world test for an application that hits this particular
>> situation, I've seen Mnesia startup times drop from 30+ minutes to 15
>> seconds, so it can provide some major improvements in the right
>> scenario. The one downside I've found is that deletion of indexes can be
>> slightly slower, since ETS will no longer short-circuit out of
>> ets:match_delete when it finds a match while iterating over the list of
>> values for a specific key. But this is only a small, linear slowdown
>> compared to the large, superlinear speedup we get on index creation;
>> and as with the speedup, it only kicks in when you have lots of
>> duplicate values in an indexed column, so this seems like an obvious net
>> win.
>>
>> It might be possible to extend this optimization to Mnesia bag tables,
>> but we'd need to make some additional changes to ensure we don't insert
>> duplicate values into the index. It's not clear to me whether that would
>> be doable without degrading performance in other less predictable ways.
>>
>> Anyway, the changeset can be found here:
>>
>> git fetch git://github.com/nickelization/otp.git mnesia_idx_insert_speedup
>>
>> https://github.com/nickelization/otp/compare/erlang:maint...mnesia_idx_insert_speedup
>>
>> https://github.com/nickelization/otp/compare/erlang:maint...mnesia_idx_insert_speedup.patch
>>
>>
>> This is my first ever contribution to Erlang/OTP, so I tried to follow
>> the patch submission guidelines carefully, but please let me know if
>> I've done anything incorrectly, or if you need anything else :-)
>>
>> Thanks,
>> Nick
>>
>> This e-mail and any attachments are confidential.  If it is not
>> intended for you, please notify the sender, and please erase and
>> ignore the contents.
>> _______________________________________________
>> erlang-patches mailing list
>> erlang-patches@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-patches
> Hello,
> Your patch seems to fail the sorted_ets testcase in the
> mnesia_evil_coverage suite.
> Please run the suite and fix the problem.
>
