[erlang-patches] Speed up index creation for Mnesia set tables
Wed Apr 24 20:15:58 CEST 2013
Thanks for your response. I looked into this, and it turned out I simply
failed to add a case clause for Mnesia tables of type ordered_set. I had
forgotten that ordered_set was a valid table type in Mnesia, and had
only been testing my changes with set and bag tables.
I've added a new commit to my GitHub repository to fix this, and the
unit test you cited now runs fine. As before, you can fetch all my changes with:
git fetch git://github.com/nickelization/otp.git mnesia_idx_insert_speedup
Or, if you'd prefer to just view the new patch I added to address the
problems with ordered_set tables, you can view it here:
On 04/18/2013 08:27 AM, Fredrik wrote:
> On 04/12/2013 10:34 PM, Nick Marino wrote:
>> I've developed a small, simple patch against the mnesia_index code that
>> can greatly speed up index creation under certain circumstances.
>> Specifically, this improves index creation on columns with a lot of
>> duplicate values in Mnesia tables of type 'set'.
>> Because Mnesia currently uses ETS bag tables to store its indexes, the
>> insert performance drops drastically when you have lots of duplicate
>> values in an indexed column, since for bag tables ETS has to check to
>> make sure it's not inserting any duplicate elements. However, for Mnesia
>> set tables we can use duplicate_bag tables instead: it will never insert
>> any duplicate values into the index anyway, since every insert into an
>> index is preceded by an explicit call to del_ixes for any entries in the
>> index that it's replacing (for reference, take a look at the add_index2
>> function in the mnesia_index module).
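
To make the bag versus duplicate_bag distinction above concrete, here is a minimal sketch using plain ETS tables. It is not the mnesia_index code: the module name idx_sketch, the helper put_index/4, and the {IndexedValue, RecordKey} entry layout are assumptions made for illustration. It only shows that a delete-before-insert discipline keeps a duplicate_bag free of duplicates, so the per-insert duplicate check done by bag tables is not needed.

```erlang
-module(idx_sketch).
-export([demo/0]).

%% Hypothetical helper (not part of Mnesia): replace the index entry for
%% RecKey by deleting the old {OldVal, RecKey} object before inserting the
%% new one, mirroring the "del_ixes before insert" idea described above.
put_index(Tab, OldVal, NewVal, RecKey) ->
    true = ets:match_delete(Tab, {OldVal, RecKey}),
    true = ets:insert(Tab, {NewVal, RecKey}),
    ok.

demo() ->
    Bag = ets:new(bag_idx, [bag]),
    Dup = ets:new(dup_idx, [duplicate_bag]),

    %% A bag silently drops an object identical to one already stored,
    %% which requires comparing against the objects under that key.
    true = ets:insert(Bag, {red, 1}),
    true = ets:insert(Bag, {red, 1}),
    [{red, 1}] = ets:lookup(Bag, red),

    %% A duplicate_bag performs no such check and keeps both copies.
    true = ets:insert(Dup, {red, 1}),
    true = ets:insert(Dup, {red, 1}),
    [{red, 1}, {red, 1}] = ets:lookup(Dup, red),

    %% With delete-before-insert, the duplicate_bag never accumulates
    %% duplicates, even when the same value is written again.
    true = ets:delete_all_objects(Dup),
    true = ets:insert(Dup, {red, 1}),
    ok = put_index(Dup, red, red, 1),
    [{red, 1}] = ets:lookup(Dup, red),

    %% Changing the indexed value moves the entry to the new key.
    ok = put_index(Dup, red, blue, 1),
    [] = ets:lookup(Dup, red),
    [{blue, 1}] = ets:lookup(Dup, blue),

    true = ets:delete(Bag),
    true = ets:delete(Dup),
    ok.
```

Calling idx_sketch:demo() returns ok if every pattern match in the sketch holds.
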
>> For bag tables, or for cases without lots of duplicate values in an
>> indexed column, this change won't make any appreciable difference. But
>> in a real world test for an application that hits this particular
>> situation, I've seen Mnesia startup times jump from 30+ minutes to 15
>> seconds, so it can provide some major improvements in the right
>> scenario. The one downside I've found is that deletion of indexes can be
>> slightly slower, since ETS will no longer short-circuit out of
>> ets:match_delete when it finds a match while iterating over the list of
>> values for a specific key. But, this is only a small, linear slowdown
>> compared to the large, superlinear speedup we get on index creation;
>> and as with the speedup, it only kicks in when you have lots of
>> duplicate values in an indexed column, so this seems like an obvious net
>> win.
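
The trade-off described above can be measured directly on plain ETS tables. The following is a rough benchmark sketch, not a reproduction of the Mnesia startup test mentioned in the mail: the module name idx_bench, the function names, and the single shared key same_value are assumptions chosen for illustration, and absolute timings will vary by machine and OTP release.

```erlang
-module(idx_bench).
-export([run/1]).

%% Insert N distinct objects that all share one key (many duplicate values
%% in the indexed column), then delete a single entry with
%% ets:match_delete/2. Times are reported in microseconds.
measure(Type, N) ->
    Tab = ets:new(idx, [Type]),
    {InsertUs, _} = timer:tc(fun() ->
        lists:foreach(fun(I) -> ets:insert(Tab, {same_value, I}) end,
                      lists:seq(1, N))
    end),
    {DeleteUs, _} = timer:tc(fun() ->
        ets:match_delete(Tab, {same_value, 1})
    end),
    %% Number of objects left after the single delete.
    Size = ets:info(Tab, size),
    true = ets:delete(Tab),
    {Type, InsertUs, DeleteUs, Size}.

%% Returns one {Type, InsertUs, DeleteUs, RemainingObjects} tuple per
%% table type, so the two can be compared side by side.
run(N) ->
    [measure(bag, N), measure(duplicate_bag, N)].
```

Running idx_bench:run(N) for increasing values of N should make the difference in insert cost visible, and lets you check how much the match_delete time changes between the two table types.
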
>> It might be possible to extend this optimization to Mnesia bag tables,
>> but we'd need to make some additional changes to ensure we don't insert
>> duplicate values into the index. It's not clear to me whether that would
>> be doable without degrading performance in other less predictable ways.
>> Anyway, the changeset can be found here:
>> git fetch git://github.com/nickelization/otp.git
>> This is my first ever contribution to Erlang/OTP, so I tried to follow
>> the patch submission guidelines carefully, but please let me know if
>> I've done anything incorrectly, or if you need anything else :-)
>> This e-mail and any attachments are confidential. If it is not
>> intended for you, please notify the sender, and please erase and
>> ignore the contents.
> Your patch seems to fail suite: mnesia_evil_coverage and testcase:
> Please run the suite and fix the problem.