[erlang-questions] Mnesia Fragmentation, duplicated records after rehashing

TexTonPC textonpc@REDACTED
Tue Oct 18 14:25:46 CEST 2011


We are encountering a strange scenario using Mnesia fragmentation in our
production system.
Our cluster had around 20 tables spread over 8 Mnesia nodes, each running on
its own server, for a total of 1024 frags per table (128 frags per node).

We have now added 8 new machines to the cloud and started the rehashing
process by adding another 128 frags per table on each new node.
I started this process from a different host attached to the Mnesia cluster
(one with a lot of free RAM), calling
mnesia:change_table_frag(Table, {add_frag, [NewNode]}) for each table in
order to end up with 2048 frags per table spread over 16 nodes.
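
For readers who want to picture the procedure, here is a minimal sketch of
such a loop. It is an illustration only, not the script that was actually
run: the module name frag_grow and the function add_frags/3 are hypothetical,
and the only Mnesia call used is mnesia:change_table_frag/2, which returns
{atomic, ok} or {aborted, Reason}.

```erlang
-module(frag_grow).
-export([add_frags/3]).

%% Hypothetical helper: add Count fragments of Tab on Node, one
%% change_table_frag call at a time. Stops at the first aborted call
%% and reports how many fragments were still left to add.
add_frags(_Tab, _Node, 0) ->
    ok;
add_frags(Tab, Node, Count) when is_integer(Count), Count > 0 ->
    case mnesia:change_table_frag(Tab, {add_frag, [Node]}) of
        {atomic, ok} ->
            add_frags(Tab, Node, Count - 1);
        {aborted, Reason} ->
            {error, {Tab, Node, Count, Reason}}
    end.
```

With Tables and NewNodes bound to the table names and the 8 new nodes, the
whole run would then be something like
[frag_grow:add_frags(T, N, 128) || T <- Tables, N <- NewNodes], which
executes strictly sequentially.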

1. The fragment-adding process took a week to rehash all the table records,
working on a single core of this "maintenance" node. I read in the
Mnesia docs and on this list that this kind of operation locks the table
involved, but I was not able to parallelize across tables (parallel
processes, each running add_frag on a different table) in order to take
advantage of multiple cores. I had the feeling that add_frag "locks" the
entire Mnesia transaction manager. Any perspectives or advice on this would
be greatly appreciated.

2. At the end of the frag creation and rehashing process I noticed a
size imbalance between old and new frags, so I started a consistency
scanner that simply takes each record on each fragment and checks that the
mnesia_frag hashing module actually maps that record to that specific
fragment. It turns out that the unbalanced frags contain some records that
were moved to the new destination frag during the rehashing process but were
not removed from the old source frag! I thought mnesia:change_table_frag(Table,
{add_frag, [NewNode]}) ran in an atomic transaction context. Has anyone
ever faced something like this?
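
To make the check in point 2 concrete, a scanner of this kind could be
sketched roughly as follows. This is not the scanner that was actually used.
The module frag_check and its functions are hypothetical, and the sketch
assumes that mnesia:table_info(Tab, frag_properties) exposes hash_module,
hash_state and n_fragments entries, that the hash module exports
key_to_frag_number/2 as mnesia_frag_hash does, and that fragments follow the
usual Tab, Tab_frag2, Tab_frag3, ... naming. Please verify these against
your OTP release before relying on it.

```erlang
-module(frag_check).
-export([misplaced/1]).

%% Returns [{FragName, Key, ExpectedFragNum}] for every key that is stored
%% in a fragment other than the one the hash module maps it to.
%% Uses dirty reads, so results can be transiently wrong under live writes.
misplaced(Tab) ->
    %% Assumption: these three properties are present in frag_properties.
    Props     = mnesia:table_info(Tab, frag_properties),
    HashMod   = proplists:get_value(hash_module, Props),
    HashState = proplists:get_value(hash_state, Props),
    N         = proplists:get_value(n_fragments, Props),
    lists:flatmap(
      fun(FragNum) ->
              Frag = frag_name(Tab, FragNum),
              %% Each fragment is read directly as an ordinary table.
              [{Frag, Key, Expected}
               || Key <- mnesia:dirty_all_keys(Frag),
                  Expected <- [HashMod:key_to_frag_number(HashState, Key)],
                  Expected =/= FragNum]
      end,
      lists:seq(1, N)).

%% Assumption: fragment 1 is the base table, fragment N is Tab_fragN.
frag_name(Tab, 1) ->
    Tab;
frag_name(Tab, N) ->
    list_to_atom(atom_to_list(Tab) ++ "_frag" ++ integer_to_list(N)).
```

A call such as frag_check:misplaced(my_table) (my_table being a placeholder
name) returns an empty list when every key sits in the fragment its hash
points to. Note that mnesia:dirty_all_keys/1 loads all keys of a fragment
into memory at once, so very large fragments may need a chunked traversal
instead.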

Thank you
