[erlang-questions] Mnesia Fragmentation, duplicated records after rehashing

Fri Oct 21 22:45:23 CEST 2011

This may be a suggestion that doesn't work for you, but if you need
fragmentation (sharding) and adding/removing nodes in real time, have you
looked at using a higher-level system like Riak?

Sincerely,

jw

--
Americans might object: there is no way we would sacrifice our living
standards for the benefit of people in the rest of the world. Nevertheless,
whether we get there willingly or not, we shall soon have lower consumption
rates, because our present rates are unsustainable.

On Tue, Oct 18, 2011 at 5:25 AM, TexTonPC <textonpc@REDACTED> wrote:

> Hi,
>
> we are encountering a strange scenario using mnesia fragmentation in our
> production system:
> our cluster had around 20 tables spread over 8 mnesia nodes each running on
> a single server, totalling 1024 frags per table (128 frags per node).
>
> Now we have added 8 new machines to the cloud and started the rehashing
> process by adding another 128 frags per table on each new node.
> I ran this process from a separate host in the cluster (with plenty of free
> RAM) attached to the mnesia cluster, calling
> mnesia:change_table_frag(Table, {add_frag, [NewNode]}) for each table in
> order to reach 2048 frags per table spread over 16 nodes.
>
> 1. The fragment-adding process took a week to rehash all the table records,
> running on a single core of this "maintenance" node. I read in the mnesia
> docs and on this list that this kind of operation locks the table involved,
> but I was not able to parallelize across tables (parallel processes each
> running add_frag on a different table) to take advantage of multiple
> cores. It felt as if add_frag "locks" the entire mnesia transaction
> manager. Any perspectives or advice on this would be greatly appreciated.
>
> 2. At the end of the fragment-creation and rehashing process I noticed
> some size imbalance between old and new frags, so I started a consistency
> scanner that simply takes each record in each fragment and checks that the
> mnesia_frag hashing module actually maps that record to that specific
> fragment. It turns out that the unbalanced frags contain some records that
> were moved to the new destination frag during rehashing but were not
> removed from the old source frag! I thought
> mnesia:change_table_frag(Table, {add_frag, [NewNode]}) ran in an atomic
> transaction context. Has anyone ever run into something like this?
>
> Thank you
>
> --
> textonpc@REDACTED
> atessaro@REDACTED
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
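A consistency scan like the one described in point 2 might look roughly like the following minimal sketch. It assumes the fragment tables are readable from the node running it, and it takes the key-to-fragment mapping as a caller-supplied fun (`KeyToFrag`, hypothetical here) instead of reaching into mnesia_frag internals. One way to build that fun would be from the table's hash module and its `key_to_frag_number/2` callback, but obtaining the current hash state is left as an assumption. The module name `frag_scan` is likewise illustrative.

```erlang
-module(frag_scan).
-export([misplaced/2, scan_table/2]).

%% Pure check: FragKeys is a list of {FragName, Keys} pairs and KeyToFrag
%% maps a key to the fragment name the hash function says should hold it.
%% Returns {Key, FoundIn, ExpectedIn} for every key stored in the wrong
%% fragment (for example, a copy left behind in the old source fragment).
-spec misplaced([{atom(), [term()]}], fun((term()) -> atom())) ->
          [{term(), atom(), atom()}].
misplaced(FragKeys, KeyToFrag) ->
    [{Key, Frag, Expected}
     || {Frag, Keys} <- FragKeys,
        Key <- Keys,
        Expected <- [KeyToFrag(Key)],   % evaluate the mapping once per key
        Expected =/= Frag].

%% Scans every fragment of Tab. KeyToFrag is hypothetical: the caller must
%% supply a fun that reproduces the table's current hash mapping.
%% dirty_all_keys/1 loads all keys of a fragment into memory, so for very
%% large fragments a dirty_first/dirty_next walk would be gentler.
-spec scan_table(atom(), fun((term()) -> atom())) ->
          [{term(), atom(), atom()}].
scan_table(Tab, KeyToFrag) ->
    %% frag_names is served by the mnesia_frag access module.
    FragNames = mnesia:activity(sync_dirty,
                                fun() -> mnesia:table_info(Tab, frag_names) end,
                                [], mnesia_frag),
    FragKeys = [{Frag, mnesia:dirty_all_keys(Frag)} || Frag <- FragNames],
    misplaced(FragKeys, KeyToFrag).
```

For example, with a made-up mapping, `frag_scan:misplaced([{t, [a, b]}], fun(a) -> t; (b) -> t_frag2 end)` returns `[{b, t, t_frag2}]`: key `b` sits in fragment `t` although the mapping says it belongs in `t_frag2`. Keeping the check pure makes it easy to test without a running cluster.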