Igor Ribeiro Sucupira <>
Tue Jun 7 08:19:36 CEST 2011


I was running a pool with Erlang/OTP R13B04 until the beginning of
last week, when we upgraded to R14B02.

The reason we upgraded is that every day we were experiencing a lot of
corruption in Mnesia's fragments (disc_only_copies). We had Erlang
processes checking the fragments periodically and, in case of
problems, we would delete the fragment (mnesia:del_table_copy) and
clone it again from its replica.
Given that Erlang/OTP R14B01 fixed a lot of concurrency issues with
dets, I believe those bugs were affecting us because we perform a lot
of dirty reads, which must cause more concurrent operations in dets.
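
For reference, the repair step mentioned above amounts to roughly the
sketch below. The module and function names (frag_repair, reclone/2)
are only illustrative, and the use of mnesia:add_table_copy/3 for the
"clone it again" part is an assumption. It also assumes the fragment
still has a healthy replica on another node.

```erlang
-module(frag_repair).
-export([reclone/2]).

%% Hypothetical helper: drop the (corrupted) copy of a fragment held on
%% Node, then copy it back from a remaining replica.
%% Frag is the fragment's table name, e.g. uc_frag598.
reclone(Frag, Node) ->
    %% Remove the replica stored on Node. If this were the last replica,
    %% the whole table would disappear, so a second replica must exist.
    {atomic, ok} = mnesia:del_table_copy(Frag, Node),
    %% Recreate the replica on Node from the surviving copy.
    {atomic, ok} = mnesia:add_table_copy(Frag, Node, disc_only_copies),
    ok.
```

It would be called as frag_repair:reclone(uc_frag598, node()) on the
affected server, and it crashes with a badmatch if either step aborts.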

Anyway, since the upgrade to R14B02, no corrupted fragment has been
detected.  :-)

But then I observed that, after the upgrade, most of the servers are
performing much better (spending less time in I/O operations), while
some of them barely changed in that respect.

Taking a look at Mnesia's directory in each server, I noticed that the
fragment files in one table (uc) are smaller in the servers that are
performing better.

Example of a fragment in a "slower" server:
59M     uc_frag1004.DAT

Example of a fragment in a "faster" server:
34M     uc_frag598.DAT

Since uc is a bag, I thought it could be because uc_frag598, for
example, has fewer records than uc_frag1004. But I copied both to my
box and saw I was wrong:

1> {ok, F1004} = dets:open_file("uc_frag1004.DAT").
2> {ok, F598} = dets:open_file("uc_frag598.DAT").
3> dets:info(F1004, no_objects).
4> dets:info(F598, no_objects).
5> dets:info(F1004, no_keys).
6> dets:info(F598, no_keys).

The only (very) relevant difference I found between them was in the slots:

7> dets:info(F1004, no_slots).
8> dets:info(F598, no_slots).

The weirdest difference is that all records in uc_frag1004 are in
the same slot!

9> length([S || S <- lists:seq(0, element(2, dets:info(F1004, no_slots)) - 1),
                length(dets:slot(F1004, S)) > 0]).
10> length([S || S <- lists:seq(0, element(2, dets:info(F598, no_slots)) - 1),
                 length(dets:slot(F598, S)) > 0]).
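
The same check can be wrapped in a small helper that also reports the
size of the fullest slot, which makes the skew easier to compare
between fragments. This is only a sketch: the module and function
names (dets_slots, occupancy/1) are made up for illustration, and it
assumes the table is already open and that dets:slot/2 does not return
an error.

```erlang
-module(dets_slots).
-export([occupancy/1]).

%% Return {NonEmptySlots, LargestSlotSize} for an open dets table.
occupancy(Tab) ->
    %% no_slots is reported as {Min, Used, Max}; only the used slots
    %% are scanned, as in the shell expressions above.
    {_Min, Used, _Max} = dets:info(Tab, no_slots),
    Sizes = [slot_size(Tab, I) || I <- lists:seq(0, Used - 1)],
    NonEmpty = [S || S <- Sizes, S > 0],
    {length(NonEmpty), lists:max([0 | NonEmpty])}.

%% Number of objects stored in slot I (0 if I is past the last slot).
slot_size(Tab, I) ->
    case dets:slot(Tab, I) of
        Objects when is_list(Objects) -> length(Objects);
        '$end_of_table' -> 0
    end.
```

Usage would be dets_slots:occupancy(F1004) and
dets_slots:occupancy(F598) in the same shell session.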

So... what may have happened and what can I do to fix it?

Thank you.

"The secret of joy in work is contained in one word - excellence. To
know how to do something well is to enjoy it." - Pearl S. Buck.

