[erlang-questions] Lost Mnesia Fragments

Paul Mineiro <>
Fri Jun 12 03:29:24 CEST 2009


I have on rare occasion run into this problem.  Basically, mnesia cannot
load the table because there are no active replicas available, so you
get stuck until you make a copy available.  You can find problem tables
like this:

[ L || T <- mnesia:system_info (tables),
       { timeout, [ L ] } <- [ mnesia:wait_for_tables ([ T ], 1) ] ].

(If your mnesia is booting up, this will return a lot of false positives,
but if it has been running a while, this command should return the empty
list when things are hunky dory.)

The best option is to get the table fragment back up by getting the node
it was on back up.  If that's not an option ...

I think for ram tables you could optimistically try to add another
table copy on another node and see if that works.  I think for non-ram
tables the rules are different.  My memory is unclear on that; we use
disk (tcerl) tables a lot, so on the rare occasion this happens I do
evil under-the-hood stuff to make it work.  I checked my wiki page of
mnesia hacks and saw this:

"Example, cloning frag6 to frag7 because frag7 was nuked. "

(fun (B, From, To) ->
     mnesia_schema:schema_transaction (fun () ->
         mnesia_schema:do_create_table (
             erlang:setelement (2,
                 mnesia:table_info (list_to_atom (atom_to_list (B) ++ "_frag" ++ integer_to_list (From)), cstruct),
                 list_to_atom (atom_to_list (B) ++ "_frag" ++ integer_to_list (To))))
     end)
 end) (clickdb, 6, 7).

YMMV.  Scary?  Yes.  Also, it is hopefully obvious that this operation
does not recover the data from that fragment; it just makes the schema
happy again.
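Going back to the ram-table case: the optimistic attempt could look
something like this (a sketch -- index_data_frag3 and the choice of
target node are placeholders for whichever fragment is stuck and
whichever pool node is still alive in your cluster):

```erlang
%% Sketch: optimistically give a stuck ram fragment a replica on a
%% live node.  The fragment name and target node are placeholders;
%% this only applies to ram tables, and it may simply abort if mnesia
%% insists on an active replica first.
OtherNode = hd (nodes ()),
case mnesia:add_table_copy (index_data_frag3, OtherNode, ram_copies) of
    { atomic, ok }      -> ok;
    { aborted, Reason } -> { still_stuck, Reason }
end.
```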

If you have more table copies, you are less likely to get into this
mess in the first place.  The config you have below sets n_disc_copies
and n_disc_only_copies to zero, so you are defaulting to { n_ram_copies,
1 }, which means you have no redundancy.
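For example, a variant of the create_table call below with
{ n_ram_copies, 2 } gives every fragment two ram replicas, so losing a
single node still leaves an active copy of each fragment (a sketch;
like the original, it has to run where the index_data record is
defined so record_info/2 works):

```erlang
%% Sketch: the same fragmented table as below, but with 2 ram copies
%% per fragment, so one node dying does not leave any fragment without
%% an active replica.
mnesia:create_table (index_data,
    [ { frag_properties, [ { node_pool, NodeList },
                           { n_fragments, 24 },
                           { n_ram_copies, 2 } ] },
      { index, [ type ] },
      { attributes, record_info (fields, index_data) },
      { type, bag } ]).
```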

Cheers,

-- p

On Thu, 11 Jun 2009, Evans, Matthew wrote:

> Hello,
>
> I sent this to Erlang bugs, not sure if that was appropriate.
>
> I have a somewhat unusual issue with fragmented Mnesia tables.
>
> I have a mesh of 6 nodes, and have created 24 fragments over those nodes using the command:
>
> mnesia:create_table(index_data, [{frag_properties,[{node_pool, NodeList},{n_fragments,24},{n_disc_copies,0}, {n_disc_only_copies,0}]},{index,[type]},{attributes, record_info(fields, index_data)},{type, bag}]).
>
> As part of a test protocol I wanted to test what happens if I kill a node. To do this I physically unplug the node.
> What I see when I do a mnesia:info() is that the fragments that were on that node are in this state:
>
> [] = [index_data_frag21,index_data_frag15,index_data_frag9,index_data_frag3]
>
> Of course, I can not read data from those fragments.
>
> However I can not seem to delete those fragments either.
>
> Worse still, when I try to insert a record the hashing function attempts to pick up one of those “lost” fragments and the insert aborts with error:
>
> {aborted,{no_exists,index_data_frag11}}
>
> This insert is dirty, using the code:
>
>     AddFun = fun() -> mnesia:write(index_data, #index_data{asset = Asset, npt = Npt, type = Type, inpoint = Inpoint, data = Data}, write) end,
>     mnesia:activity(async_dirty, AddFun, [], mnesia_frag).
>
> Am I missing something?
>
> If there were a way to remove the "bad" fragment, I could catch the error and remove it by hand.
>
> Thanks
>
> Matt
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>
>
