[erlang-questions] mnesia race condition in add_table_copy

karol skocik karol.skocik@REDACTED
Thu Jul 7 19:01:59 CEST 2011


Hi,
  I think I have found a race condition in mnesia:add_table_copy.
I am trying to add a table copy when a new node joins the cluster (or
to add a copy on another node when a node holding a copy fails) and the
number of copies is below some required count.

The idea is simple: I first spawn a new process on every node in the
cluster, and each of these processes runs a global transaction using
global:trans with ID = {add_table_trans, table_name}.
The first process to grab the transaction lock checks whether more
table copies are required and, if so, creates a new copy on some node
that does not have one.
Once the copy is created, this process exits, and a process on a
different node gets the transaction lock and tries to do the same.
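A minimal Erlang sketch of the procedure described above, written for illustration and not taken from the original code. The module name add_copy_sketch, its functions, and the use of mnesia:system_info(running_db_nodes) as the list of candidate nodes are assumptions. global:trans/2 takes an Id of the form {ResourceId, LockRequesterId}, and according to the global documentation a lock only blocks requesters with a different LockRequesterId; the sketch therefore passes self() as the requester so that the processes on different nodes exclude each other.

```erlang
-module(add_copy_sketch).
-export([start_on_all_nodes/2, ensure_copies/2, missing_nodes/3]).

%% Spawn one worker on every connected node, including this one.
%% Assumes this module is loaded on all of those nodes.
start_on_all_nodes(Tab, Required) ->
    [spawn(N, ?MODULE, ensure_copies, [Tab, Required])
     || N <- [node() | nodes()]].

%% Runs inside each spawned process. The resource is keyed on the table
%% and each process uses its own pid as the requester, so the lock is
%% held by at most one of these processes at a time.
ensure_copies(Tab, Required) ->
    LockId = {{add_table_trans, Tab}, self()},
    global:trans(LockId,
                 fun() ->
                         Copies = mnesia:table_info(Tab, disc_copies),
                         Running = mnesia:system_info(running_db_nodes),
                         Targets = missing_nodes(Running, Copies, Required),
                         [mnesia:add_table_copy(Tab, N, disc_copies)
                          || N <- Targets]
                 end).

%% Pure helper: pick up to (Required - length(Copies)) candidate nodes
%% that do not already hold a disc copy.
missing_nodes(Candidates, Copies, Required) ->
    NeedMore = max(0, Required - length(Copies)),
    lists:sublist([N || N <- Candidates, not lists:member(N, Copies)],
                  NeedMore).
```

Usage, from a shell on any node: add_copy_sketch:start_on_all_nodes(table_name, 2).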

The problem is that the second process checks where the copies are
using mnesia:table_info(table_name, disc_copies), and this list is
sometimes incomplete: it is missing the very last node that received a
table copy in the first process.
This is easy to verify. In the second process:
Copies1 = mnesia:table_info(table_name, disc_copies),
timer:sleep(2000),
Copies2 = mnesia:table_info(table_name, disc_copies).
Copies1 sometimes lacks the last added node, while Copies2, read two
seconds later, includes it.
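The check above can be wrapped in a small function that returns the nodes seen in the second read but not in the first. The module copies_check and its function names are hypothetical; the two-second delay is the one from the snippet.

```erlang
-module(copies_check).
-export([check/1, late_nodes/2]).

%% Read the disc_copies list twice, two seconds apart, and return the
%% nodes that appear only in the second read. While the global lock is
%% held, a non-empty result indicates that the first read was stale.
check(Tab) ->
    Copies1 = mnesia:table_info(Tab, disc_copies),
    timer:sleep(2000),
    Copies2 = mnesia:table_info(Tab, disc_copies),
    late_nodes(Copies1, Copies2).

%% Pure helper: elements of After that are missing from Before.
late_nodes(Before, After) ->
    After -- Before.
```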

The second process then picks that node as a target, and
mnesia:add_table_copy fails with
{aborted,{already_exists,table_name,LastAddedNode}}
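One defensive option (an assumption, not something proposed in the original post) is to treat this specific abort as a sign that the target node already holds a copy rather than as a hard error. The module add_copy_tolerant and its return values are hypothetical; only the shape of the abort tuple comes from the error above.

```erlang
-module(add_copy_tolerant).
-export([add_copy/3, classify/2]).

%% Call mnesia:add_table_copy/3 and normalise its result.
add_copy(Tab, Node, Type) ->
    classify(Tab, mnesia:add_table_copy(Tab, Node, Type)).

%% Pure helper: map the result of add_table_copy/3 to a simpler value.
%% An already_exists abort for this table is reported separately,
%% since it means the target node already holds a copy.
classify(_Tab, {atomic, ok}) ->
    ok;
classify(Tab, {aborted, {already_exists, Tab, Node}}) ->
    {already_present, Node};
classify(_Tab, {aborted, Reason}) ->
    {error, Reason}.
```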

Since the transaction lock ensures that no other process can add
another table copy at the same time, I guess this is a race condition
in which the new copy's node is not propagated to the schema on all
nodes before mnesia:add_table_copy returns.

Karol
