Mnesia deadlocks (?) while loading table

Igor Ribeiro Sucupira igorrs@REDACTED
Sun Jan 24 08:51:38 CET 2010


I have been able to reproduce this several times on 64-bit Ubuntu,
64-bit CentOS and 32-bit Ubuntu, all running R13B02.

Open two terminals and start one node in each:
erl -sname test1
erl -sname test2
Create the schema from the first node (substitute ijaba2 with your
server's name):
(test1@REDACTED)1> ok = mnesia:create_schema([node(), test2@REDACTED]), mnesia:start().
Start Mnesia on the second node as well:
(test2@REDACTED)1> mnesia:start().
Create a test table from the first node and start writing to it:
(test1@REDACTED)2> mnesia:create_table(test, [{disc_only_copies, mnesia:system_info(running_db_nodes)}]).
(test1@REDACTED)3> W = fun(F, N) -> mnesia:sync_transaction(fun mnesia:write/1, [{test, N, N}]), F(F, N + 1) end, W(W, 1).
Restart Mnesia on the second node and wait for the table to be loaded:
(test2@REDACTED)2> mnesia:stop(), ok = mnesia:start().
(test2@REDACTED)3> mnesia:wait_for_tables([test], infinity).


In some runs of this experiment, the table never finishes loading.
The writer transaction currently running on the first node seems to be
waiting for a commit while holding a write lock (the queries below are
run from a third node, test3):
(test3@REDACTED)1> rpc:call(test1@REDACTED, mnesia, system_info, [held_locks]).
[{{test,14691},write,{tid,14696,<6217.41.0>}}]
(test3@REDACTED)3> rpc:call(test1@REDACTED, erlang, process_info, [list_to_pid("<6217.41.0>"), current_function]).
{current_function,{mnesia_tm,rec_all,4}}

The second node is waiting to receive the table:
(test3@REDACTED)4> rpc:call(test2@REDACTED, mnesia, system_info, [held_locks]).
[{{schema,test},read,{tid,14009,<6358.199.0>}}]
(test3@REDACTED)5> rpc:call(test2@REDACTED, erlang, process_info, [list_to_pid("<6358.199.0>"), current_function]).
{current_function,{mnesia_loader,wait_on_load_complete,1}}

And there's a third transaction going on (the table sender?):
(test3@REDACTED)7> rpc:call(test1@REDACTED, mnesia, system_info, [transactions]).
[{14696,<6217.41.0>,coordinator},
 {14697,<6217.176.0>,coordinator}]
(test3@REDACTED)8> rpc:call(test1@REDACTED, erlang, process_info, [list_to_pid("<6217.176.0>"), current_function]).
{current_function,{timer,sleep,1}}
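The diagnostic queries above can be bundled into a small helper that is run from the observer node (test3). This is only a sketch: the module name mnesia_diag and the function report/1 are hypothetical, and the pattern matching assumes the held_locks and transactions formats shown in the R13B02 output above. Entries in any other shape are silently skipped.

```erlang
%% mnesia_diag.erl: hypothetical helper module, not part of Mnesia.
-module(mnesia_diag).
-export([report/1]).

%% Print the locks held and the transactions running on Node, followed
%% by the function each involved process is currently executing.
report(Node) ->
    Locks = rpc:call(Node, mnesia, system_info, [held_locks]),
    Trans = rpc:call(Node, mnesia, system_info, [transactions]),
    io:format("~p held_locks:~n  ~p~n", [Node, Locks]),
    io:format("~p transactions:~n  ~p~n", [Node, Trans]),
    Pids = lists:usort(lock_pids(Locks) ++ trans_pids(Trans)),
    lists:foreach(fun print_current_function/1, Pids).

%% Assumed entry shape (as in the output above): {Oid, LockKind, {tid, Counter, Pid}}.
%% A {badrpc, _} result yields no pids.
lock_pids(Locks) when is_list(Locks) ->
    [Pid || {_Oid, _Kind, {tid, _Counter, Pid}} <- Locks];
lock_pids(_BadRpc) ->
    [].

%% Assumed entry shape (as in the output above): {Counter, Pid, Status}.
trans_pids(Trans) when is_list(Trans) ->
    [Pid || {_Counter, Pid, _Status} <- Trans];
trans_pids(_BadRpc) ->
    [].

%% erlang:process_info/2 only accepts local pids, so run it on the pid's own node.
print_current_function(Pid) ->
    Info = rpc:call(node(Pid), erlang, process_info, [Pid, current_function]),
    io:format("~p: ~p~n", [Pid, Info]).
```

With the module compiled on test3 only, calling mnesia_diag:report(test1@REDACTED) and then mnesia_diag:report(test2@REDACTED) should print roughly the same information as the individual queries above.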

None of the three transactions makes any progress, so there may be a
circular wait here (a deadlock).

I hope you can reproduce it easily. It seems to depend on whether the
first node notices test2's restart before or after test2 starts
loading the table, but I may be wrong and it could be a different race
condition.

Igor.



More information about the erlang-bugs mailing list