Mnesia deadlocks (?) while loading table

Igor Ribeiro Sucupira igorrs@REDACTED
Sun Jan 24 17:48:41 CET 2010


It's not as easy as I thought to reproduce the problem with only one table.
I am attaching code that creates more tables and executes transactions
involving two of them. That's not a necessary condition for reproducing
the problem, but it helps a lot.

Open 2 terminals and start one node on each:
erl -sname test1
erl -sname test2
Create everything from the first node (substitute igorrs with your
server's name):
(test1@REDACTED)1> load_dl:start(test2@REDACTED).
Run this restart loop on second node:
(test2@REDACTED)1> load_dl:restart_forever().
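
In case the attachment gets scrubbed: below is only a rough, hypothetical
sketch of the kind of module described above, not the actual load_dl.erl.
The module name load_dl_sketch, the table names and the exact transaction
body are made up; the general pattern (several disc_only_copies tables, a
writer transaction touching two of them, and a stop/start/wait loop on the
second node) is what the attached code does.

-module(load_dl_sketch).
-export([start/1, restart_forever/0]).

-define(TABLES, [t1, t2, t3, t4]).

%% Run on the first node: create the schema and tables, then write forever.
start(OtherNode) ->
    ok = mnesia:create_schema([node(), OtherNode]),
    rpc:call(OtherNode, mnesia, start, []),
    ok = mnesia:start(),
    [{atomic, ok} = mnesia:create_table(T,
         [{disc_only_copies, mnesia:system_info(running_db_nodes)}])
     || T <- ?TABLES],
    write_forever(1).

%% One sync_transaction per iteration, writing to two of the tables.
write_forever(N) ->
    {atomic, _} = mnesia:sync_transaction(fun() ->
        mnesia:write({t1, N, N}),
        mnesia:write({t2, N, N})
    end),
    write_forever(N + 1).

%% Run on the second node: restart Mnesia and wait for all tables, forever.
restart_forever() ->
    mnesia:stop(),
    ok = mnesia:start(),
    io:format("Started. Waiting for tables.~n"),
    ok = mnesia:wait_for_tables(?TABLES, infinity),
    io:format("Tables loaded.~n"),
    restart_forever().

The point is just that each transaction holds write locks on two tables
while the other node keeps restarting and reloading them.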

At some point, the second node will stop printing and the last message
will be "Started. Waiting for tables." It will then hang forever, unable
to load the tables. At the same time, the current transaction on the
first node will also hang forever (I got some info - see below - by
running a few RPCs from a third node).

Igor.

On Sun, Jan 24, 2010 at 5:51 AM, Igor Ribeiro Sucupira <igorrs@REDACTED> wrote:
> I have been able to reproduce this a few times on a 64-bit Ubuntu, a
> 64-bit CentOS and a 32-bit Ubuntu, running R13B02.
>
> Open 2 terminals and start one node on each:
> erl -sname test1
> erl -sname test2
> Create the schema from the first node (substitute ijaba2 with your
> server's name):
> (test1@REDACTED)1> ok = mnesia:create_schema([node(), test2@REDACTED]),
> mnesia:start().
> Start Mnesia also on the second node:
> (test2@REDACTED)1> mnesia:start().
> Create a test table from the first node and start writing to it:
> (test1@REDACTED)2> mnesia:create_table(test,
>     [{disc_only_copies, mnesia:system_info(running_db_nodes)}]).
> (test1@REDACTED)3> W = fun(F, N) ->
>     mnesia:sync_transaction(fun mnesia:write/1, [{test, N, N}]),
>     F(F, N + 1) end,
> W(W, 1).
> Restart Mnesia on the second node and wait for the table to be loaded:
> (test2@REDACTED)2> mnesia:stop(), ok = mnesia:start().
> (test2@REDACTED)3> mnesia:wait_for_tables([test], infinity).
>
>
> For some runs of this experiment, the table will never load.
> It seems the current writer transaction on the first node is waiting
> for a commit, while holding a write lock:
> (test3@REDACTED)1> rpc:call(test1@REDACTED, mnesia, system_info, [held_locks]).
> [{{test,14691},write,{tid,14696,<6217.41.0>}}]
> (test3@REDACTED)3> rpc:call(test1@REDACTED, erlang, process_info,
> [list_to_pid("<6217.41.0>"), current_function]).
> {current_function,{mnesia_tm,rec_all,4}}
>
> The second node is waiting for the table to be received:
> (test3@REDACTED)4> rpc:call(test2@REDACTED, mnesia, system_info,
> [held_locks]).
> [{{schema,test},read,{tid,14009,<6358.199.0>}}]
> (test3@REDACTED)5> rpc:call(test2@REDACTED, erlang, process_info,
> [list_to_pid("<6358.199.0>"), current_function]).
> {current_function,{mnesia_loader,wait_on_load_complete,1}}
>
> And there's a third transaction going on (the table sender?):
> (test3@REDACTED)7> rpc:call(test1@REDACTED, mnesia, system_info, [transactions]).
> [{14696,<6217.41.0>,coordinator},
>  {14697,<6217.176.0>,coordinator}]
> (test3@REDACTED)8> rpc:call(test1@REDACTED, erlang, process_info,
> [list_to_pid("<6217.176.0>"), current_function]).
> {current_function,{timer,sleep,1}}
>
> None of the three transactions makes any progress, so maybe there is
> circular waiting here (a deadlock).
>
> I hope you can reproduce it easily. It seems to depend on whether the
> first node notices test2's restart before or after test2 starts
> loading the table. But maybe I'm wrong and there's another race
> condition.
>
> Igor.
>
> --
> "The secret of joy in work is contained in one word - excellence. To
> know how to do something well is to enjoy it." - Pearl S. Buck.
>



-- 
"The secret of joy in work is contained in one word - excellence. To
know how to do something well is to enjoy it." - Pearl S. Buck.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: load_dl.erl
Type: text/x-erlang
Size: 1226 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20100124/96a30e48/attachment.bin>

