[erlang-bugs] Re: Mnesia deadlocks (?) while loading table

Igor Ribeiro Sucupira igorrs@REDACTED
Tue Jan 26 00:55:05 CET 2010


Thank you. The patch seems to fix the problem (I've also verified it
on the original system where I first detected the problem).

So, is this patch going to be in the next release?

Best regards.
Igor.

On Mon, Jan 25, 2010 at 11:00 AM, Dan Gudmundsson <dgud@REDACTED> wrote:
> Here is a patch:
>
> diff --git a/lib/mnesia/src/mnesia_tm.erl b/lib/mnesia/src/mnesia_tm.erl
> index 3f3a10a..5a2407d 100644
> --- a/lib/mnesia/src/mnesia_tm.erl
> +++ b/lib/mnesia/src/mnesia_tm.erl
> @@ -1388,7 +1388,9 @@ multi_commit(sync_sym_trans, Tid, CR, Store) ->
>     {WaitFor, Local} = ask_commit(sync_sym_trans, Tid, CR, DiscNs, RamNs),
>     {Outcome, []} = rec_all(WaitFor, Tid, do_commit, []),
>     ?eval_debug_fun({?MODULE, multi_commit_sym_sync},
> -                   [{tid, Tid}, {outcome, Outcome}]),
> +                   [{tid, Tid}, {outcome, Outcome}]),
> +    [?ets_insert(Store, {waiting_for_commit_ack, Node}) ||
> +       Node <- WaitFor],
>     rpc:abcast(DiscNs -- [node()], ?MODULE, {Tid, Outcome}),
>     rpc:abcast(RamNs -- [node()], ?MODULE, {Tid, Outcome}),
>     case Outcome of
>
> /Dan
>
> On Mon, Jan 25, 2010 at 8:39 AM, Dan Gudmundsson <dangud@REDACTED> wrote:
>> Thanks, I'll have a look at it.
>> I'm on a hunt for a deadlock I can't reproduce...
>>
>> /Dan
>>
>> On Sun, Jan 24, 2010 at 5:48 PM, Igor Ribeiro Sucupira <igorrs@REDACTED> wrote:
>>> It's not as easy as I thought to reproduce the problem with only one table.
>>> I am attaching code that creates more tables and executes transactions
>>> with two of them. This is not a necessary condition to reproduce the
>>> problem, but it helps a lot.
>>>
>>> Open 2 terminals and start one node on each:
>>> erl -sname test1
>>> erl -sname test2
>>> Create everything from the first node (substitute igorrs with your
>>> server's name):
>>> (test1@REDACTED)1> load_dl:start(test2@REDACTED).
>>> Run this restart loop on the second node:
>>> (test2@REDACTED)1> load_dl:restart_forever().
>>>
>>> At some point, the second node will stop printing and the last message
>>> will be "Started. Waiting for tables." It will be hung forever, not
>>> being able to load the tables. At the same time, the current
>>> transaction on the first node will also be hung forever (I got some
>>> info - see below - by running some RPCs from a third node).
>>>
>>> Igor.
>>>
>>> On Sun, Jan 24, 2010 at 5:51 AM, Igor Ribeiro Sucupira <igorrs@REDACTED> wrote:
>>>> I have been able to reproduce this a few times on 64-bit Ubuntu,
>>>> 64-bit CentOS and 32-bit Ubuntu, running R13B02.
>>>>
>>>> Open 2 terminals and start one node on each:
>>>> erl -sname test1
>>>> erl -sname test2
>>>> Create the schema from the first node (substitute ijaba2 with your
>>>> server's name):
>>>> (test1@REDACTED)1> ok = mnesia:create_schema([node(), test2@REDACTED]),
>>>> mnesia:start().
>>>> Start Mnesia also on the second node:
>>>> (test2@REDACTED)1> mnesia:start().
>>>> Create a test table from the first node and start writing to it:
>>>> (test1@REDACTED)2> mnesia:create_table(test, [{disc_only_copies,
>>>> mnesia:system_info(running_db_nodes)}]).
>>>> (test1@REDACTED)3> W = fun(F, N) -> mnesia:sync_transaction(fun
>>>> mnesia:write/1, [{test, N, N}]), F(F, N + 1) end, W(W, 1).
>>>> Restart Mnesia on the second node and wait for the table to be loaded:
>>>> (test2@REDACTED)2> mnesia:stop(), ok = mnesia:start().
>>>> (test2@REDACTED)3> mnesia:wait_for_tables([test], infinity).
>>>>
>>>>
>>>> In some runs of this experiment, the table never loads.
>>>> It seems the current writer transaction on the first node is waiting
>>>> for a commit, while holding a write lock:
>>>> (test3@REDACTED)1> rpc:call(test1@REDACTED, mnesia, system_info, [held_locks]).
>>>> [{{test,14691},write,{tid,14696,<6217.41.0>}}]
>>>> (test3@REDACTED)3> rpc:call(test1@REDACTED, erlang, process_info,
>>>> [list_to_pid("<6217.41.0>"), current_function]).
>>>> {current_function,{mnesia_tm,rec_all,4}}
>>>>
>>>> The second node is waiting for the table to be received:
>>>> (test3@REDACTED)4> rpc:call(test2@REDACTED, mnesia, system_info,
>>>> [held_locks]).
>>>> [{{schema,test},read,{tid,14009,<6358.199.0>}}]
>>>> (test3@REDACTED)5> rpc:call(test2@REDACTED, erlang, process_info,
>>>> [list_to_pid("<6358.199.0>"), current_function]).
>>>> {current_function,{mnesia_loader,wait_on_load_complete,1}}
>>>>
>>>> And there's a third transaction going on (the table sender?):
>>>> (test3@REDACTED)7> rpc:call(test1@REDACTED, mnesia, system_info, [transactions]).
>>>> [{14696,<6217.41.0>,coordinator},
>>>>  {14697,<6217.176.0>,coordinator}]
>>>> (test3@REDACTED)8> rpc:call(test1@REDACTED, erlang, process_info,
>>>> [list_to_pid("<6217.176.0>"), current_function]).
>>>> {current_function,{timer,sleep,1}}
>>>>
>>>> None of the 3 transactions makes any progress, so maybe there's a
>>>> circular wait here (a deadlock).
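
The manual RPC calls above could be bundled into one helper so that the same snapshot can be taken from a third node each time the hang occurs. This is only a minimal sketch: the module name mnesia_diag and the function snapshot/1 are hypothetical (they are not part of Mnesia or of the attached load_dl module), and the shape of a held lock, {Oid, LockKind, {tid, Counter, Pid}}, is assumed from the output shown above.

```erlang
-module(mnesia_diag).
-export([snapshot/1]).

%% Hypothetical helper, not part of Mnesia.
%% For each node, collect the held locks, the active transactions and,
%% for every lock holder, the function it is currently executing.
snapshot(Nodes) ->
    [{Node, node_info(Node)} || Node <- Nodes].

node_info(Node) ->
    Locks = rpc:call(Node, mnesia, system_info, [held_locks]),
    Trans = rpc:call(Node, mnesia, system_info, [transactions]),
    [{held_locks, Locks},
     {transactions, Trans},
     {lock_holders, holders(Locks)}].

%% Assumes each lock looks like {Oid, LockKind, {tid, Counter, Pid}},
%% as in the held_locks output above. Entries of any other shape are
%% skipped, and a {badrpc, Reason} result yields an empty list.
holders(Locks) when is_list(Locks) ->
    [{Pid, current_function(Pid)} || {_Oid, _Kind, {tid, _N, Pid}} <- Locks];
holders(_Error) ->
    [].

%% process_info/2 only accepts pids local to the node where it runs,
%% so the call is made on the node that owns the pid.
current_function(Pid) ->
    rpc:call(node(Pid), erlang, process_info, [Pid, current_function]).
```

From a third node connected to the other two, something like mnesia_diag:snapshot([Node1, Node2]), with Node1 and Node2 bound to your own node names, should return the locks and the current function of each lock holder in a single term.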
>>>>
>>>> I hope you can reproduce it easily. It seems to depend on whether the
>>>> first node notices test2's restart before or after test2 starts
>>>> loading the table. But maybe I'm wrong and there's another race
>>>> condition.
>>>>
>>>> Igor.
>>>>
>>>> --
>>>> "The secret of joy in work is contained in one word - excellence. To
>>>> know how to do something well is to enjoy it." - Pearl S. Buck.
>>>>
>>>
>>
>


