[erlang-bugs] Re: Mnesia deadlocks (?) while loading table

Dan Gudmundsson dgud@REDACTED
Tue Jan 26 08:37:44 CET 2010


Yes it will be applied.

/Dan

On Tue, Jan 26, 2010 at 12:55 AM, Igor Ribeiro Sucupira
<igorrs@REDACTED> wrote:
> Thank you. The problem seems to be fixed with the patch (and I've also
> verified on the original system where I detected the problem).
>
> So, is this patch going to be in the next release?
>
> Best regards.
> Igor.
>
> On Mon, Jan 25, 2010 at 11:00 AM, Dan Gudmundsson <dgud@REDACTED> wrote:
>> Here is a patch:
>>
>> diff --git a/lib/mnesia/src/mnesia_tm.erl b/lib/mnesia/src/mnesia_tm.erl
>> index 3f3a10a..5a2407d 100644
>> --- a/lib/mnesia/src/mnesia_tm.erl
>> +++ b/lib/mnesia/src/mnesia_tm.erl
>> @@ -1388,7 +1388,9 @@ multi_commit(sync_sym_trans, Tid, CR, Store) ->
>>     {WaitFor, Local} = ask_commit(sync_sym_trans, Tid, CR, DiscNs, RamNs),
>>     {Outcome, []} = rec_all(WaitFor, Tid, do_commit, []),
>>     ?eval_debug_fun({?MODULE, multi_commit_sym_sync},
>> -                   [{tid, Tid}, {outcome, Outcome}]),
>> +                   [{tid, Tid}, {outcome, Outcome}]),
>> +    [?ets_insert(Store, {waiting_for_commit_ack, Node}) ||
>> +       Node <- WaitFor],
>>     rpc:abcast(DiscNs -- [node()], ?MODULE, {Tid, Outcome}),
>>     rpc:abcast(RamNs -- [node()], ?MODULE, {Tid, Outcome}),
>>     case Outcome of
>>
>> /Dan
>>
>> On Mon, Jan 25, 2010 at 8:39 AM, Dan Gudmundsson <dangud@REDACTED> wrote:
>>> Thanks, I'll have a look at it.
>>> I'm on a hunt for a deadlock I can't reproduce...
>>>
>>> /Dan
>>>
>>> On Sun, Jan 24, 2010 at 5:48 PM, Igor Ribeiro Sucupira <igorrs@REDACTED> wrote:
>>>> It's not so easy as I thought to reproduce the problem with only one table.
>>>> I am attaching code that creates more tables and executes transactions
>>>> with two of them. This is not a necessary condition to reproduce the
>>>> problem, but it helps a lot.
>>>>
>>>> Open 2 terminals and start one node on each:
>>>> erl -sname test1
>>>> erl -sname test2
>>>> Create everything from the first node (substitute igorrs with your
>>>> server's name):
>>>> (test1@REDACTED)1> load_dl:start(test2@REDACTED).
>>>> Run this restart loop on second node:
>>>> (test2@REDACTED)1> load_dl:restart_forever().
>>>>
>>>> At some point, the second node will stop printing and the last message
>>>> will be "Started. Waiting for tables." It will be hung forever, not
>>>> being able to load the tables. At the same time, the current
>>>> transaction on the first node will also be hung forever (I got some
>>>> info - see below - by running some RPCs from a third node).
>>>>
>>>> Igor.
>>>>
>>>> On Sun, Jan 24, 2010 at 5:51 AM, Igor Ribeiro Sucupira <igorrs@REDACTED> wrote:
>>>>> I have been able to reproduce this some times on a 64-bit Ubuntu, a
>>>>> 64-bit CentOS and a 32-bit Ubuntu, running R13B02.
>>>>>
>>>>> Open 2 terminals and start one node on each:
>>>>> erl -sname test1
>>>>> erl -sname test2
>>>>> Create the schema from the first node (substitute ijaba2 with your
>>>>> server's name):
>>>>> (test1@REDACTED)1> ok = mnesia:create_schema([node(), test2@REDACTED]),
>>>>> mnesia:start().
>>>>> Start Mnesia also on the second node:
>>>>> (test2@REDACTED)1> mnesia:start().
>>>>> Create a test table from the first node and start writing to it:
>>>>> (test1@REDACTED)2> mnesia:create_table(test, [{disc_only_copies,
>>>>> mnesia:system_info(running_db_nodes)}]).
>>>>> (test1@REDACTED)3> W = fun(F, N) -> mnesia:sync_transaction(fun
>>>>> mnesia:write/1, [{test, N, N}]), F(F, N + 1) end, W(W, 1).
>>>>> Restart Mnesia on the second node and wait for the table to be loaded:
>>>>> (test2@REDACTED)2> mnesia:stop(), ok = mnesia:start().
>>>>> (test2@REDACTED)3> mnesia:wait_for_tables([test], infinity).
>>>>>
>>>>>
>>>>> For some runs of this experiment, the table will never load.
>>>>> It seems the current writer transaction on the first node is waiting
>>>>> for a commit, while holding a write lock:
>>>>> (test3@REDACTED)1> rpc:call(test1@REDACTED, mnesia, system_info, [held_locks]).
>>>>> [{{test,14691},write,{tid,14696,<6217.41.0>}}]
>>>>> (test3@REDACTED)3> rpc:call(test1@REDACTED, erlang, process_info,
>>>>> [list_to_pid("<6217.41.0>"), current_function]).
>>>>> {current_function,{mnesia_tm,rec_all,4}}
>>>>>
>>>>> The second node is waiting for the table to be received
>>>>> (test3@REDACTED)4> rpc:call(test2@REDACTED, mnesia, system_info,
>>>>> [held_locks]).
>>>>> [{{schema,test},read,{tid,14009,<6358.199.0>}}]
>>>>> (test3@REDACTED)5> rpc:call(test2@REDACTED, erlang, process_info,
>>>>> [list_to_pid("<6358.199.0>"), current_function]).
>>>>> {current_function,{mnesia_loader,wait_on_load_complete,1}}
>>>>>
>>>>> And there's a third transaction going on (the table sender?):
>>>>> (test3@REDACTED)7> rpc:call(test1@REDACTED, mnesia, system_info, [transactions]).
>>>>> [{14696,<6217.41.0>,coordinator},
>>>>>  {14697,<6217.176.0>,coordinator}]
>>>>> (test3@REDACTED)8> rpc:call(test1@REDACTED, erlang, process_info,
>>>>> [list_to_pid("<6217.176.0>"), current_function]).
>>>>> {current_function,{timer,sleep,1}}
>>>>>
>>>>> None of the 3 transactions makes any progress, so maybe there's a
>>>>> circular waiting here (deadlock).
>>>>>
>>>>> I hope you can reproduce it easily. It seems to depend on whether the
>>>>> first node notices test2's restart before or after test2 starts
>>>>> loading the table. But maybe I'm wrong and there's another race
>>>>> condition.
>>>>>
>>>>> Igor.
>>>>>
>>>>> --
>>>>> "The secret of joy in work is contained in one word - excellence. To
>>>>> know how to do something well is to enjoy it." - Pearl S. Buck.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> "The secret of joy in work is contained in one word - excellence. To
>>>> know how to do something well is to enjoy it." - Pearl S. Buck.
>>>>
>>>>
>>>> ________________________________________________________________
>>>> erlang-bugs mailing list. See http://www.erlang.org/faq.html
>>>> erlang-bugs (at) erlang.org
>>>>
>>>
>>
>
>
>
> --
> "The secret of joy in work is contained in one word - excellence. To
> know how to do something well is to enjoy it." - Pearl S. Buck.
>
> ________________________________________________________________
> erlang-bugs mailing list. See http://www.erlang.org/faq.html
> erlang-bugs (at) erlang.org
>
>


More information about the erlang-bugs mailing list