<div dir="ltr">I changed the fix to synchronize the lock clearing from mnesia_tm.<div><br></div><div>See <a href="https://github.com/dgud/otp/compare/dgud;mnesia;sticky-race;OTP-11375">https://github.com/dgud/otp/compare/dgud;mnesia;sticky-race;OTP-11375</a></div>

<div><br></div><div>Does that work for you?</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Oct 4, 2013 at 2:59 PM, Henrik Nord <span dir="ltr"><<a href="mailto:henrik@erlang.org" target="_blank">henrik@erlang.org</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi<br>

<br>

Thank you for your contribution!<br>

I have created a pull request from this, (This is becoming the prefered way of submitting patches to Erlang/otp) and the team will take a look at your patch.<br>

<br>

/Henrik<div class="HOEnZb"><div class="h5"><br>

<br>

On 2013-10-04 08:52, Jan Chochol wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi OTP team,<br>

<br>

We found found bug (race condition) in mnesia code regarding<br>

disconnecting nodes owning sticky lock.<br>

Our (simplified) setup consists of two nodes, when first node acts as<br>

active one and second acts as standby.<br>

It means that all requests are normally going through first node and<br>

second node is only used after fail of the first. Mnesia in our<br>

scenario is used as HA data store.<br>

For this setup it seems that sticky locks are ideal solution - all<br>

operations are performed only on one node.<br>

Everything worked well, until we found that some times transactions<br>

started just after disconnecting first node hangs indefinitely.<br>

This bogus behaviour was very rare, so it seemed that there is some<br>

race condition how sticky locks are handled after disconnection of<br>

node.<br>

<br>

We found that there is problem in synchronization of transaction<br>

process (process initializing transaction) and mnesia_locker process.<br>

It can happen that mnesia_locker process respond to lock request with<br>

{switch, N, Req} message, but node N is already disconnected.<br>

<br>

Part of message trace to show the problem:<br>

<br>

List of interesting PIDs:<br>

<br>

mnesia_controller="<0.93.0>"<br>

mnesia_tm="<0.89.0>"<br>

mnesia_monitor="<0.85.0>"<br>

mnesia_locker="<0.87.0>"<br>

mnesia_recover="<0.88.0>"<br>

transaction process="<0.206.0>"<br>

<br>

Message trace:<br>

<br>

% mnesia_monitor lost track of mnesia_monitor at node 1<br>

(<0.85.0>) << {'EXIT',<5184.92.0>,<u></u>noconnection}<br>

<br>

% mnesia_monitor receives nodedown - this message is ignored<br>

(<0.85.0>) << {nodedown,'gcfc-triple1_<u></u>519129@localhost'}<br>

<br>

% mnesia_monitor call mnesia_recover:mnesia_down()<br>

(<0.85.0>) <0.88.0> ! {'$gen_cast',<br>

{mnesia_down,'gcfc-triple1_<u></u>519129@localhost'}}<br>

<br>

% mnesia_recover receives mnesia_down, nothing to do there<br>

(<0.88.0>) << {'$gen_cast',{mnesia_down,'<u></u>gcfc-triple1_519129@localhost'<u></u>}}<br>

<br>

% mnesia_monitor call mnesia_controller:mnesia_down<br>

(<0.85.0>) <0.93.0> ! {'$gen_cast',<br>

{mnesia_down,'gcfc-triple1_<u></u>519129@localhost'}}<br>

<br>

% mnesia_controller receives mnesia_down<br>

(<0.93.0>) << {'$gen_cast',{mnesia_down,'<u></u>gcfc-triple1_519129@localhost'<u></u>}}<br>

<br>

% mnesia_controller calls mnesia_recover:log_mnesia_<u></u>down()<br>

(<0.93.0>) <0.88.0> ! {'$gen_call', {<0.93.0>,#Ref<0.0.0.1860>},<br>

{log_mnesia_down,'gcfc-<u></u>triple1_519129@localhost'}}<br>

<br>

% mnesia_recover receives log_mnesia_down, store it in decision table and reply<br>

(<0.88.0>) << {'$gen_call',{<0.93.0>,#Ref<0.<u></u>0.0.1860>},<br>

{log_mnesia_down,'gcfc-<u></u>triple1_519129@localhost'}}<br>

(<0.88.0>) <0.93.0> ! {#Ref<0.0.0.1860>,ok}<br>

<br>

% mnesia_controller receives response from mnesia_recover:log_mnesia_<u></u>down()<br>

(<0.93.0>) << {#Ref<0.0.0.1860>,ok}<br>

<br>

% mnesia_controller calls mnesia_monitor:mnesia_down(<u></u>mnesia_controller, Node)<br>

(<0.93.0>) <0.85.0> ! {'$gen_cast', {mnesia_down,mnesia_<u></u>controller,<br>

'gcfc-triple1_519129@localhost<u></u>'}}<br>

<br>

% mnesia_controller receives mnesia_down, mnesia_controller<br>

(<0.85.0>) << {'$gen_cast', {mnesia_down,mnesia_<u></u>controller,<br>

'gcfc-triple1_519129@localhost<u></u>'}}<br>

<br>

% mnesia_controller forwards mnesia_down to mnesia_tm<br>

(<0.85.0>) <0.89.0> ! {mnesia_down,'gcfc-triple1_<u></u>519129@localhost'}<br>

<br>

% mnesia_tm receives mnesia_down<br>

(<0.89.0>) << {mnesia_down,'gcfc-triple1_<u></u>519129@localhost'}<br>

<br>

% mnesia_tm calls mnesia_monitor:mnesia_down(<u></u>mnesia_tm, {Node,<br>

ParticipantsTids}),<br>

(<0.89.0>) <0.85.0> ! {'$gen_cast', {mnesia_down,mnesia_tm,<br>

{'gcfc-triple1_519129@<u></u>localhost',[]}}}<br>

<br>

% mnesia_monitor receives mnesia_down, mnesia_tm<br>

% now it is going to call mnesia_locker:mnesia_down, which will send<br>

release_remote_non_pending<br>

(<0.85.0>) << {'$gen_cast', {mnesia_down, mnesia_tm,<br>

{'gcfc-triple1_519129@<u></u>localhost',[]}}}<br>

<br>

% Our processes receives information about node down and try to<br>

recover from mnesiai<br>

(<0.206.0>) << {activate,true}<br>

<br>

% Starts outer transaction<br>

(<0.206.0>) <0.89.0> ! {{<0.206.0>,#Ref<0.0.0.1875>},<u></u>start_outer}<br>

(<0.89.0>) << {{<0.206.0>,#Ref<0.0.0.1875>},<u></u>start_outer}<br>

<br>

% Reply for start_outer<br>

(<0.89.0>) <0.206.0> ! {mnesia_tm,#Ref<0.0.0.1875>,<br>

{new_tid,{tid,176,<0.206.0>},<u></u>245835}}<br>

(<0.206.0>) << {mnesia_tm,#Ref<0.0.0.1875>,<br>

{new_tid,{tid,176,<0.206.0>},<u></u>245835}}<br>

<br>

% Try to obtain lock for mnesia:select (just failed node holded sticky<br>

lock for that table)<br>

(<0.206.0>) {mnesia_locker,'gcfc-triple2_<u></u>519129@localhost'} !<br>

{<0.206.0>, {read, {tid,176, <0.206.0>},<br>

{'tasks_tt_gcfc-triple1_<u></u>519129@localhost_recover',<br>

'______WHOLETABLE_____'}}}<br>

(<0.87.0>) << {<0.206.0>, {read, {tid,176,<0.206.0>},<br>

{'tasks_tt_gcfc-triple1_<u></u>519129@localhost_recover',<br>

'______WHOLETABLE_____'}}}<br>

<br>

% mnesia_locker informs out process that it needs switch sticky lock<br>

(<0.87.0>) <0.206.0> ! {mnesia_locker,'gcfc-triple2_<u></u>519129@localhost',<br>

{switch,'gcfc-triple1_519129@<u></u>localhost', {<0.206.0>, {read,<br>

{tid,176,<0.206.0>},<br>

{'tasks_tt_gcfc-triple1_<u></u>519129@localhost_recover',<br>

'______WHOLETABLE_____'}}}}}<br>

(<0.206.0>) << {mnesia_locker,'gcfc-triple2_<u></u>519129@localhost',<br>

{switch,'gcfc-triple1_519129@<u></u>localhost', {<0.206.0>, {read,<br>

{tid,176,<0.206.0>},<br>

{'tasks_tt_gcfc-triple1_<u></u>519129@localhost_recover',<br>

'______WHOLETABLE_____'}}}}}<br>

<br>

% Our proces just want to send {<0.206.0>, {read, {tid,176,<0.206.0>},<br>

{'tasks_tt_gcfc-triple1_<u></u>519129@localhost_recover',<br>

'______WHOLETABLE_____'}}} to other node<br>

% Other node is not connected, so Erlang tryies to connect to it<br>

(<0.21.0> is net_kernel process)<br>

(<0.206.0>) <0.21.0> ! {'$gen_call', {<0.206.0>,#Ref<0.0.0.1884>},<br>

{connect,normal,'gcfc-triple1_<u></u>519129@localhost'}}<br>

<br>

% Connection to remote node was unsucsefull, and our process is hanged<br>

in this function:<br>

%<br>

% l_req_rec(Node, Store) -><br>

%     ?ets_insert(Store, {nodes, Node}),<br>

%     receive<br>

%         {?MODULE, Node, Reply} -><br>

%         Reply;<br>

%     {mnesia_down, Node} -><br>

%         {not_granted, {node_not_running, Node}}<br>

% end.<br>

%<br>

% It will never get out, as sent message is lost<br>

(<0.206.0>) << {#Ref<0.0.0.1884>,false}<br>

<br>

% mnesia_monitor call mnesia_locker:mnesia_down, which is sending<br>

release_remote_non_pending<br>

(<0.85.0>) <0.87.0> ! {release_remote_non_pending,<br>

'gcfc-triple1_519129@localhost<u></u>',[]}<br>

<br>

% mnesia_locker receives release_remote_non_pending<br>

% it removes sticky locks of failed nodes and releases all transactions<br>

% but it is too late - process <0.206.0> is already stucked for<br>

waiting for response from not connected node<br>

(<0.87.0>) << {release_remote_non_pending,'<u></u>gcfc-triple1_519129@localhost'<u></u>,[]}<br>

<br>

% mnesia_locker call mnesia_monitor:mnesia_down(<u></u>mnesia_locker, Node)<br>

(<0.87.0>) <0.85.0> ! {'$gen_cast', {mnesia_down,mnesia_locker,<br>

'gcfc-triple1_519129@localhost<u></u>'}}<br>

<br>

% mnesia_monitor receives mnesia_down, mnesia_locker<br>

(<0.85.0>) << {'$gen_cast',<br>

{mnesia_down,mnesia_locker,'<u></u>gcfc-triple1_519129@localhost'<u></u>}}<br>

<br>

% mnesia_monitor calls mnesia_lib:report_system_<u></u>event, which will send<br>

notify to mnesia_event<br>

(<0.85.0>) <0.83.0> ! {notify, {mnesia_system_event,<br>

{mnesia_down,'gcfc-triple1_<u></u>519129@localhost'}}}<br>

<br>

<br>

So problem is that transaction process sends messages to not-connected<br>

node and awaits response which will never come.<br>

One of the easiest solution is to monitor remote process, which<br>

prevent this from happening.<br>

<br>

Suggested patch is available at:<br>

<br>

git fetch git://<a href="http://github.com/janchochol/otp.git" target="_blank">github.com/janchochol/<u></u>otp.git</a> sticky_lock_race<br>

<br>

<a href="https://github.com/janchochol/otp/compare/erlang:maint...sticky_lock_race" target="_blank">https://github.com/janchochol/<u></u>otp/compare/erlang:maint...<u></u>sticky_lock_race</a><br>

<a href="https://github.com/janchochol/otp/compare/erlang:maint...sticky_lock_race.patch" target="_blank">https://github.com/janchochol/<u></u>otp/compare/erlang:maint...<u></u>sticky_lock_race.patch</a><br>

<br>

<br>

You can try to reproduce the problem with following module:<br>

<br>

--- BEGIN OF slock.erl ---<br>

%% stick_lock problem reproducer<br>

%%<br>

%% To reproduce problem execute these commands in parallel:<br>

%%<br>

%% erl -sname n2@localhost -cookie slock -s slock n2<br>

%% erl -sname n1@localhost -cookie slock -s slock n1<br>

%%<br>

%% Problem is timing related, so it has relatively low probability of<br>

reproducing.<br>

%% To increase probability use this patch:<br>

%%<br>

%% --- a/lib/mnesia/src/mnesia_<u></u>locker.erl<br>

%% +++ b/lib/mnesia/src/mnesia_<u></u>locker.erl<br>

%% @@ -663,6 +663,7 @@ mnesia_down(N, Pending) -><br>

%%             %% race with mnesia_tm's coordinator processes<br>

%%             %% that may restart and acquire new locks.<br>

%%             %% mnesia_monitor ensures the sync.<br>

%% +           timer:sleep(1000),<br>

%%             Pid ! {release_remote_non_pending, N, Pending}<br>

%%      end.<br>

%%<br>

%%<br>

%% When problem appears n1 node will hang indefinitely.<br>

%% When problem is not present n1 will finish shortly after n2.<br>

<br>

<br>

-module(slock).<br>

<br>

-export([n1/0, n2/0]).<br>

<br>

wait_for_n2() -><br>

     case rpc:call(n2@localhost, erlang, whereis, [n2]) of<br>

         Pid when is_pid(Pid) -><br>

             Pid;<br>

         _ -><br>

             timer:sleep(100),<br>

             wait_for_n2()<br>

     end.<br>

<br>

wait_fun(Fun, PollIntervalMs) -><br>

     case Fun() of<br>

         false -><br>

             timer:sleep(PollIntervalMs),<br>

             wait_fun(Fun, PollIntervalMs);<br>

         true -><br>

             ok<br>

     end.<br>

<br>

<br>

n1() -><br>

     register(n1, self()),<br>

     io:format("n1: statred~n", []),<br>

     io:format("n1: waiting for n2~n", []),<br>

     N2 = wait_for_n2(),<br>

     io:format("n1: n2 found~n", []),<br>

     io:format("n1: init mnesia start~n", []),<br>

     ok = application:start(mnesia, permanent),<br>

     ok = rpc:call(n2@localhost, application, start, [mnesia, permanent]),<br>

     {ok, [_ | _]} = mnesia:change_config(extra_db_<u></u>nodes, [n2@localhost]),<br>

     {atomic, ok} = mnesia:create_table(t1, [{ram_copies,<br>

[n1@localhost, n2@localhost]}]),<br>

     ok = rpc:call(n2@localhost, mnesia, wait_for_tables, [[t1], infinity]),<br>

     io:format("n1: init mnesia end~n", []),<br>

     N2 ! {ping, self()},<br>

     receive step -> ok end,<br>

     wait_fun(fun() -> net_adm:ping(n2@localhost) =:= pang end, 100),<br>

     io:format("n1: select start~n", []),<br>

     Ret = mnesia:transaction(fun() -><br>

                 wait_fun(fun() -> net_adm:ping(n2@localhost) =:= pang end, 10),<br>

                 mnesia:select(t1, [{{t1, '_', '_'}, [], ['$_']}])<br>

         end),<br>

     io:format("n1: select end: ~p~n", [Ret]),<br>

     init:stop().<br>

<br>

n2() -><br>

     register(n2, self()),<br>

     io:format("n2: statred~n", []),<br>

     io:format("n2: waiting for n1 ping~n", []),<br>

     N2 = receive<br>

         {ping, IN2} -><br>

             IN2<br>

     end,<br>

     io:format("n2: n1 pinged~n", []),<br>

     io:format("n2: write start~n", []),<br>

     mnesia:transaction(fun() -> mnesia:write(t1, {t1, a, b}, sticky_write) end),<br>

     io:format("n2: write end~n", []),<br>

     N2 ! step,<br>

     io:format("n2: finish~n", []),<br>

     erlang:halt().<br>

--- END OF slock.erl ---<br>

<br>

I do not see any other easy solution for this problem.<br>

<br>

Best Regards,<br>

Jan Chochol<br>

______________________________<u></u>_________________<br>

erlang-patches mailing list<br>

<a href="mailto:erlang-patches@erlang.org" target="_blank">erlang-patches@erlang.org</a><br>

<a href="http://erlang.org/mailman/listinfo/erlang-patches" target="_blank">http://erlang.org/mailman/<u></u>listinfo/erlang-patches</a><br>

</blockquote>

<br></div></div><span class="HOEnZb"><font color="#888888">

-- <br>

/Henrik Nord Erlang/OTP</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

______________________________<u></u>_________________<br>

erlang-patches mailing list<br>

<a href="mailto:erlang-patches@erlang.org" target="_blank">erlang-patches@erlang.org</a><br>

<a href="http://erlang.org/mailman/listinfo/erlang-patches" target="_blank">http://erlang.org/mailman/<u></u>listinfo/erlang-patches</a><br>

</div></div></blockquote></div><br></div>