[erlang-questions] Mnesia does not detect netsplit

Joseph Norton norton@REDACTED
Thu Sep 29 11:03:52 CEST 2011


FYI.  I posted a suggestion on the mailing list for a network partition detector application. 

http://erlang.org/pipermail/erlang-questions/2011-August/060702.html

If you have any questions, please send them to me off-list.

thanks,

Joseph Norton
norton@REDACTED



On Sep 29, 2011, at 5:55 PM, Jonas Boberg wrote:

> Hi,
> 
> We found a case where mnesia does not detect a netsplit.
> 
> Let's say we are running two mnesia nodes, A and B:
> At startup, node A can't connect to node B (specified in the mnesia
> config parameter extra_db_nodes). In this case node B is actually
> running, but because of a temporary network issue, or node B being
> heavily loaded, net_kernel:connect fails. When node A and B eventually
> are connected (for example due to a non-mnesia process sending a
> message between the nodes), mnesia does not detect the split, and the
> two isles continue to run separately.
> 
> Note that when we say that mnesia does not detect the netsplit, we
> mean that mnesia does not generate any 'inconsistent_database' event.
> 
> How to reproduce:
> * In this example we simulate a network problem (net_kernel:connect
> failure) by having the two nodes use different cookies.
> ------------------
> $ erl -name test1@REDACTED -mnesia schema_location ram -mnesia
> extra_db_nodes "['test2@REDACTED']" -setcookie a
> (test1@REDACTED)1> application:start(mnesia),
> mnesia:subscribe(system), mnesia:create_table(my_table, []).
> $ erl -name test2@REDACTED -mnesia schema_location ram -mnesia
> extra_db_nodes "['test1@REDACTED']" -setcookie b
> (test2@REDACTED)1> application:start(mnesia),
> mnesia:subscribe(system), mnesia:create_table(my_other_table, []).
> %% Connect nodes
> (test1@REDACTED)2> erlang:set_cookie(node(), b),
> net_kernel:connect('test2@REDACTED').
> (test1@REDACTED)3> nodes().
> ['test2@REDACTED']
> (test1@REDACTED)4> mnesia:info().
> ...
> running db nodes   = ['test1@REDACTED']
> stopped db nodes   = ['test2@REDACTED']
> ...
> 
> ------------------
> Expected behaviour: subscriber gets an 'inconsistent_database' event
> Actual behaviour: subscriber does not get any event.
> 
> Compare this to the following case, where mnesia correctly detects an
> inconsistent database:
> ------------------
> $ erl -name test1@REDACTED -mnesia schema_location ram -mnesia
> extra_db_nodes "['test2@REDACTED']" -setcookie a
> (test1@REDACTED)1> application:start(mnesia),
> mnesia:subscribe(system), mnesia:create_table(my_table, []).
> $ erl -name test2@REDACTED -mnesia schema_location ram -mnesia
> extra_db_nodes "['test1@REDACTED']" -setcookie a
> (test2@REDACTED)1> application:start(mnesia),
> mnesia:subscribe(system), mnesia:create_table(my_other_table, []).
> (test2@REDACTED)2> net_kernel:disconnect('test1@REDACTED').
> (test2@REDACTED)3> net_kernel:connect('test1@REDACTED').
> (test2@REDACTED)4> flush().
> Shell got {mnesia_system_event,{mnesia_down,'test1@REDACTED'}}
> Shell got {mnesia_system_event,
>             {inconsistent_database,running_partitioned_network,
>                 'test1@REDACTED'}}
> 
> We found that the mnesia code that detects netsplits is in
> mnesia_monitor. It uses net_kernel:monitor_nodes(true) to monitor
> nodes going up and down. In the problematic scenario, when
> mnesia_monitor gets the 'nodeup', it seems to ignore it because no
> preceding node down has been recorded (has_mnesia_down returns false).
> Trace:
> (<0.53.0>) call
> mnesia_monitor:handle_info({nodeup,'test1@REDACTED'},{state,<0.52.0>,[],[],true,[],undefined,[]})
> (<0.53.0>) call mnesia_recover:has_mnesia_down('test1@REDACTED')
> (<0.53.0>) returned from mnesia_recover:has_mnesia_down/1 -> false
> 
> Does anyone have an idea about how we could work around this issue? If
> we detected the split ourselves, is there any way we could get
> mnesia to reconnect the nodes?
> 
> Regards
> Jonas
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
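
On the question of detecting the split ourselves, here is a minimal
sketch, not a confirmed fix. It assumes that a node which is connected
at the distribution level (it shows up in nodes()) and is known to
mnesia, but is missing from running_db_nodes even though its own mnesia
is running, is a likely unmerged island. The module name split_check
and its function names are made up for illustration. The mnesia calls
(mnesia:system_info/1 and mnesia:change_config/2) are existing API, but
whether change_config(extra_db_nodes, ...) actually merges the two
islands in the scenario above is untested here and should be verified.

```erlang
-module(split_check).
-export([suspects/3, check/0, try_reconnect/1]).

%% Pure helper: nodes that mnesia knows about (Known) and that are
%% connected at the distribution level (Connected), but that mnesia
%% does not count as running (Running).
suspects(Connected, Known, Running) ->
    [N || N <- Known,
          lists:member(N, Connected),
          not lists:member(N, Running)].

%% Gather the lists on the local node, then keep only the suspects
%% whose own mnesia reports that it is running. A node where mnesia is
%% simply stopped is not a split and is filtered out here.
check() ->
    Known = lists:usort(mnesia:system_info(db_nodes) ++
                        mnesia:system_info(extra_db_nodes)),
    Candidates = suspects(nodes(), Known,
                          mnesia:system_info(running_db_nodes)),
    [N || N <- Candidates,
          rpc:call(N, mnesia, system_info, [is_running]) =:= yes].

%% Ask mnesia to try connecting to the given nodes again.
%% Assumption: this may fail or leave the tables unmerged if the two
%% islands have diverged; check the return value and mnesia:info().
try_reconnect(Nodes) ->
    mnesia:change_config(extra_db_nodes, Nodes).
```

One way to use it would be to call split_check:check() periodically, or
from a process that subscribes to net_kernel:monitor_nodes(true), and
pass any non-empty result to split_check:try_reconnect/1.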



