[erlang-questions] gproc : 2 nodes out of sync behaviour.
Morgan Segalis
msegalis@REDACTED
Sun Jul 1 17:01:46 CEST 2012
Gproc may make life more interesting, but right now I certainly know that gproc has made my life easier, thanks to you :-)
(Sorry for my late answer, but I wanted to think about the solution before posting it)
When you say "since gen_leader didn't use to have a way to handle netsplits":
1- Does this mean that gen_leader handles netsplits now?
2- If so, gproc_dist would only need a way to know when a netsplit has happened, right?
What about this solution? (if 1 & 2 above are true)
=============================== nodemonitor.erl ===================================
-module(nodemonitor).
-behaviour(gen_server).

-record(nodemonitor, {nodes = none}).

-export([start_link/0, init/1, handle_call/3, handle_cast/2,
         handle_info/2, terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Subscribe to {nodeup, Node, Info} / {nodedown, Node, Info} messages.
    net_kernel:monitor_nodes(true, [{node_type, visible}]),
    {ok, #nodemonitor{nodes = dict:new()}}.

handle_call(_Request, _From, NM) ->
    {reply, ok, NM}.

handle_cast(_Msg, NM) ->
    {noreply, NM}.

handle_info({nodeup, Node, _Info}, NM) ->
    case dict:find(Node, NM#nodemonitor.nodes) of
        {ok, disconnected} ->
            %% The node was down and has come back:
            %% here we would let gproc_dist know about the netsplit.
            io:fwrite("netsplit occurred: ~p~n", [Node]);
        {ok, connected} ->
            io:fwrite("error occurred: ~p~n", [Node]);
        error ->
            io:fwrite("new node detected: ~p~n", [Node])
    end,
    Dict = dict:store(Node, connected, NM#nodemonitor.nodes),
    {noreply, NM#nodemonitor{nodes = Dict}};
handle_info({nodedown, Node, _Info}, NM) ->
    Dict = dict:store(Node, disconnected, NM#nodemonitor.nodes),
    {noreply, NM#nodemonitor{nodes = Dict}}.

code_change(_OldVsn, NM, _Extra) ->
    {ok, NM}.

terminate(_Reason, _NM) ->
    ok.
-----------------------------------------------------------------------------------------------------------------------------------------
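I would start this under my application's supervisor; a minimal old-style child spec (child id and shutdown timeout are just illustrative) could be:

{nodemonitor, {nodemonitor, start_link, []},
 permanent, 5000, worker, [nodemonitor]}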
I'm certainly not pretending you wouldn't have thought of this solution yourself, so there must be something I don't get.
Do you think it would be possible to do something about it?
3- If 1 & 2 are not true, would it be possible, in your opinion, to stop and restart gproc and re-register every value so that all clusters are in sync again?
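To make question 3 concrete, here is a minimal sketch of what I have in mind, assuming each process that registers values can be asked to re-register (the GprocUsers list and the re_register message are my own convention, not a gproc API):

=============================== gproc_resync.erl (sketch) ===================================
-module(gproc_resync).
-export([resync/1]).

%% Hypothetical resync: restart the local gproc application, then
%% ask every process that had registered values to redo its own
%% registrations. gproc entries are owned by the registering
%% process, so only that process can re-register them.
resync(GprocUsers) ->
    ok = application:stop(gproc),
    ok = application:start(gproc),
    [Pid ! re_register || Pid <- GprocUsers],
    ok.
-----------------------------------------------------------------------------------------------------------------------------------------
Each user process would handle re_register by calling gproc:reg/2 again with the keys and values it owns.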
On 1 Jul 2012, at 13:49, Ulf Wiger wrote:
> It's a feature of gproc, carefully crafted to make life more interesting. ;-)
>
> There is no resynch after netsplit in gproc, since gen_leader didn't use to have a way to handle netsplits. Still, there is no hook to inform the callback (gproc_dist) about what's happened.
>
> One way to deal with this is to set -kernel dist_auto_connect false, and add a "backdoor ping" (e.g. over UDP). If you get a ping from a known node that's not in the nodes() list, you have a netsplit situation. You can then select which node(s) to restart. After restart, normal synch will ensue, and since the nodes never auto-connected, you will have consistency (but quite possibly data loss, of course).
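A minimal sketch of such a backdoor ping over UDP (module name, port, peer host list and wire format are all made up for illustration):

=============================== backdoor_ping.erl (sketch) ===================================
-module(backdoor_ping).
-export([start/2]).

%% Sketch of the "backdoor ping" idea: with -kernel dist_auto_connect
%% false the nodes never reconnect on their own, so a ping from a
%% known node that is absent from nodes() reveals a netsplit.
%% PeerHosts is assumed to be static configuration.
start(Port, PeerHosts) ->
    {ok, Sock} = gen_udp:open(Port, [binary, {active, true}]),
    spawn_link(fun() -> ping_loop(Sock, Port, PeerHosts) end),
    listen_loop(Sock).

ping_loop(Sock, Port, PeerHosts) ->
    %% Advertise our node name to every peer host every 5 seconds.
    [gen_udp:send(Sock, Host, Port, atom_to_binary(node(), utf8))
     || Host <- PeerHosts],
    timer:sleep(5000),
    ping_loop(Sock, Port, PeerHosts).

listen_loop(Sock) ->
    receive
        {udp, Sock, _IP, _InPort, NodeBin} ->
            Node = binary_to_atom(NodeBin, utf8),
            case Node =/= node() andalso not lists:member(Node, nodes()) of
                true ->
                    %% A peer pings us over UDP but is not connected:
                    %% netsplit detected; decide which node(s) to restart.
                    io:fwrite("netsplit detected: ~p~n", [Node]);
                false ->
                    ok
            end,
            listen_loop(Sock)
    end.
-----------------------------------------------------------------------------------------------------------------------------------------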
>
> BR,
> Ulf W
>
> Ulf Wiger, Feuerlabs, Inc.
> http://www.feuerlabs.com
>
> On 1 Jul 2012, at 13:36, Morgan Segalis <msegalis@REDACTED> wrote:
>
>> Hello everyone,
>>
>> I have 2 nodes which use gproc.
>> Both are well connected to each other…
>> But sometimes (it doesn't happen really often, but it does) both servers get disconnected from each other; once they are connected again, gproc is out of sync.
>>
>> Here's what happens:
>> 1- A is connected to B.
>> 2- a new value X set by A is seen by B
>> 3- a new value Y set by B is seen by A
>> -------- they get disconnected for a second or two --------
>> 4- Clusters lost connection
>> -------- they reconnect ----------
>> 5- Clusters regain connection
>> 6- the old value X set by A is not seen anymore by B
>> 7- the old value Y set by B is not seen anymore by A
>> 8- a new value Z set by A is seen by B
>> 9- a new value V set by B is not seen by A
>>
>> How come in 8 the new value Z set by A is seen by B, while in 9 the new value V set by B is not seen by A?
>> I know that there is a leader, which is probably B, but I can't explain why new values are not seen symmetrically.
>> What should I do to reconnect both clusters correctly, so that old and new values are seen on both clusters again?