[erlang-questions] Several gen_leader questions
Vasily Sulatskov
vasily@REDACTED
Tue Dec 18 20:32:13 CET 2012
Hi,
I am using gen_leader from https://github.com/abecciu/gen_leader_revival and
I want to use it with a fixed list of candidate nodes; as far as I
understand, that's the easiest way.
I tried several variations of starting gen_leader, but none was
satisfactory. I start it as part of a supervision tree, so all examples
are taken from the init/1 functions of supervisor modules.
Here's what I tried so far:
init(_Args) ->
    %% The same value is used on all machines in the cluster
    Leader_nodes = ['foobar@REDACTED', 'foobar@REDACTED', 'foobar@REDACTED'],
    Home = os:getenv("HOME"),
    Gen_leader_config = {gen_leader_module,
                         {gen_leader_module, start_link,
                          [Leader_nodes,
                           [{vardir, Home}]]},
                         permanent, 2000, worker, [gen_leader_module]},
    {ok, {{one_for_one, 10000, 1},
          [Gen_leader_config]}}.
If I do this, the nodes specified in Leader_nodes work just fine: they all
participate in elections, leaders are elected properly, they are able to do
gen_leader:leader_call() to the actual leader, etc.
The problem is that on all other nodes (those not specified in
Leader_nodes) gen_leader is not started at all. gen_leader checks whether
the node it's running on is one of the "candidate nodes" or "worker nodes",
and if that's not the case, it simply doesn't start. All further attempts
at gen_leader:leader_call from that node fail.
I then tried to run every node in the cluster except the "candidate nodes"
as a "worker node", so I changed the supervisor to something like:
init(_Args) ->
    %% The same value is used on all machines in the cluster
    Leader_nodes = ['foobar@REDACTED', 'foobar@REDACTED', 'foobar@REDACTED'],
    Workers =
        case lists:member(node(), Leader_nodes) of
            true ->
                [];
            false ->
                [node()]
        end,
    Home = os:getenv("HOME"),
    {ok, {{one_for_one, 10000, 1},
          [{scheduler, {scheduler, start_link,
                        [Leader_nodes,
                         [{vardir, Home},
                          {workers, Workers}]]},
            permanent, 2000, worker, [scheduler_leader]}]}}.
As far as I understand, when gen_leader runs in a worker configuration, it
doesn't participate in elections, but still keeps track of where an actual
leader is running, so gen_leader:leader_call is still possible.
This setup kind of works, but the gen_leader process on "worker" nodes
seems to grow constantly in memory usage, past several GB at least,
eventually crashing the whole VM.
Am I running gen_leader correctly?
What is the correct way of running gen_leader with a fixed set of
"candidate" nodes, such that every other node is aware of where the leader
is running and gen_leader:leader_call() is possible from it?
Which version of gen_leader is recommended to use? This one
https://github.com/abecciu/gen_leader_revival? Or maybe the version from
gproc? By the way can someone explain what's the difference between them?
And I have another, most likely unrelated, issue with gen_leader. On one
deployment, I sometimes find the cluster in a state with two leaders: most
of the nodes think that the leader is one node, but some other node thinks
that the leader is on the other node. I am not sure if the other leader is
the node that diverges from the consensus - I don't have a cluster in this
state right now to check.
It seems to happen after a gen_leader process crashes somehow (some
internal work, not related to gen_leader magic).
The other thing that I think might be important here is that the gen_leader
process in that setup can get stuck in handle_leader_call for quite a long
time. Can that cause problems with leader elections? Should gen_leader
processes avoid blocking in handle_whatever functions, so that they are
always able to handle election callbacks?
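To make the last question concrete: the non-blocking style I have in mind is
the usual OTP pattern of returning noreply from the handler and answering
later from a helper process. The sketch below uses a plain gen_server rather
than gen_leader, and the module and function names (slow_call_sketch,
slow_call/2) are made up for illustration. Whether the same approach is
appropriate inside handle_leader_call is exactly what I am asking.

```erlang
-module(slow_call_sketch).
-behaviour(gen_server).

-export([start_link/0, slow_call/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Ask the server to run Fun and return its result to the caller.
slow_call(Pid, Fun) ->
    gen_server:call(Pid, {slow, Fun}, infinity).

init([]) ->
    {ok, []}.

handle_call({slow, Fun}, From, State) ->
    %% Hand the long-running work to a linked helper process and return
    %% immediately, so the server loop stays free for other messages.
    %% The helper sends the reply to the original caller when it is done.
    spawn_link(fun() -> gen_server:reply(From, Fun()) end),
    {noreply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Msg, State) ->
    {noreply, State}.
```

With this shape the caller still blocks until the work finishes, but the
server process itself does not, which is the property I suspect matters
for elections.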
Thanks in advance.
--
Best regards,
Vasily Sulatskov