[erlang-questions] Several gen_leader questions
Vasily Sulatskov
redvasily@REDACTED
Thu Dec 20 13:56:03 CET 2012
Bumping the thread in the hope that someone will answer.
On Tuesday, December 18, 2012 8:32:13 PM UTC+1, Vasily Sulatskov wrote:
>
> Hi,
>
> I am using gen_leader from https://github.com/abecciu/gen_leader_revival and I
> want to use it with a fixed list of candidate nodes - as far as I understand,
> that's the easiest way.
>
> I tried several variations of starting gen_leader but none was satisfactory.
>
> I start it as part of a supervision tree, so all examples are taken from
> the init/1 functions of supervisor modules:
>
> Here's what I tried so far:
>
> init(_Args) ->
>     %% The same value is used on all machines in the cluster
>     Leader_nodes = ['foobar@REDACTED', 'foobar@REDACTED', 'foobar@REDACTED'],
>
>     Home = os:getenv("HOME"),
>
>     Gen_leader_config = {gen_leader_module,
>                          {gen_leader_module, start_link,
>                           [Leader_nodes, [{vardir, Home}]]},
>                          permanent, 2000, worker, [gen_leader_module]},
>
>     {ok, {{one_for_one, 10000, 1},
>           [Gen_leader_config]}}.
>
> If I do this, then the nodes specified in Leader_nodes work just fine: they all
> participate in elections, leaders are elected properly, and they are able to do
> gen_leader:leader_call() to the actual leader, etc.
>
> The problem is that on all other nodes (which are not specified in
> Leader_nodes) gen_leader is not started at all. Gen_leader checks whether the
> node it's running on is one of the "candidate nodes" or "worker nodes", and if
> that's not the case, it simply doesn't start. All further attempts at
> gen_leader:leader_call from that node fail.
>
> I tried to run every node in the cluster except for "candidate nodes" as a
> "worker node", so I changed supervisor to something like:
>
> init(_Args) ->
>     %% The same value is used on all machines in the cluster
>     Leader_nodes = ['foobar@REDACTED', 'foobar@REDACTED', 'foobar@REDACTED'],
>
>     Workers =
>         case lists:member(node(), Leader_nodes) of
>             true ->
>                 [];
>             false ->
>                 [node()]
>         end,
>
>     Home = os:getenv("HOME"),
>
>     {ok, {{one_for_one, 10000, 1},
>           [{scheduler, {scheduler, start_link,
>                         [Leader_nodes,
>                          [{vardir, Home},
>                           {workers, Workers}]]},
>             permanent, 2000, worker, [scheduler_leader]}]}}.
>
> As far as I understand, when gen_leader runs in a worker configuration, it
> doesn't participate in elections, but still keeps track of where an actual
> leader is running, so gen_leader:leader_call is still possible.
>
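For reference, the Workers computation in the supervisor above can be pulled out into a pure function, which makes the candidate/worker split easy to check in isolation. This is only a minimal sketch: the module name leader_roles and the function workers_for/2 are hypothetical and not part of gen_leader.

```erlang
-module(leader_roles).
-export([workers_for/2]).

%% Returns the value for the {workers, ...} option on a given node:
%% candidate nodes pass an empty list, every other node lists itself.
%% Hypothetical helper, not part of gen_leader.
workers_for(Node, Candidates) ->
    case lists:member(Node, Candidates) of
        true  -> [];
        false -> [Node]
    end.
```

In init/1 this would be called as leader_roles:workers_for(node(), Leader_nodes).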
> This setup kind of works, but it seems that the gen_leader process on "worker"
> nodes constantly grows in memory usage, past several GB at least, eventually
> crashing the whole VM.
>
> Am I running gen_leader correctly?
>
> What is the correct way of running gen_leader with a fixed set of "candidate"
> nodes, such that every other node is aware of where the leader is running and
> gen_leader:leader_call() is possible from it?
>
> Which version of gen_leader is recommended to use? This one
> https://github.com/abecciu/gen_leader_revival? Or maybe the version from
> gproc? By the way can someone explain what's the difference between them?
>
>
> And I have another, most likely unrelated, issue with gen_leader. On one
> deployment, I sometimes find the cluster in a state with two leaders: most of
> the nodes think that the leader is on one node, but some other node thinks
> that the leader is on a different node. I am not sure whether the other leader
> is the node that diverges from the consensus; I don't have a cluster in this
> state right now to check.
>
> It seems to happen after a gen_leader process crashes somehow (some internal
> work, not related to gen_leader magic).
>
> The other thing that I think might be important here is that the gen_leader
> process in that setup can get stuck in handle_leader_call for quite a long
> time. Can that cause problems with leader elections? Should gen_leader
> processes avoid blocking in handle_whatever functions so that they are always
> able to handle election callbacks?
>
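To make the blocking question concrete, here is the usual way to keep a callback process responsive, shown with a plain gen_server rather than gen_leader. The slow work is handed to a helper process and the reply is sent later with gen_server:reply/2. Whether the gen_leader version in use accepts an equivalent deferred reply from handle_leader_call is an assumption that would need checking; the module name slow_call_sketch and its functions are hypothetical.

```erlang
-module(slow_call_sketch).
-behaviour(gen_server).

-export([start_link/0, slow_square/2, ping/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% A request that takes a while to answer.
slow_square(Pid, N) ->
    gen_server:call(Pid, {slow_square, N}, 5000).

%% A cheap request, used to show the server is not blocked.
ping(Pid) ->
    gen_server:call(Pid, ping).

init([]) ->
    {ok, no_state}.

handle_call({slow_square, N}, From, State) ->
    %% Hand the slow work to a helper process and return at once,
    %% so the server can keep handling other messages.
    spawn_link(fun() ->
                       timer:sleep(200),  % stands in for long-running work
                       gen_server:reply(From, N * N)
               end),
    {noreply, State};
handle_call(ping, _From, State) ->
    {reply, pong, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.
```

With this shape, a ping sent while slow_square is still pending is answered immediately, because the server process itself never sleeps.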
>
> Thanks in advance.
>
> --
> Best regards,
> Vasily Sulatskov
>