[erlang-questions] Several gen_leader questions

Vasily Sulatskov redvasily@REDACTED
Thu Dec 20 13:56:03 CET 2012


Bumping the thread in the hope that someone will answer.


On Tuesday, December 18, 2012 8:32:13 PM UTC+1, Vasily Sulatskov wrote:
>
> Hi, 
>
> I am using gen_leader from https://github.com/abecciu/gen_leader_revival and
> I want to use it with a fixed list of candidate nodes; as far as I
> understand, that's the easiest way.
>
> I tried several variations of starting gen_leader but none was 
> satisfactory. 
>
> I start it as part of a supervision tree, so all examples are taken from the
> init/1 functions of supervisor modules:
>
> Here's what I tried so far: 
>
> init(_Args) -> 
>     %% The same value is used on all machines in the cluster 
>     Leader_nodes = ['foobar@REDACTED', 'foobar@REDACTED', 'foobar@REDACTED'], 
>
>     Home = os:getenv("HOME"), 
>
>     Gen_leader_config = {gen_leader_module,
>                          {gen_leader_module, start_link,
>                           [Leader_nodes, [{vardir, Home}]]},
>                          permanent, 2000, worker, [gen_leader_module]},
>
>     {ok, {{one_for_one, 10000, 1}, 
>           [Gen_leader_config]}}. 
>
> If I do this, the nodes specified in Leader_nodes work just fine: they all
> participate in elections, leaders are elected properly, and they are able to
> call gen_leader:leader_call() on the actual leader, etc.
>
> The problem is that on all other nodes (those not specified in
> Leader_nodes) gen_leader is not started at all. gen_leader checks whether the
> node it's running on is one of the "candidate nodes" or "worker nodes", and if
> that's not the case, it simply doesn't start. All further attempts at
> gen_leader:leader_call from that node fail.
>
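For readers following along, the check described above can be sketched as a
small pure function. This is only an illustration of the described behaviour,
not gen_leader's actual code; the module and function names (role_sketch,
role/3) are made up for this example.

```erlang
-module(role_sketch).
-export([role/3]).

%% Classify Node against the configured candidate and worker lists.
%% Returns candidate | worker | none. Per the description above, a node
%% classified as 'none' would be one where gen_leader does not start.
role(Node, Candidates, Workers) ->
    case {lists:member(Node, Candidates), lists:member(Node, Workers)} of
        {true, _}      -> candidate;
        {false, true}  -> worker;
        {false, false} -> none
    end.
```

Calling role_sketch:role(node(), Leader_nodes, []) on a node outside
Leader_nodes gives 'none', which matches the first configuration shown above,
where no workers are configured.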
> I tried to run every node in the cluster except for "candidate nodes" as a
> "worker node", so I changed the supervisor to something like:
>
> init(_Args) -> 
>     %% The same value is used on all machines in the cluster 
>     Leader_nodes = ['foobar@REDACTED', 'foobar@REDACTED', 'foobar@REDACTED'], 
>
>     Workers = 
>         case lists:member(node(), Leader_nodes) of 
>             true -> 
>                 []; 
>             false -> 
>                 [node()] 
>         end, 
>
>     Home = os:getenv("HOME"), 
>
>     {ok, {{one_for_one, 10000, 1}, 
>           [{scheduler, {scheduler, start_link, 
>                                [Leader_nodes, 
>                                 [{vardir, Home}, 
>                                  {workers, Workers}]]}, 
>             permanent, 2000, worker, [scheduler_leader]}]}}. 
>
> As far as I understand, when gen_leader runs in a worker configuration, it 
> doesn't participate in elections, but still keeps track of where an actual 
> leader is running, so gen_leader:leader_call is still possible. 
>
> This setup kind of works, but the gen_leader process on "worker" nodes seems
> to grow constantly in memory usage, past several GB at least, eventually
> crashing the whole VM.
>
> Am I running gen_leader correctly? 
>
> What is the correct way of running gen_leader with a fixed set of "candidate"
> nodes, such that every other node knows where the leader is running and
> gen_leader:leader_call() is possible from it?
>
> Which version of gen_leader is recommended to use? This one 
> https://github.com/abecciu/gen_leader_revival? Or maybe the version from 
> gproc? By the way can someone explain what's the difference between them? 
>
>
> And I have another, most likely unrelated, issue with gen_leader. On one
> deployment, I sometimes find the cluster in a state with two leaders: most of
> the nodes think that the leader is on one node, but some other node thinks
> that the leader is on a different node. I am not sure whether that second
> leader is the node that diverges from the consensus; I don't have a cluster in
> this state right now to check.
>
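One way to check for this state the next time it occurs is to collect, from
every node, which node it currently believes is the leader, and then compare
the answers. The sketch below covers only the comparison step. How the
{Node, Leader} pairs are gathered is left out on purpose, since it depends on
the callback module; the module name leader_views and its functions are
hypothetical.

```erlang
-module(leader_views).
-export([distinct_leaders/1, disagreeing_nodes/1]).

%% Views is a list of {Node, LeaderNode} pairs, one per reporting node.
%% More than one element in the result means the cluster view is split.
distinct_leaders(Views) ->
    lists:usort([Leader || {_Node, Leader} <- Views]).

%% Nodes whose reported leader differs from the most commonly reported one.
disagreeing_nodes(Views) ->
    Major = majority(Views),
    [Node || {Node, Leader} <- Views, Leader =/= Major].

%% Most frequently reported leader, or 'undefined' for an empty list.
%% Ties are broken by whichever entry maps:to_list/1 yields first.
majority(Views) ->
    Counts = lists:foldl(
               fun({_Node, Reported}, Acc) ->
                       maps:update_with(Reported, fun(C) -> C + 1 end, 1, Acc)
               end, #{}, Views),
    {Best, _Count} =
        lists:foldl(
          fun({L, C}, {_, Top}) when C > Top -> {L, C};
             (_, Current) -> Current
          end, {undefined, 0}, maps:to_list(Counts)),
    Best.
```

For example, with views [{n1, a}, {n2, a}, {n3, a}, {n4, b}] the distinct
leaders are [a, b] and the only disagreeing node is n4, which would show
whether the second leader is also the node that diverges.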
> It seems to happen after a gen_leader process crashes somehow (in some
> internal work, not related to gen_leader magic).
>
> The other thing that I think might be important here is that the gen_leader
> process in that setup can get stuck in handle_leader_call for quite a long
> time. Can that cause problems with leader elections? Should gen_leader
> processes avoid blocking in handle_whatever functions so that they are always
> able to handle election callbacks?
>
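To make the blocking question concrete, here is a minimal sketch of the usual
Erlang pattern for keeping a server process responsive: hand long-running
work to a helper process and reply from there. It uses a plain receive loop
as a stand-in for the gen_leader process and does not call gen_leader at all;
all names are made up for this example. In a real callback module the
analogous move would be to return without replying from handle_leader_call
and send the reply later, assuming the gen_leader version in use supports
deferred replies (that part is an assumption and is not shown).

```erlang
-module(nonblocking_sketch).
-export([start/0, call/2, demo/0]).

%% Start a tiny server loop (a stand-in for the gen_leader process).
start() ->
    spawn(fun loop/0).

%% Synchronous call helper: send a request and wait for the tagged reply.
call(Server, Request) ->
    Ref = make_ref(),
    Server ! {call, {self(), Ref}, Request},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        timeout
    end.

loop() ->
    receive
        {call, From, {slow, Ms}} ->
            %% Hand the long-running work to a helper process so the
            %% server can keep receiving other messages in the meantime.
            spawn(fun() ->
                          timer:sleep(Ms),
                          reply(From, {done, Ms})
                  end),
            loop();
        {call, From, ping} ->
            reply(From, pong),
            loop();
        stop ->
            ok
    end.

reply({Pid, Ref}, Reply) ->
    Pid ! {Ref, Reply}.

%% Issue a slow request from another process, then check that the
%% server still answers a quick request without waiting for it.
demo() ->
    Server = start(),
    Parent = self(),
    spawn(fun() -> Parent ! {slow_result, call(Server, {slow, 500})} end),
    pong = call(Server, ping),
    Result = receive
                 {slow_result, R} -> R
             after 2000 ->
                 timeout
             end,
    Server ! stop,
    Result.
```

The trade-off is that the helper works on a snapshot of whatever data it was
given, so any state change resulting from the slow work has to be sent back
to the server as a message instead of being returned from the callback.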
>
> Thanks in advance. 
>
> -- 
> Best regards,
> Vasily Sulatskov
>  