Redundancy, gen_leader and network splits

Tim Bates tim@REDACTED
Fri Jun 3 14:23:42 CEST 2005

Hi folks,
I want to set up an Erlang system with multiple nodes for redundancy.
The system will be used to drive an external Web Services interface, so
I only want one node at a time to be attempting to communicate with
the Web Service.

gen_leader appears to be what I want to use: one node at a time is the
"leader" and talks to the Web Service, while the others sit around
replicating the database and otherwise doing mostly nothing until the
leader suffers a power failure or network outage, at which point one of
the remaining nodes takes over. I'm not quite sure of the details of how
I'd use gen_leader to achieve this, but I'm working on it...

Does the version of gen_leader in jungerl fix the bug in the election
process that was discussed on this mailing list a while ago? I found a
leader_new.erl file on the internet which purports to fix it, but it
appears to be just a sample implementation of the algorithm and doesn't
implement all the hooks etc. that gen_leader.erl does.

Finally, I want to be absolutely sure that there won't be two nodes both
thinking they're the leader at the same time, which I don't think
gen_leader guarantees. For example: I have three nodes A, B and C, and A
is the leader. A's connections to B and C go down, but its connection to
the Web Service does not. B and C will elect a new leader between them,
while A still thinks it is the leader. Both leaders will then try to
interact with the Web Service, which is what's known as a Bad Thing.

The way I could see this working is that a node will only become (or
remain) the leader if it, together with the candidates it can reach,
makes up more than 50% of all the candidates. This way, in the example
above A will relinquish its leader status because it can only see
itself (1 of 3), while B and C (2 of 3) will elect a new leader and
everything works. This scales to an arbitrary number of nodes: the pools
created by a network split are disjoint, so at most one of them can
contain more than 50% of the candidates. Of course, if the split leaves
no pool with a majority (say, three pools of mutually unreachable nodes)
then there will be no leader, but for my application this is much
better than having two leaders.
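To make the majority rule concrete, here is a minimal, language-neutral sketch written in Python rather than Erlang. It is not gen_leader code and does not use any gen_leader API; the function names has_quorum and eligible_pools are hypothetical. It assumes every node knows the full, fixed list of candidates and can tell which of them it currently has a connection to.

```python
"""Majority-quorum rule for leader eligibility (illustrative sketch only).

Assumptions (not from gen_leader): the candidate list is fixed and known to
every node, and "reachable" is the set of candidates a node is connected to.
"""


def has_quorum(me, reachable, candidates):
    """True if `me` plus the candidates it can reach form a strict majority."""
    # Count only real candidates, and always count the node itself.
    pool = ({me} | set(reachable)) & set(candidates)
    # Integer comparison avoids floating point: |pool| > n / 2.
    return 2 * len(pool) > len(candidates)


def eligible_pools(pools, candidates):
    """Return the pools (sets of mutually connected nodes) allowed to elect a leader."""
    return [p for p in pools if 2 * len(p) > len(candidates)]


if __name__ == "__main__":
    three = ["A", "B", "C"]
    # A is cut off from B and C; B and C can still see each other.
    print(has_quorum("A", [], three))     # False: A must step down
    print(has_quorum("B", ["C"], three))  # True: B and C may elect a leader
    # Five nodes split into three isolated pools: nobody has a majority.
    five = ["A", "B", "C", "D", "E"]
    print(eligible_pools([{"A", "B"}, {"C", "D"}, {"E"}], five))  # []
```

Because the pools from a split are disjoint, two strict majorities would need more than n nodes between them, so eligible_pools can never return more than one pool. An even split (for example two pairs out of four candidates) also yields no leader, which matches the "no leader is better than two" trade-off described above.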

How difficult would it be to add this behaviour to gen_leader?


Tim Bates
