Redundancy, gen_leader and network splits
Fri Jun 3 14:23:42 CEST 2005
I want to set up an Erlang system with multiple nodes for redundancy.
The system will be used to drive an external Web Services interface, so
I only want one node at a time to be attempting to communicate with
the Web Service.
gen_leader appears to be what I want: one node at a time is the
"leader" and talks to the Web Service, while the others sit around
replicating the database and doing little else until the leader suffers
a power failure or network outage, at which point one of the surviving
nodes takes over.
I'm not yet sure of the details of how I'd use gen_leader to achieve
this, but I'm working on it...
Does the version of gen_leader in jungerl fix the bug in the election
process that was discussed on this mailing list a while ago? I found a
leader_new.erl file on the internet which purports to fix it, but it
appears to be just a sample implementation of the algorithm and doesn't
implement all the hooks etc. that gen_leader.erl does.
Finally, I want to be absolutely sure that two nodes will never both
think they're the leader at the same time, which I don't think
gen_leader guarantees. For example: I have three nodes A, B and C, and
A is the leader. A's connections to B and C go down, but not its
connection to the Web Service. B and C will elect a new leader between
them while A still thinks it is the leader. Both will then try to
interact with the Web Service, which is what's known as a Bad Thing.
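The following is a tiny model of the split described above, written in Python purely for illustration. The "pool" representation and the "lowest node name wins" election rule are hypothetical assumptions. They are not how gen_leader actually elects a leader. The model only shows that when each connected pool elects on its own, a split produces two leaders.

```python
# Hypothetical model: after a network split, the nodes form "pools",
# i.e. sets of nodes that can still reach each other.
# The election rule (lowest node name in the pool wins) is an
# illustrative assumption, not gen_leader's actual algorithm.


def naive_leaders(pools):
    """Return the leader of every non-empty pool; no quorum is required."""
    return [min(pool) for pool in pools if pool]


# The scenario from the text: A is cut off from B and C.
split = [{"A"}, {"B", "C"}]
print(naive_leaders(split))  # ['A', 'B']: two leaders at the same time
```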
The way I could see this working is that a node only becomes (or
remains) the leader if it can reach a strict majority of the candidates,
counting itself: that is, more than half of all candidate nodes. In the
example above, A can see only itself (one node out of three), so it
relinquishes its leader status. B and C together make up two out of
three, so they can safely elect a new leader between them, and
everything works. This scales to an arbitrary number of nodes. Two
disjoint pools cannot both contain more than half of the candidates, so
in any network split at most one pool of connected nodes can have a
leader. Of course, if a split leaves no pool with a majority (for
example, three mutually unreachable pools, none holding more than half
of the nodes), then there will be no leader at all. For my application,
though, this is much better than having two leaders.
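Here is a sketch of the majority rule, again as a Python model rather than gen_leader code. The helpers `has_quorum`, `quorum_leaders` and `partitions` are hypothetical, and the "lowest name wins" rule inside a pool is an assumption. On a real Erlang node, the pool would presumably be the node itself plus whichever candidates appear in `nodes()`. This sketch does not model that. The final loop checks the "at most one leader" claim by trying every possible split of up to six nodes.

```python
# Hypothetical model of the strict-majority rule; not part of gen_leader.


def has_quorum(pool, candidates):
    """True if the pool (the node itself included) holds a strict majority."""
    return 2 * len(pool & candidates) > len(candidates)


def quorum_leaders(pools, candidates):
    """Leaders after a split, when only a pool with a majority may elect."""
    # Within a qualifying pool, the lowest node name wins (an assumption).
    return [min(pool) for pool in pools if pool and has_quorum(pool, candidates)]


def partitions(nodes):
    """Yield every way of splitting a list of nodes into disjoint pools."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing pool in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [smaller[i] | {first}] + smaller[i + 1:]
        # ...or give it a pool of its own.
        yield smaller + [{first}]


abc = {"A", "B", "C"}
print(quorum_leaders([{"A"}, {"B", "C"}], abc))    # ['B']: A steps down
print(quorum_leaders([{"A"}, {"B"}, {"C"}], abc))  # []: no leader at all

# Exhaustive check: no split of 1..6 nodes ever yields two leaders.
for n in range(1, 7):
    names = [chr(ord("A") + i) for i in range(n)]
    for pools in partitions(names):
        assert len(quorum_leaders(pools, set(names))) <= 1
```

The strict inequality matters. With an even number of candidates, two equal halves both fail the check, so the rule yields no leader rather than two.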
How difficult would it be to add this behaviour to gen_leader?