Election vs consensus

Aliaksei Artamonau aliaksiej.artamonau@REDACTED
Sat Jan 8 07:09:24 CET 2022


When you talk about implementing consensus on top of leader election, how
exactly do you envision it? Getting a bit more details could help answer
your question.

You could use the bully algorithm to implement leader election for a
consensus protocol like (Multi-)Paxos. Raft is slightly different because
it uses leader election to avoid the need for the new leader to synchronize
with the other nodes to get all updates that may have been committed by
previous leaders. At the same time, the leader election algorithm that Raft
uses is not that different from the bully algorithm: substitute process IDs
for log positions and require at least a majority of responses.

Some reasons for why majority voting schemes are typically used to
implement consensus protocols (to my understanding).

1. In order for the consensus protocol to make progress a majority of nodes
is required anyway. So it does not achieve much to choose a leader when
there's no majority.
2. In Raft specifically, leaders use integer term/epoch numbers. And no
term/epoch should have more than one leader, which requires for at least a
majority of nodes to agree on a leader.
3. Competing leaders impede progress. So checking with a majority of nodes
lowers the probability of having concurrent leaders.

Roughly speaking what a consensus protocol like Raft gives you:

1. A total order on leader terms.
2. A way to ensure that a stale leader can't commit anything once a new
term has commenced.
3. A way for a new leader to be sure that it has got all updates that may
have previously been committed.
4. A total order on all updates.
5. A way to change the set of nodes in your cluster while preserving safety.
5. Availability as long as a majority of nodes are alive.

Here's a paper that I found useful
https://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/deconstr_paxos.pdf.


Aliaksei
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20220107/61e73e9b/attachment.htm>


More information about the erlang-questions mailing list