Election vs consensus

Sat Jan 8 07:34:23 CET 2022

I think Aliaksei put it well. The simpler election algorithms like bully
and ring election are not partition-tolerant, they can elect multiple
leaders at the same time if the network is bad. Raft uses a fancier
algorithm based on majority vote which solves that weakness (and more).

Dominic

2022年1月8日(土) 15:09 Aliaksei Artamonau <aliaksiej.artamonau@REDACTED>:

> When you talk about implementing consensus on top of leader election, how
> exactly do you envision it? Getting a bit more details could help answer
> your question.
>
> You could use the bully algorithm to implement leader election for a
> consensus protocol like (Multi-)Paxos. Raft is slightly different because
> it uses leader election to avoid the need for the new leader to synchronize
> with the other nodes to get all updates that may have been committed by
> previous leaders. At the same time, the leader election algorithm that Raft
> uses is not that different from the bully algorithm: substitute process IDs
> for log positions and require at least a majority of responses.
>
> Some reasons for why majority voting schemes are typically used to
> implement consensus protocols (to my understanding).
>
> 1. In order for the consensus protocol to make progress a majority of
> nodes is required anyway. So it does not achieve much to choose a leader
> when there's no majority.
> 2. In Raft specifically, leaders use integer term/epoch numbers. And no
> term/epoch should have more than one leader, which requires for at least a
> majority of nodes to agree on a leader.
> 3. Competing leaders impede progress. So checking with a majority of nodes
> lowers the probability of having concurrent leaders.
>
> Roughly speaking what a consensus protocol like Raft gives you:
>
> 1. A total order on leader terms.
> 2. A way to ensure that a stale leader can't commit anything once a new
> term has commenced.
> 3. A way for a new leader to be sure that it has got all updates that may
> have previously been committed.
> 4. A total order on all updates.
> 5. A way to change the set of nodes in your cluster while preserving
> safety.
> 5. Availability as long as a majority of nodes are alive.
>
> Here's a paper that I found useful
> https://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/deconstr_paxos.pdf.
>
>
> Aliaksei
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20220108/04478323/attachment.htm>