RAFT - rFronteddu/general_wiki GitHub Wiki

RAFT is a protocol for implementing distributed consensus.

A node can be in 1 of 3 states:

  • Follower:
  • Candidate:
  • Leader:

Leader election

All nodes start in the follower state. If followers don't hear from a leader they can become a candidate. The candidate then requests votes from other nodes. Nodes will reply with their vote. The candidate becomes the leader if it gets votes from a majority of nodes. All changes to the system now go through the leader.

In Raft there are two timeout settings which control elections.

  • Election timeout: Amount of time a follower waits until becoming a candidate (randomized between 150ms and 300ms). After the election timeout, the follower becomes a candidate and starts a new election term, votes for itself, and sends out Request Vote messages to other nodes. If a receiving node hasn't voted yet in this term then it votes for the candidate and the node resets its election timeout. Once a candidate has a majority of votes it becomes a leader. The leader begins sending out Append Entries messages to its followers.
  • The Heartbeat timeout is the time between Append Entries messages. Followers respond to each Append Entries message. This election term will continue until a follower stops receiving heartbeats and becomes a candidate. Requiring a majority of votes guarantees that only one leader can be elected per term.

Split Election

If two nodes become candidate at the same time they will wait for a new election and try again.

Log Replication

Once a leader is elected, all changes to the system go through the leader. Each change is added as an entry in the node's log. The log entry is currently uncommitted so it won't update the node's value. To commit the entry the node first replicates it to the follower nodes then the leader waits until a majority of nodes have written the entry. At that point the entry is committed on the leader. The leader then notifies the followers that the entry is committed. The cluster has now come to consensus about the system state.

Once a leader is elected, we need to replicate all changes to our system to all nodes. This is done through the Append Entries messages used for heartbeats. The process works as follows:

  • First a client sends a change to the leader.
  • The change is appended to the leader's log then the change is sent to the followers on the next heartbeat.
  • An entry is committed once a majority of followers acknowledge it and a response is sent to the client.