Raft - lenary/cs4414-project GitHub Wiki

This document should explain Raft, and the System Model it relies upon.

See also: Glossary (it's worth reading that before you read this, as certain words have very particular meanings).

Raft is formally defined in the paper at http://raftconsensus.github.io. This page should clarify some points raised in the paper, and condense the definitions for ease of reference.

What is Raft For?

Raft is a Distributed Consensus Protocol. In terms of the CAP Theorem, it tries to keep a system Consistent in the presence of Network Partitions (CP).

Other Consensus Protocols include Paxos, Multi-Paxos, and Zab.

What does Raft Do?

Consider a deterministic state machine. It has a set of states, a set of transitions, and a set of input commands that trigger transitions and return output results. Most computer progams can be expressed as a deterministic state machine (especially as memory is finite, unlike in a Turing Machine).

So, you have a program that is, in effect, a Deterministic State Machine. It is running on a server, and clients are interacting with it. This is simple, great. There's only one point of truth (that server), so you know the system will always be consistent.

However, now you run into a problem. Your single server has to go down for maintenance, but you want to be able to continue to serve client commands during this time.

Of course, the way to solve this is to have n copies of the program, running on n seperate servers (we'll call these Peers, using the given definition). Take away one peer, and the remaining n-1 peers can continue serving client commands.

But now you have a much larger problem. How do you coordinate your n seperate peers to make sure they always agree on the result of client commands, i.e. if a client was to send the same command to any peer, it should always get the same result back (as if there was really only a single server running the Deterministic State Machine).

Raft's approach is simple. Given the State Machine is deterministic, if you give seperate copies of it the exact same sequence of commands, then both copies will end in the exact same state. This means that Raft only has to make the peers agree on a sequence of commands, which it then provides to the State Machine.

However, how can it agree on the sequence of commands? Raft's approach is simple. Elect a Leader, and then only the leader can suggest new commands to be appended into the Log.

Leader Election is done by Cluster Majority, whereby most of the Cluster's peers must agree on the same Leader before it is successfully elected. If the leader loses contact with the peers that elected it, a new election is started to find a new leader.

There are a few other constraints that must be satisfied to do with appending commands to the log, because of details around network partitions and re-elections.

Importantly, only once a majority of the cluster has accepted commands suggested by the leader, can they be considered agreed upon, and committed to the State Machine. Before a majority have confirmed, there is the possibility of a network partition that could prevent enough nodes from finding out about the new commands, and thus accepting them.

What doesn't Raft Do?

Raft is only a consensus protocol. It makes no assumptions about the logic implemented on top of it.

While parallels are often drawn to Zookeeper and Chubby, this is incorrect. Zookeeper is a "Coordination Service" and Chubby is a "Lock Server" (according to their own papers). These both use a Consensus Algorithm underneath, but the Consensus Algorithm is mostly orthogonal to the service they provide to the programmer. Zookeeper uses a variant of multi-paxos called "Zab" as its consensus algorithm, and Chubby uses Paxos (which variant is left unspecified in its paper).

System Model

The system model, in brief:

n Peers, all running Raft and the deterministic state machine that is the implemented logic (this could be anything, it's not defined by Raft itself). Each Peer also has durable storage - i.e. it can recover the consistent log if it ever needs to restart.

Raft and the Deterministic State Machine cannot be seperated by a network connection, and nor can Raft and the durable storage it uses for its log - this is because network links can fail in many different ways, which Raft has not assumed that the interfaces to these parts will fail.
A way of having Clients submit commands to be executed on the Deterministic State Machine. Clients usually number far more than Peers, and have less knowlege about the cluster than the peers do.
An asynchronous network, by which all the Peers and Clients communicate.

Expectations

A Raft peer cannot tell when a message it has sent has arrived, except by recieving a confirmation message in return. There is no bound on how long messages take to arrive.
A Raft peer cannot tell the difference between a) the network failing, b) a peer failing, and c) a peer responding very slowly.
The network will fail.
Peers will fail, both temporarily (meaning they will return, and can help with consensus again afterwards) and permanently (meaning they won't, and writes that haven't made it from that peer to any others will be lost forever).