Distributed Consensus - bobbae/gcp GitHub Wiki

Distributed consensus problem requires agreement among a number of processes (or agents) for a single data value. Some of the processes may fail or be unreliable in other ways, so consensus protocols must be fault tolerant or resilient.

Serializability and Linearizability

http://www.bailis.org/blog/linearizability-versus-serializability/

Logical Clocks

A logical clock is an idea put forward by Lamport in 1987 to solve possible problems caused by clock inconsistency between different machines in a distributed system.

https://www.alibabacloud.com/blog/a-brief-analysis-of-consensus-protocol-from-logical-clock-to-raft_594675

State machine replication

Many distributed systems use state machine replication to synchronize data between copies, such as HDFS, Chubby, and Zookeeper.

Paxos, ZAB and Raft

Paxos is a consensus protocol algorithm developed by Lamport in the 1990s. ZAB (ZooKeeper Atomic BoardCast) is a consensus protocol used in ZooKeeper. Raft is a consensus protocol developed by developers at Stanford University in 2014. It was developed to be easier to understand than Paxos.

The improved Paxos protocol has been used in many distributed products, such as Chubby, PaxosStore, Alibaba Cloud X-DB, and Ant Financial OceanBase. It is generally believed that the Raft protocol has lower performance than Paxos because it only allows committing entries in sequence. However, TiKV that uses Raft officially declares that it has made many optimizations on Raft and has significantly improved the performance of Raft. POLARDB is another Alibaba Cloud database that also uses Parallel-Raft to implement the parallel commit capability in Raft.

Raft works by electing a leader in the cluster. Leader can override logs.

DHT, CRDT and Models for Consistency

Many real-world systems, even global financial networks, choose availability over consistency, so that people can get on with business. The coordination protocols are complicated.

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table: key-value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key.

A conflict-free replicated data type (CRDT) is a data structure which can be replicated across multiple computers in a network, where the replicas can be updated independently and concurrently without coordination between the replicas, and where it is always mathematically possible to resolve inconsistencies that might come up.

CRDTs let you avoid conflict by defining rules about how changes are applied to a single data point.

Redis CRDT

https://redis.com/redis-enterprise/technology/active-active-geo-distribution/

Braid

https://braid.org/

Calvin transaction protocol

https://jepsen.io/analyses/faunadb-2.5.4

8 Fallacies of Distributed Systems

https://github.com/bobbae/gcp/wiki/Fallacies#8-fallacies-of-distributed-computing

Consistency

https://www.youtube.com/watch?v=EOlw6dlxXU0