Glossary - lenary/cs4414-project GitHub Wiki

Distributed Systems

  1. Distributed System: a set of computers and processes working together to solve a problem. The main difference between Distributed Systems and conventional, single-machine systems is that a) communication and coordination become more important and complex as more machines are added, and b) communication can fail, and the system may never know.

  2. CAP Theorem: a theorem conjectured by Eric Brewer (and later proven by Gilbert and Lynch) which states that in a Distributed System, where networks can have arbitrary Partitions (P) (see Network Partition), the system cannot be observed to be both Strongly Consistent (C) (where all requests "agree" with each other) and Highly Available (A) (where all requests are answered, regardless of whether they might give conflicting answers).

  3. Node: a single process that communicates with other processes only over the network. Usually processes are on separate machines, but for testing purposes, they could all be on the same machine. It should never be assumed that they will be on the same machine.

  4. Cluster: a collection of Nodes working together, communicating via the network. Usually this will mean the Schooner Cluster.

  5. Peer: a Node that is a member of a Cluster, as opposed to a Client. Usually, this will mean a Schooner Peer.

  6. Client: an external process that interacts with the Cluster (usually via one of its member nodes), but does not cooperate in the same way that cluster member nodes do.

  7. Network: the (incredibly complex and failure-prone) system by which Nodes communicate.

  8. RPC aka "Remote Procedure Call": a system by which one Node calls code that executes on a different node. This is different from a regular procedure call due to the many extra ways it can fail because it uses the Network.
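To make the extra failure modes concrete, here is a minimal sketch (the function name and wire format are our own invention, not Schooner's) of an RPC-style call in Rust. Every step that a local procedure call takes for granted can return an error here:

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;
use std::time::Duration;

// Hypothetical RPC: send a request string to a peer and read back a reply.
// Unlike a regular procedure call, every step goes over the Network and
// can therefore fail with an io::Error.
fn call_peer(addr: &str, request: &str) -> io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;             // connection may be refused
    stream.set_read_timeout(Some(Duration::from_secs(1)))?; // the reply may never arrive
    stream.write_all(request.as_bytes())?;                  // the send itself may fail
    let mut reply = String::new();
    stream.read_to_string(&mut reply)?;                     // the peer may die mid-reply
    Ok(reply)
}

fn main() {
    // No peer is listening on this address, so the "remote call" fails
    // in a way a local call never could.
    match call_peer("127.0.0.1:1", "ping") {
        Ok(reply) => println!("reply: {}", reply),
        Err(e) => println!("rpc failed: {}", e),
    }
}
```

Note that the caller cannot distinguish "the peer never got the request" from "the peer got it but the reply was lost", which is the core difficulty RPC introduces.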

  9. Network Partition: a particular kind of failure at the network level where a subset of the Cluster will be completely unable to communicate with the rest of the Cluster. Network partitions are not always detectable. The failure of a set of Nodes in such a way that they stop communicating can be considered to be a Network Partition in which those nodes are separated from the rest of the Cluster.

  10. [Distributed] Consensus Algorithm: an Algorithm (and Protocol) that, by some means, causes as many Nodes as possible to agree about the value of some mutable state. Raft is a Distributed Consensus Protocol.

Consensus Algorithms

  1. Raft: a Distributed Consensus Algorithm, described at http://raftconsensus.github.io/.

  2. Paxos: another Distributed Consensus Algorithm, defined in a paper by L. Lamport called "The Part-Time Parliament". It has many variants and extensions including Multi-Paxos and Fast-Paxos.

  3. Zab aka ZooKeeper Atomic Broadcast: another Distributed Consensus Algorithm, developed based on Paxos for the ZooKeeper "Coordination Service".

Raft

  1. Leader: a temporarily-elected primary peer which, for the duration of its leadership, has extended privileges to append entries to the Log.

  2. [Consistent] Log: the sequence of Commands that the Raft cluster has agreed upon for the Replicated State Machine.

  3. [Replicated]/[Deterministic] State Machine: This is the state machine that contains the application logic. There's a copy on each Peer (ie it's Replicated). Importantly, the same sequence of Commands should always result in the same "state" of the machine (ie it's Deterministic).

  4. Command: a new input for the Application's State Machine. The structure and semantics are entirely up to the application, not Raft. One example might be "lock" and "unlock" for a lock state machine, but other state machines could be far more complicated (ie a whole database).

Schooner

  1. Schooner: an implementation of Raft, written in Rust.

Rust

  1. Rust: a f$&#ing awesome systems programming language, unless you're running into pointer ownership issues, in which case it's just a f$&#king systems programming language. Also a kind of Iron corrosion, if you're a real Engineer.

  2. Task: a Rust concurrency mechanism, akin to a Thread. All tasks run in the same OS-level process, and are isolated from each other. They communicate by sending and receiving data over Channels.

  3. Channel: a uni-directional communication mechanism, akin to a Queue, used for task-to-task communication in Rust. Because Channels are process-local, they do not fail the way a network connection can; it is assumed that a channel failing means the whole Rust process has failed, so the Node can never observe that failure (it has already stopped).