11 05 Chain Replication for Supporting High Throughput and Availability - lkuper/CSE232-2021-09 GitHub Wiki
Discussion for: Robbert van Renesse and Fred B. Schneider, "Chain Replication for Supporting High Throughput and Availability" (OSDI 2004)
Paper overview
This paper proposes the replication and request-processing protocol called chain replication. In chain replication, a chain of processes are selected to maintain a volume (group) of objects. Write requests go to the head of the chain, which after applying the write passes it to the next node, until the write reaches the tail of the chain. Read requests, on the other hand, go to the tail of the chain. The paper also talks about how process failures are handled, and how nodes can be removed and added to chains. It also talks about different strategies for distributing volumes of objects among the servers. The paper assumes the fail-stop model, with a master (or a cluster of masters) that orchestrates topology changes. Different cluster settings are also discussed, and the paper asserts chain replication to be best for homogenous LAN clusters. Lastly, the paper discusses the similarities and the differences between chain replication and the primary-backup protocol.
Discussion questions
Q1
In class, we discussed how chain replication and primary-backup replication compare to each other with regard to throughput and latency. The RADOS distributed storage system uses a variant of chain replication called splay replication that tries to combine the strengths of the two approaches.
Look at Figure 2 of the RADOS paper), which illustrates splay replication. How do you think splay replication might improve on chain replication and primary-backup replication?
Discussion summary for Q1
When comparing chain replication with splay replication, since the head node updates all other nodes in parallel, we significantly improve update latency, and therefore update throughput, in splay replication.
When comparing primary-backup replication with splay replication, since (in splay replication) the head node handles updates (writes) and the tail node handles queries (reads), we get a lower query latency in splay replication. Furthermore, splay replication (similar to chain replication) can improve overall throughput because of its segregation of responsibilities — The tail responding to read queries on its own, and the head receiving write requests.
So, in a nutshell, splay replication combines the best of both chain replication and primary-backup (referred to as primary-copy in this RADOS paper) replication while gaining latency and throughput benefits over primary-backup replication. This comes at a similar bandwidth usage to p/b, which is more than what chain replication uses.
Q2
(Discussed on Zulip by @aakash-mishra and others)
Section 3 of the paper says:
Note that some redundant computation associated with the t-1 servers is avoided in chain replication because the new value is computed once by the head and then forwarded down the chain, so each replica has only to perform a write. This forwarding of state changes also means update can be a non-deterministic operation -- the non-deterministic choice is made once, by the head.
This approach is known as passive replication: compute the result of an operation on one replica, and then forward the result of the operation to other replicas. A alternative would be to forward the actual operation to be done and then do the computation individually on each replica. This alternative approach is known as active replication.
Under what circumstances do you think passive replication would be better? Under what circumstances would active replication be better? Discuss with your group.
Discussion summary for Q2
Passive replication is preferable when the computation load is heavy, and the result is light, i.e., it can be transmitted easily to other nodes. E.g., the dot product of two large vectors, which is a single number.
Non-deterministic operations can be handled well with passive replication, as even if the computation is non-deterministic, it will be only done once on one replica, and the result would be propagated to the others.
Alternatively, active replication (or State Machine Replication) is preferable when bandwidth is scarce and propagating the change results requires significant communication, and each machine has the resources to compute all update results. Ex: an SQL query with a large result.
(Contributed by @Smruthi-Pobbathi)
Q3
The paper says, "Our current prototype is intended primarily for use in relatively homogeneous LAN clusters." Why might chain replication (and primary/backup replication) not be the most appropriate protocol to use across the wide-area network (say, one server in London, one in Tokyo, and one in San Francisco)?
Discussion summary for Q3
There are two problems in using chain replication with wide-area networks compared to local-area networks: latency and homogeneity. Since the nodes are spread over a larger area in wide-area networks, update (write) latency can become a significant factor as information flows sequentially from the head node through every node in the chain. Also, since we can have nodes spread across different regions, there is a higher chance of heterogeneous hardware configurations, leading to load imbalances, especially with uniform random placement of volume replicas. Additionally, in a WAN, one has less control over the transparency of the network when compared to LAN. Furthermore, chain replication relies on guaranteed communication to determine when a process goes down to implement the fail-stop model. It is much more difficult to tell when a process goes down in a WAN when compared to LAN, making it challenging to implement the fail-stop model, and therefore chain replication in WAN. The same arguments also holds for primary backup, with the difference that in p/b, the primary waits for all backups to respond before responding to a query, which has a different latency profile than chain replication.
Errata
Running list of typos / issues found in the paper:
- (to be filled in by scribes, if any)
Other
Any interesting points or questions from the group discussion that didn't fit above:
- (to be filled in by scribes, if any)