10 18 Distributed Snapshots Determining Global States of Distributed Systems - lkuper/CSE232-2021-09 GitHub Wiki
Discussion for: K. Mani Chandy and Leslie Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems" (TOCS, 1985)
Paper overview
Chandy and Lamport propose an algorithm to record the global state of a distributed system. They achieve this by decentralizing and relegating the recording of local states and channel states to each individual process. Once the system has recorded each local state, a global state of the whole system can be built.
The motivation established to do this is that of recording a stable property of a system. A stable property is defined as a property that once established to be true will hold for the rest of the computation. Examples of stables properties are given as; "computation has terminated", "the system has deadlocked", and "all tokens from a token ring have disappeared".
Of course this is not the only motivation for recording the global state of a distributed system. Commonly established motivations are that of creating checkpoints and for failure recovery. As with the recording of a global state, it is possible to recover(by rolling back) to a previous state of execution.
For the snapshot algorithm to work, Chandy and Lamport propose a series of abstractions as their model of a distributed system. The most relevant for the algorithms execution is the abstraction of a channel. A channel is the "route" of communications processes use to communicate with one another. The state of a channel is defined as the sequence of messages sent through the channel. The abstraction of marker messages is also used in the algorithm, as these messages define the bounds of when process states and channel states are recorded.
A significant point to be noted about the algorithm is how it doesn't rely on a consensus to be built for it to run. Each processes can act independent of each other and without having to notify any others of its current state. This is a significant departure from other algorithms that rely on a centralized manager.
The details of the algorithm itself are left as an exercise to the reader. Although a summarized version is included.
-
The initiator process:
- records its own state
- sends a marker message out on all its out going channels
- start recording the messages it receives on all its incoming channels
-
When process Pi receives a marker on channel Cki:
- If it's the first marker message Pi has seen:
- Pi records its state
- Pi marks the channel Cki as empty
- Pi sends out markers on all its out going channels
- Pi will start recording on all its incoming channels, except Cki
- IF Pi has already seen a marker:
- Pi stops recording on Cki, and sets Cki's state to the sequence of messages received.
- If it's the first marker message Pi has seen:
The authors go into significant detail on how the distributed snapshot algorithm resolves their set motivation of determining a stable property in a distributed system. The final section of the paper contains a full solution to the "stability detection" problem.
Discussion questions
Q1
(Contributed by @aakash-mishra)
One of the key assumptions in the paper is that both the marker messages and application messages must follow FIFO delivery. What are some possible issues that could arise if this assumption were to be dropped and FIFO delivery is violated?
Discussion summary for Q1
If FIFO delivery is violated the integrity of the global snapshot is significantly compromised. The results are either information loss or inconsistent snapshots.
In the case of information loss consider the following scenario:
If a process were to send a normal message and then a marker message to another process, and the marker marker arrived first, the second process would mark the incoming channel as empty before it received the message. In this case the snapshot would be inaccurate, because the first process would have recorded the message, and the second would not have. While not by definition an inconsistent snapshot, this scenarios results in a channel state recording being inaccurate and losing information of the workload.
In the case of inconsistent snapshot consider the following scenario:
If a marker message is sent before a normal message but is delivered after the normal message is delivered. This is a quite significant issue. As it results in a properly inconsistent snapshot, as the sending of the normal message would not be recorded in a process state but the delivery of the message would be recorded in the channel state.
Both of these scenarios significantly impact the integrity of the global snapshot algorithm.
Q2
In section 3.3 of the paper, Chandy and Lamport say, "The algorithm can be initiated by one or more processes, each of which records its state spontaneously, without receiving markers from other processes." (Emphasis mine.) Consider the example run of the algorithm that we walked through in class last time. In our example, P1 initiated the algorithm. Walk through the example and convince yourself that we would still have a consistent snapshot if P2 or P3 also independently decided to initiate the algorithm at around the same time. Why is it important and useful that more than one process can initiate the algorithm at once?
Discussion summary for Q2
The decentralized nature of the algorithm is very valuable. By using channel state recording and FIFO, there is no information loss that usually results from distributed failure. In our discussion, it was pointed out that de-centralization permits us to avoid consensus methods and leadership among processes, both techniques that are vulnerable to partial failure. In essence, processes can independently take decisions without consulting other components in the system. This is just plain faster and more efficient. Therein lies the essence and the genius of this algorithm.
Q3
This paper proposed the first snapshot algorithm for distributed computations in 1985. In the years that followed, many other snapshot algorithms were proposed, some of which make use of different assumptions than Chandy and Lamport's. A 1995 survey paper discusses a variety of snapshot algorithms. Take a brief look at the survey paper and how it is organized. Based on the survey paper's organization, how do the authors suggest categorizing snapshot algorithms?
Discussion summary for Q3
The authors suggest categorizing the algorithms based on the properties of their communication channels.
The categories are: FIFO delivery algorithms Non-FIFO delivery algorithms Causal delivery Algorithms
FIFO delivery algorithms are the category for which the Chandy-Lamport algorithm belongs to. By relying on FIFO a significant level of complexity can be avoided in the algorithm. A relevant property of these type of algorithms is that it permits the snapshot to be recorded without ever interrupting computation, while avoiding exemption causality. An interesting example of FIFO delivery algorithms is the Spezialetti-Kearns method, as it includes the full steps to compile the individual process states into a global state.
Non-FIFO delivery algorithms lose quite a bit of simplicity and performance, as the only techniques to accurately record a global state are that of message inhibition and that of piggybacking message control information. Message inhibition involves delaying the execution of a process or the sending of a message to maintain order. While piggybacking message control information involves distinguishing messages sent before and after a marker message by adding extra metadata to messages.
Causal delivery algorithms are considered simpler than the other two as they provide built in message synchronization and control. This lets the algorithm avoid the use of marker messages for example.
Within these categories there are a variety of different algorithms. Some use central processes to handle state recording, others optimize on the performance of the Chandy-Lamport Algorithm. If there is a conclusion to be drawn is that the most ideal distributed snapshot algorithm deeply depends on the type of system it is being implemented on.
Errata
Running list of typos / issues found in the paper:
- None found
Other
Any interesting points or questions from the group discussion that didn't fit above:
- An interesting note made by Chandy-Lamport is that of how, once all the process states and channels states are recorded, can the states can be recompiled into a truly global state recording. They consider this a trivial algorithm and suggest the solution of distributing each individual process state and channel state to every other process in the system. This would result on all processes having a global view of the system state. They do not implement this however.
- Another important discussion point was around the other algorithms in the survey paper highlighting different types of delivery algorithms, such as non-FIFO and causal delivery, and how none of them can be claimed to be unequivocally better than the Chandy-Lamport algorithm