Snapshot - zoobc/zoobc-core GitHub Wiki

Snapshot

Description

One of our objectives is to reduce the blockchain download time to a constant (or nearly so). The blockchain download time is a notorious problem, as traditionally each node must download and apply in sequence the entire history of previous blocks and transactions before it can begin evaluating the validity of new transactions, and the size of this historical record necessarily continues to grow for the lifetime of the chain.

To address blockchain bloat, each ZooBC node periodically (at a block height agreed on by the network) takes a snapshot of the current state of its database and computes a set of hashes for that snapshot. To make sure it holds the same snapshot as all the other nodes, the node compares its newly computed snapshot hashes against the snapshot hashes carried in the metadata of a new spine block proposed by a blocksmith.
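A minimal Go sketch of the comparison described above. It assumes the snapshot hashes are one hash per chunk, listed in the same order both locally and in the spine block metadata. The function name `snapshotHashesMatch` is hypothetical and not part of zoobc-core.

```go
package main

import (
	"bytes"
	"fmt"
)

// snapshotHashesMatch reports whether the snapshot hashes a node computed
// locally are identical, in order, to the hashes carried in the metadata of
// a proposed spine block. Both slices are assumed to hold one hash per chunk.
func snapshotHashesMatch(local, proposed [][]byte) bool {
	if len(local) != len(proposed) {
		return false
	}
	for i := range local {
		if !bytes.Equal(local[i], proposed[i]) {
			return false
		}
	}
	return true
}

func main() {
	local := [][]byte{{0x01, 0x02}, {0x03}}
	proposed := [][]byte{{0x01, 0x02}, {0x03}}
	fmt.Println(snapshotHashesMatch(local, proposed)) // true
}
```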

If the blocksmith includes hashes that, combined with the already known snapshot hashes, match the set of hashes calculated by the majority of nodes in the node registry, its block is approved; otherwise it is rejected. A new node joining the network then only needs to download the spine blocks up to the one carrying the hashes of the latest database snapshot, and then download that snapshot in chunks from its peers. This brings it to a recent database state, from which it only has to download the most recent blocks to catch up with the rest of the network.
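The approval rule can be sketched as a simple majority tally. This is a hypothetical illustration: it assumes each registry node's verdict (whether its own hashes matched the proposed ones) is already known, and it does not model how those verdicts are gathered or how ZooBC's consensus actually counts them. `approveSpineBlock` is an illustrative name only.

```go
package main

import "fmt"

// approveSpineBlock is a hypothetical tally: given, for every node in the
// node registry, whether that node's own snapshot hashes matched the ones
// in the proposed spine block, it approves the block only when a strict
// majority of the registry agrees.
func approveSpineBlock(matches map[string]bool) bool {
	agree := 0
	for _, ok := range matches {
		if ok {
			agree++
		}
	}
	return agree*2 > len(matches)
}

func main() {
	verdicts := map[string]bool{"nodeA": true, "nodeB": true, "nodeC": false}
	fmt.Println(approveSpineBlock(verdicts)) // true: 2 of 3 nodes agree
}
```

A strict majority is used so that a tie, or an empty registry, never approves a block.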

Snapshot creation process

  • Approximately once per month, all nodes deterministically select a block height at which the next database snapshot should be taken. Each node then waits until enough blocks have elapsed for the snapshot height to fall beyond the “maximum rollback height”. This guards against the case where a node starts computing the snapshot and then has to abandon it to process a rollback.

  • At this point the node starts constructing a file that represents the exact state of its database at the snapshot height selected earlier. Once built, the file is saved as a snapshot split into multiple chunks, and its hashes are computed and stored in the spine_block_manifest table.

  • To give nodes time to construct the snapshot file, a grace period of a fixed number of blocks must pass before a spine block can reference this spine_block_manifest record. This ensures that by the time the new spine block is broadcast, all nodes, even those running on low-end hardware such as IoT devices, should have been able to construct and hash their snapshot file, and can therefore validate the hashes in the new spine block against the ones they computed independently.
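The timing rules in the three steps above can be expressed as a few height checks. This Go sketch is illustrative only: the constants are made-up placeholders rather than ZooBC's network parameters, and it assumes "deterministic selection" means every positive multiple of a fixed interval, which the page does not specify. All function names are hypothetical.

```go
package main

import "fmt"

// Hypothetical placeholder values; the real network parameters are not
// given on this page.
const (
	snapshotInterval  uint32 = 1000 // blocks between snapshot heights
	maxRollbackBlocks uint32 = 720  // deepest rollback a node may have to process
	gracePeriodBlocks uint32 = 100  // blocks allowed for building and hashing the file
)

// isSnapshotHeight deterministically marks every positive multiple of
// snapshotInterval as a snapshot height, so all nodes agree without messaging.
func isSnapshotHeight(h uint32) bool {
	return h > 0 && h%snapshotInterval == 0
}

// earliestStart is the first tip height at which snapshotHeight lies more
// than maxRollbackBlocks below the tip.
func earliestStart(snapshotHeight uint32) uint32 {
	return snapshotHeight + maxRollbackBlocks + 1
}

// canStartSnapshot reports whether the chain tip is past the maximum
// rollback window for snapshotHeight.
func canStartSnapshot(snapshotHeight, tip uint32) bool {
	return tip >= earliestStart(snapshotHeight)
}

// canReferenceManifest reports whether a spine block produced at tip may
// reference the manifest of the snapshot taken at snapshotHeight.
func canReferenceManifest(snapshotHeight, tip uint32) bool {
	return tip >= earliestStart(snapshotHeight)+gracePeriodBlocks
}

func main() {
	fmt.Println(isSnapshotHeight(2000))           // true
	fmt.Println(canStartSnapshot(2000, 2720))     // false: still inside rollback window
	fmt.Println(canStartSnapshot(2000, 2721))     // true
	fmt.Println(canReferenceManifest(2000, 2820)) // false: grace period not over
	fmt.Println(canReferenceManifest(2000, 2821)) // true
}
```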

Snapshot file specs

  • Each snapshot file contains the blockchain's last state at a given height, plus a partial history of the tables the blocksmith process needs to validate newly downloaded blocks. Partial means the snapshot includes just enough data from those tables for the smithing process to work correctly in all scenarios, including fork processing.
  • Each table is exported in a JSON-like format and encoded using CBOR (Concise Binary Object Representation).
  • The file is then split into chunks of equal size (see SnapshotBasicChunkStrategy) and persisted to the node's storage.
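A sketch of the chunking step, assuming fixed-size chunks in which only the last one may be shorter, and assuming SHA-256 for the chunk hashes (the page does not name the hash function). It is not the actual `SnapshotBasicChunkStrategy`: the function names are hypothetical, and the CBOR encoding is replaced by a plain byte slice so the example needs only the standard library. Hex-encoding each digest yields a chunk file name, since chunk hashes also serve as file names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// splitIntoChunks cuts the encoded snapshot into chunks of chunkSize bytes;
// only the final chunk may be shorter. chunkSize must be positive.
func splitIntoChunks(data []byte, chunkSize int) [][]byte {
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

// chunkFileNames hashes every chunk and hex-encodes the digest, so each hash
// doubles as the chunk's file name. SHA-256 is an assumption here.
func chunkFileNames(chunks [][]byte) []string {
	names := make([]string, len(chunks))
	for i, c := range chunks {
		sum := sha256.Sum256(c)
		names[i] = hex.EncodeToString(sum[:])
	}
	return names
}

func main() {
	encoded := []byte("stand-in for the CBOR-encoded snapshot payload")
	chunks := splitIntoChunks(encoded, 16)
	for i, name := range chunkFileNames(chunks) {
		fmt.Printf("chunk %d (%d bytes) -> %s\n", i, len(chunks[i]), name)
	}
}
```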

Snapshot download process

Almost all blocksmiths (smithing nodes), apart from those that started from a snapshot and haven't generated one themselves yet, hold a copy of the last full snapshot file (all of its chunks), and each node keeps a list of resolved (active) peers obtained from the Peer Discovery process.

Algorithm

  • When a new node joins the network, the first thing it does is sequentially download all spine blocks (TODO: document spine blocks and link here).
  • When finished, it checks whether there is at least one snapshot spine block manifest record and, if so, loads the latest one (the one referenced by the spine block with the highest height). It then loops through the snapshot chunk hashes contained in it (which are also the chunk file names) and downloads each chunk from a random peer in the resolved peer list. Failed downloads are retried by randomly polling the resolved peer list until either all chunks have been downloaded completely or the list is exhausted.

Snapshot application process

When a node has downloaded all snapshot chunks, they are assembled, decoded and applied to the node database. The node then keeps downloading the remaining blocks (at least a minimum 'rollback-safe' number of blocks) until it is synchronized with the rest of the p2p network. At this stage the node holds the transaction history of the latest downloaded blocks only: transactions below the downloaded snapshot's height are not available. This results in a much smaller blockchain file, while the blockchain's internal state stays in consensus with the rest of the network.

Flowchart

[Figure: flowchart]