Technical documentation - subquery/go-subql-substrate-dictionary Wiki
The go-subql-substrate-dictionary is written in Go, which is known for its built-in concurrency support. The dictionary consists of multiple components, which are described in the sections below.
The heart of the dictionary indexer is the Orchestrator, the component that acts as the glue between all the other components. Its role is to start the other components, initialize their state (either from the beginning or by recovering the state of a previous run if an error stopped the program), and feed the clients batches of blocks to process.
- The first step marked in the diagram above is the Orchestrator starting the Spec version client. The Orchestrator starts the client and checks whether there is data from a previous run. If spec version data is found, the Orchestrator tells the Spec version client to start from the last indexed block found in the database; otherwise, the client starts from the first chain block (the block with height 1 for every chain).
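The recovery decision described above can be sketched in a few lines of Go. This is a minimal illustration, not the project's code: `startHeight`, `firstChainBlock`, and the `found` flag are hypothetical names, and the database lookup that would produce `lastIndexed` is left out.

```go
package main

import "fmt"

// firstChainBlock is the height the text names as the starting point of a fresh run.
const firstChainBlock = 1

// startHeight returns the height the spec version client should begin from.
// lastIndexed is the last height found in the database, and found reports
// whether any data from a previous run exists (both are hypothetical inputs).
func startHeight(lastIndexed int, found bool) int {
	if found {
		return lastIndexed
	}
	return firstChainBlock
}

func main() {
	fmt.Println("fresh run starts at:", startHeight(0, false))
	fmt.Println("recovered run starts at:", startHeight(120000, true))
}
```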
Spec version client
The spec version client is responsible for indexing the runtime spec versions and for querying the metadata of the substrate node.
- Using a modified binary search algorithm, the spec version client determines the last block of each spec version range. To do so, it queries the HTTP RPC endpoint of the substrate node for the spec version at given block heights.
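The search for the end of a spec version range might look like the sketch below. It assumes that spec versions never decrease as the block height grows, which is what makes a binary search valid. `specVersionAt` is a hypothetical stand-in for the RPC query, and `lastBlockOfSpec`, `specRanges`, and `specRange` are illustrative names, not the project's API.

```go
package main

import "fmt"

// specVersionAt is a hypothetical stand-in for the RPC query that returns
// the spec version at a given block height.
type specVersionAt func(height int) int

// specRange describes the blocks [First, Last] that share one spec version.
type specRange struct{ Spec, First, Last int }

// lastBlockOfSpec returns the highest height in [start, end] whose spec
// version equals the one at start. It assumes versions never decrease.
func lastBlockOfSpec(query specVersionAt, start, end int) int {
	target := query(start)
	lo, hi := start, end
	for lo < hi {
		mid := lo + (hi-lo+1)/2 // upper middle, so the loop always makes progress
		if query(mid) == target {
			lo = mid // mid still has the target version; look higher
		} else {
			hi = mid - 1 // the version changed at or before mid
		}
	}
	return lo
}

// specRanges splits [first, last] into consecutive spec version ranges.
func specRanges(query specVersionAt, first, last int) []specRange {
	var ranges []specRange
	for start := first; start <= last; {
		end := lastBlockOfSpec(query, start, last)
		ranges = append(ranges, specRange{Spec: query(start), First: start, Last: end})
		start = end + 1
	}
	return ranges
}

func main() {
	// Fake chain for the demo: 1..100 -> spec 1, 101..250 -> spec 2, 251..300 -> spec 5.
	fake := func(h int) int {
		switch {
		case h <= 100:
			return 1
		case h <= 250:
			return 2
		default:
			return 5
		}
	}
	for _, r := range specRanges(fake, 1, 300) {
		fmt.Printf("spec %d: blocks %d..%d\n", r.Spec, r.First, r.Last)
	}
}
```

Each range costs a logarithmic number of queries, instead of one query per block.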
Extrinsic and Event clients
These are the clients that index the actual data of interest. They query the raw data from Rocksdb, decode it, parse it, and insert the relevant parsed data into Postgres.
The Orchestrator traverses all the blocks synchronized by the substrate node in batches of a configurable size. Each batch must lie entirely inside one spec version range, so that the same spec version and metadata can be used to decode the whole batch. When the Orchestrator encounters a spec version update, it also queries the node RPC endpoint to get the metadata for the new spec version.
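As an illustration of the batching constraint, the sketch below cuts a list of spec version ranges into batches that never cross a range boundary. The `specRange` and `batch` types and the `splitIntoBatches` function are assumptions made for this example, not names taken from the project.

```go
package main

import "fmt"

// specRange describes the blocks [First, Last] that share one spec version.
type specRange struct{ Spec, First, Last int }

// batch is a run of blocks that can be decoded with a single spec version.
type batch struct{ Spec, First, Last int }

// splitIntoBatches cuts every spec version range into batches of at most
// batchSize blocks. A batch never spans two ranges, so the last batch of a
// range may be shorter than batchSize.
func splitIntoBatches(ranges []specRange, batchSize int) []batch {
	if batchSize <= 0 {
		return nil // nothing sensible to do with a non-positive batch size
	}
	var out []batch
	for _, r := range ranges {
		for first := r.First; first <= r.Last; first += batchSize {
			last := first + batchSize - 1
			if last > r.Last {
				last = r.Last // clamp to the end of the spec version range
			}
			out = append(out, batch{Spec: r.Spec, First: first, Last: last})
		}
	}
	return out
}

func main() {
	ranges := []specRange{
		{Spec: 1, First: 1, Last: 250},
		{Spec: 2, First: 251, Last: 300},
	}
	for _, b := range splitIntoBatches(ranges, 100) {
		fmt.Printf("spec %d: blocks %d..%d\n", b.Spec, b.First, b.Last)
	}
}
```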
The Orchestrator opens a channel (Go's way of communicating between goroutines) for each of the extrinsic and event clients for each batch. It queries Rocksdb for the lookup keys of the blocks inside the batch and sends each lookup key and block height on the two channels.
Both the extrinsic and event clients use the lookup key they receive to get the relevant data from Rocksdb:
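A simplified sketch of this fan-out is shown below. The `job` type, `runBatch`, `consume`, and the worker callbacks are hypothetical, and the Rocksdb lookup is replaced by a prepared slice of jobs. The sketch only shows the channel mechanics: one channel per client, opened per batch, closed when the batch has been sent.

```go
package main

import (
	"fmt"
	"sync"
)

// job carries what the text says is sent on each channel: the block height
// and the lookup key of the block (the type itself is illustrative).
type job struct {
	Height    int
	LookupKey []byte
}

// consume drains one channel and hands every job to the given worker.
func consume(ch <-chan job, handle func(job), wg *sync.WaitGroup) {
	defer wg.Done()
	for j := range ch {
		handle(j)
	}
}

// runBatch opens one channel per client, sends every job of the batch on
// both channels, then closes them and waits for both clients to finish.
func runBatch(jobs []job, extrinsicWorker, eventWorker func(job)) {
	extrinsicCh := make(chan job)
	eventCh := make(chan job)

	var wg sync.WaitGroup
	wg.Add(2)
	go consume(extrinsicCh, extrinsicWorker, &wg)
	go consume(eventCh, eventWorker, &wg)

	for _, j := range jobs {
		extrinsicCh <- j
		eventCh <- j
	}
	close(extrinsicCh) // closing ends the range loop in each consumer
	close(eventCh)
	wg.Wait()
}

func main() {
	jobs := []job{
		{Height: 1, LookupKey: []byte{0x01}},
		{Height: 2, LookupKey: []byte{0x02}},
	}
	var extrinsicBlocks, eventBlocks int
	runBatch(jobs,
		func(job) { extrinsicBlocks++ },
		func(job) { eventBlocks++ },
	)
	fmt.Println("blocks seen by extrinsic client:", extrinsicBlocks)
	fmt.Println("blocks seen by event client:", eventBlocks)
}
```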
- The extrinsic client will get the block body, parse it and extract the extrinsics from it. If the substrate node has EVM support, it will also try to determine if a specific extrinsic represents an EvmTransaction and save it accordingly.
- The event client traverses the Patricia Merkle trie starting from the state root saved in the block header, using as the path key the encoded concatenation of the module "system" and method "events" (0x26aa394eea5630e07c48ae0c9558cef7|80d41e5e16056765bc8461851072c9d7). The traversal finishes when either a leaf node is found in the trie or the key path is exhausted. More details can be found here. From the events storage we extract generic events and the success status of extrinsics. If the chain is EVM-based, we also extract EvmLogs and more data related to EvmTransactions.
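To make the path key more concrete, the sketch below joins the two hex halves quoted above into the 32-byte storage key and expands it into 4-bit nibbles. The nibble expansion assumes a radix-16 trie in which each step consumes half a byte. The function names are illustrative, and the hashing that produces the two halves is not reproduced here.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// The two halves of the path key, copied from the text above.
const (
	systemPrefixHex = "26aa394eea5630e07c48ae0c9558cef7"
	eventsPrefixHex = "80d41e5e16056765bc8461851072c9d7"
)

// eventsStorageKey concatenates both halves and decodes them into bytes.
func eventsStorageKey() ([]byte, error) {
	return hex.DecodeString(systemPrefixHex + eventsPrefixHex)
}

// toNibbles splits every byte into its high and low 4-bit halves, the unit
// a radix-16 trie consumes at each step of the traversal (an assumption
// about the trie layout made for this sketch).
func toNibbles(key []byte) []byte {
	nibbles := make([]byte, 0, len(key)*2)
	for _, b := range key {
		nibbles = append(nibbles, b>>4, b&0x0f)
	}
	return nibbles
}

func main() {
	key, err := eventsStorageKey()
	if err != nil {
		panic(err)
	}
	path := toNibbles(key)
	fmt.Printf("key length: %d bytes, path length: %d nibbles\n", len(key), len(path))
}
```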
Both clients have a Postgres client waiting for data on another channel. That database client has an insertion or update buffer (depending on the case) that accumulates the data processed by the extrinsic or event client. When a batch is finished, each client notifies its database client that it has to insert or update using the data accumulated in its buffer. Even though the batch size is known (it is the number of blocks processed before an insertion), the buffer size is not, because a single block may contain many events or extrinsics.
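The buffering behaviour can be sketched as a goroutine that appends rows to a slice and flushes it when it receives an end-of-batch marker. Everything here is an assumption made for illustration: the `row` and `message` types, the `Flush` flag used as the notification, and the `insert` callback standing in for the Postgres write.

```go
package main

import "fmt"

// row is a placeholder for one parsed extrinsic or event.
type row struct {
	Height int
	Data   string
}

// message is either a row to buffer or, when Flush is true, the
// notification that the current batch is finished.
type message struct {
	Row   row
	Flush bool
}

// dbWriter accumulates rows in a growable buffer, since the number of rows
// per batch is not known in advance, and calls insert once per batch.
func dbWriter(in <-chan message, insert func([]row), done chan<- struct{}) {
	var buffer []row
	for msg := range in {
		if msg.Flush {
			if len(buffer) > 0 {
				insert(buffer)
				buffer = nil // start a fresh buffer for the next batch
			}
			continue
		}
		buffer = append(buffer, msg.Row)
	}
	close(done)
}

func main() {
	in := make(chan message)
	done := make(chan struct{})
	go dbWriter(in, func(rows []row) {
		fmt.Printf("writing %d rows in one statement\n", len(rows))
	}, done)

	// One batch of two blocks: the first block yields two rows, the second one.
	in <- message{Row: row{Height: 1, Data: "a"}}
	in <- message{Row: row{Height: 1, Data: "b"}}
	in <- message{Row: row{Height: 2, Data: "c"}}
	in <- message{Flush: true}
	close(in)
	<-done
}
```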
The extrinsic client must be the first to insert data into Postgres, because the event client updates data already inserted by the extrinsic client (steps 6 and 7).
After finishing all the currently known blocks, the Orchestrator tries to synchronize the Rocksdb instance it is using with the main one. It then asks the Spec version client to update the spec versions, and the Extrinsic and Event clients to continue processing up to the new last synced block number.
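One possible way to enforce this ordering between two goroutines is a channel that is closed once the insert has completed. The source does not say which mechanism the dictionary actually uses, so the sketch below is only an illustration, and `flushBatch` and its callbacks are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// flushBatch runs both database steps concurrently but makes the event
// update wait until the extrinsic insert has finished.
func flushBatch(insertExtrinsics, updateFromEvents func()) {
	extrinsicsInserted := make(chan struct{})

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		insertExtrinsics()
		close(extrinsicsInserted) // signal that the rows now exist
	}()
	go func() {
		defer wg.Done()
		<-extrinsicsInserted // block until the insert has completed
		updateFromEvents()
	}()
	wg.Wait()
}

func main() {
	flushBatch(
		func() { fmt.Println("extrinsic client: insert rows") },
		func() { fmt.Println("event client: update inserted rows") },
	)
}
```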