General: Other Processes: Block Reorgs

Capturing Blockchain Reorganizations in Incremental DBT Models

Summary:

What are Block Reorgs?

Block reorganizations (reorgs) are events that occur in Ethereum Virtual Machine (EVM)-based blockchains when different nodes mine new blocks simultaneously, often due to network congestion or targeted attacks. During a reorg, the blockchain undergoes a temporary rollback until the longest and most consistent chain is identified. This process ensures that all participants have an accurate and consistent understanding of the blockchain's state.

Reorgs can cause transactions to move from one block to another, change their order, or even disappear entirely if they're not included in the new chain. In some cases, transaction hashes might change within the same block. These events are a natural part of blockchain operations, serving as a mechanism for resolving discrepancies in block ordering and transaction processing before finalization. However, reorgs can lead to inconsistencies in data models, especially within incremental loads.

How do we handle this in our models?

We handle block reorgs using a combination of a custom dbt macro, fsc_utils.block_reorg, and a GitHub Actions workflow. Please reference the Block Reorg Technical Specifications doc for more details.

  • The block_reorg macro - fsc_utils.block_reorg:

    • This macro is designed to clean up our data models after potential reorg events. It works by comparing our target models to the silver__transactions table, joining on block number and transaction hash. The macro looks for transactions that may sit in incorrect blocks due to reorgs: it deletes any records in our models that were inserted within a specified time frame (e.g., the last 12 hours) and that have no corresponding entries in the silver__transactions table. This process helps maintain data consistency by removing duplicate, incorrect or misplaced transaction records.
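
For illustration, a hypothetical re-implementation of this cleanup logic is sketched below. This is not the actual fsc_utils.block_reorg source; the _inserted_timestamp audit column, the silver.transactions relation name and the argument names are assumptions based on the description above.

{% macro block_reorg_sketch(models, hours) %}
    {# Hypothetical sketch only -- see fsc_utils.block_reorg for the real macro #}
    {% for model in models %}
        {% set sql %}
            delete from {{ model }}
            -- only touch rows loaded within the lookback window
            where _inserted_timestamp >= dateadd('hour', -{{ hours }}, sysdate())
              -- and only those that no longer match a confirmed transaction
              and not exists (
                  select 1
                  from silver.transactions t
                  where t.block_number = {{ model }}.block_number
                    and t.tx_hash = {{ model }}.tx_hash
              )
        {% endset %}
        {% do run_query(sql) %}
    {% endfor %}
{% endmacro %}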
  • GitHub Actions Workflow - dbt_run_operation_reorg.yml:

    • We use a workflow that runs on a set schedule to handle potential reorgs. The workflow first generates the list of models that need to be checked for reorgs by selecting models with the reorg tag. It then executes the block_reorg macro, passing in that list of models and the period of time to look back on and query against. This automated process ensures that our models are regularly cleaned up to account for any discrepancies that might have occurred. (The equivalent CLI steps are sketched under Best Practices, Tips & Tricks below.)
  • At least two primary types of chain reorg may occur:

    • When a transaction shifts to a new block:

      • The macro will delete the old record and keep only the new, correct one.
      • Example scenario:
        • tx_A is in block_1, model_1 loads tx_A on block_1
        • Reorg occurs
        • tx_A is now in block_2, model_1 loads tx_A on block_2
        • model_1 needs to keep only tx_A in block_2 and delete tx_A from block_1
        • based on the delete+insert config in silver.logs, silver.logs deletes and reinserts block_1
        • tx_A is no longer in block_1 in silver.logs, but model_1 still has tx_A in block_1 and block_2 (tx_hash is duplicated across two blocks)
        • model_1 runs incrementally but doesn't delete block_1 because it no longer exists in silver.logs (dependent on the model's delete+insert config); however, the duplicate tx_hash still exists in block_1 and block_2 in model_1
        • model_1 now has tx_A in block_1 and block_2, with unique event_index/_log_id but a duplicate tx_hash, thus requiring the block_reorg macro (a query to detect this state is sketched below)
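
To check whether a model is in this state, a duplicate check along the following lines can help. This is an illustrative query; model_1 is the hypothetical model from the scenario above.

-- find tx_hashes that survive in more than one block after a reorg
select
    tx_hash,
    count(distinct block_number) as block_count
from model_1
group by tx_hash
having count(distinct block_number) > 1;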
    • When a transaction hash changes within the same block:

      • Our process of deleting and reinserting block data ensures we capture these changes without the block_reorg macro.
      • Example scenario:
        • tx_A is in block_1, model_1 loads tx_A on block_1
        • Reorg occurs on block_1
        • tx_A is now tx_B, but still in block_1
        • model_1 deletes and reinserts block_1 correctly (silver.logs solves this by qualifying on block_number and POSITION instead of tx_hash; see the sketch below)
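The qualify pattern referenced above might look like the following sketch. The column names (position, _inserted_timestamp) and the upstream relation are assumptions for illustration, not the exact silver.logs source.

select *
from {{ ref('bronze__logs') }}  -- hypothetical upstream relation
-- dedupe on the event's position within the block rather than on tx_hash,
-- so a hash swap inside the same block resolves to the latest record
qualify row_number() over (
    partition by block_number, position
    order by _inserted_timestamp desc
) = 1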

Best Practices, Tips & Tricks:

  • It is recommended to run this macro at a cadence that aligns with the expected frequency of reorgs on the blockchain.
  • To test in the DEV environment, you can run the same steps listed in the workflow.
    • dbt list --select "ethereum_models,tag:reorg" --resource-type model... will reveal all models tagged as reorg in the repo. Those models will get picked up in the block reorg job. If you notice models that block_reorg should not apply to, troubleshoot the selector or remove the tag from those models.
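
For example, a local DEV run might look like the following. The run-operation argument names here are illustrative assumptions; check dbt_run_operation_reorg.yml for the exact invocation and parameters.

# list every model tagged as reorg (same selector the workflow uses)
dbt list --select "ethereum_models,tag:reorg" --resource-type model

# invoke the macro against those models with a 12-hour lookback
# (macro argument names are assumptions for illustration)
dbt run-operation block_reorg --args '{"models": ["silver.model_1"], "hours": 12}'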

Implementation Steps, Variables & Notes:

  • Identify the models impacted by block reorgs. Typically these are models built downstream of event logs or traces that reference a specific block_number, tx_hash or related details subject to change during a reorg.

  • Add the reorg tag to the tags list in the dbt model's config:

{{ config(
    materialized = 'incremental',
    incremental_strategy = 'delete+insert',
    unique_key = "block_number",
    cluster_by = ['block_timestamp::DATE'],
    tags = ['curated','reorg']
) }}
  • Establish the dbt_run_operation_reorg.yml workflow:

    • Determine the proper cadence for the scheduled job. Typically we set this to run once every 8 hours, but this may vary depending on the expected frequency of block reorgs per blockchain.
    • List reorg models step: update the repo name in this step of the job, e.g. <REPO_NAME>,tag:reorg
    • Execute block_reorg macro step: update the number of hours the delete statement looks back on. Typically we set this to 12 hours, but this may vary. The lookback should be greater than the scheduled job's cadence so that every hour is covered; e.g., a 12-hour lookback on an 8-hour cadence leaves a 4-hour overlap between runs.

Examples, References & Sources:

  • dbt_run_operation_reorg.yml - GHA Workflow

    • Note: this is where the block_reorg macro is referenced. Please adjust the parameters as needed.
  • model that requires the reorg tag

    • According to the delete+insert incremental strategy in the config, this model's unique_key is block_number and it requires the reorg tag. If a new block_number loads into the model on a block reorg occurrence, the new block_number, tx_hash, event_index, etc. will be inserted; however, the previous, incorrect or reorganized block_number, tx_hash, event_index, etc. may not be deleted if it falls outside of the incremental lookback window.
  • model that DOES NOT require the reorg tag

    • According to the delete+insert incremental strategy defined in the config, this model's unique_key is pool_address and it does not require the reorg tag. If a new pool_address loads into the model on a block reorg occurrence, the previous or incorrect block_number, tx_hash, pool_address, etc. will be deleted and the latest block_number, tx_hash, pool_address, etc. will be inserted, because delete+insert replaces every row for that pool_address key. A contrasting config sketch follows.
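
For contrast, a config of the kind described here might look like the following sketch (illustrative only; not the linked model's actual config):

{{ config(
    materialized = 'incremental',
    incremental_strategy = 'delete+insert',
    unique_key = 'pool_address',
    tags = ['curated']
) }}

Because every row for a given pool_address is deleted and reinserted on each incremental run, reorged records for that key are replaced naturally and no reorg tag is needed.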