General: Streamline: Core Setup - FlipsideCrypto/fsc-evm GitHub Wiki

Deploying Core Streamline Models

Summary:

Streamline enables us to rapidly scale our data processing and ingestion capabilities using AWS Lambdas, Snowflake External Tables, and DBT Models. Streamline models in each EVM repo are organized into Bronze and Silver folders, with additional nesting by category.

Bronze Layer

  • These models are materialized as Views and use the fsc_evm.streamline_external_table_query and fsc_evm.streamline_external_table_fr_query macros to select the raw node responses that are stored in External Tables / S3.
  • These models come in two variants: views that select only the last three (3) days of inserted rows, which keeps incremental loads downstream fast, and full-refresh (fr) views that reference the External Tables for the entirety of stored history, which is useful for full-refresh scenarios in the downstream models.
    • Note: The full-refresh (fr) models may require multiple versions deployed that differ slightly based on the structure of the deployed External Tables. These are denoted with the v1 or v2 suffix. If this is the case, a comprehensive view that unions data from all full-refresh version models is required to ensure we can access 100% of stored history in downstream models.
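
The two Bronze variants and the union of full-refresh versions can be sketched in plain Python. This is only an illustration of the selection logic, not the actual macros: the row shape, the `inserted_at` field, and the helper names `recent_rows` and `full_refresh_rows` are hypothetical and do not come from fsc-evm.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the three-day window described above, applied to insert time.
LOOKBACK = timedelta(days=3)


def recent_rows(rows, now):
    """Rows inserted within the last three days (the incremental path)."""
    cutoff = now - LOOKBACK
    return [r for r in rows if r["inserted_at"] >= cutoff]


def full_refresh_rows(*versions):
    """Union every full-refresh version so no stored history is skipped."""
    combined = []
    for rows in versions:
        combined.extend(rows)
    return combined


# Hypothetical sample data standing in for two External Table layouts.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
fr_v1 = [{"block_number": 1, "inserted_at": now - timedelta(days=400)}]
fr_v2 = [
    {"block_number": 2, "inserted_at": now - timedelta(days=5)},
    {"block_number": 3, "inserted_at": now - timedelta(days=1)},
]

all_history = full_refresh_rows(fr_v1, fr_v2)  # blocks 1, 2, 3
recent = recent_rows(all_history, now)         # block 3 only
```

Reading only `fr_v2` here would silently drop block 1, which is why the comprehensive union view is needed when more than one full-refresh version exists.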

Silver Layer

  • Request Models: Materialized as Views, these models construct the JSON RPC / API requests sent to the node by referencing a spine of blocks and building objects that output the applicable calls. The requests are sent when the model's DBT config leverages the fsc_utils.if_data_call_function_v2 macro and the streamline.udf_bulk_rest_api_v2 generic function, and the model is run with the appropriate variables, e.g. --vars '{"STREAMLINE_INVOKE_STREAMS":True}'. In conjunction with AWS Lambdas, this establishes the Streamline pipeline: results are returned to AWS S3, which can then be queried in Snowflake via External Tables. These External Tables are defined and deployed via the streamline-snowflake repo.

    • Realtime: Includes the last three (3) days of blocks only

    • History: Includes all blocks prior to the last three (3) days

  • Retry Models: Materialized as Ephemeral, these models target blocks, transactions, receipts, traces, or other data that may be missing or unconfirmed. Referencing them in the request models automates re-requesting missing or incomplete data from previous runs.

  • Complete Models: Materialized as Tables, these models query all blocks that have been requested and have successfully landed in the External Tables / Bronze Views. The complete models are required to properly implement the request models, as they prevent re-requesting data that was already fetched.

  • Supporting Models: Additional models that assist in data processing, such as block lookbacks, ranges, and sequences, which help produce or adequately limit the request models.
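
How the Silver models fit together can be sketched as set logic in Python. This is a minimal illustration under stated assumptions, not the DBT implementation: the function names, the spine shape (block number mapped to block timestamp), and the choice of `eth_getBlockByNumber` as the example call are all hypothetical. It also assumes the realtime/history split is made on block timestamp and that retry blocks are re-requested only within the model's own window.

```python
from datetime import datetime, timedelta, timezone

# Assumption: realtime covers the last three days, history everything older.
REALTIME_WINDOW = timedelta(days=3)


def blocks_to_request(spine, completed, retry, now, mode):
    """Pick block numbers for a 'realtime' or 'history' request model.

    spine:     dict of block_number -> block timestamp
    completed: set of block numbers that already landed (complete model)
    retry:     set of block numbers flagged missing or unconfirmed
    """
    cutoff = now - REALTIME_WINDOW
    if mode == "realtime":
        in_scope = {b for b, ts in spine.items() if ts >= cutoff}
    else:
        in_scope = {b for b, ts in spine.items() if ts < cutoff}
    pending = in_scope - completed  # never re-request landed data
    # Retry blocks are requested again even if previously marked complete.
    return sorted(pending | (retry & in_scope))


def build_rpc_request(block_number):
    """Build one JSON RPC request object for a block (illustrative method)."""
    return {
        "jsonrpc": "2.0",
        "id": block_number,
        "method": "eth_getBlockByNumber",
        "params": [hex(block_number), True],
    }


# Hypothetical sample data.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
spine = {
    100: now - timedelta(days=10),
    101: now - timedelta(days=4),
    102: now - timedelta(days=2),
    103: now - timedelta(hours=1),
}
completed = {100, 102}
retry = {102}

realtime = blocks_to_request(spine, completed, retry, now, "realtime")  # [102, 103]
history = blocks_to_request(spine, completed, retry, now, "history")    # [101]
requests = [build_rpc_request(b) for b in realtime]
```

Block 102 is already complete but appears in the realtime output because the retry set flags it, while block 100 is never requested again. Without the complete set, every run would re-request the whole spine.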

Best Practices, Tips & Tricks:

Implementation Steps, Variables & Notes:

Examples, References & Sources:

Example Code:

Example Folder Structure and Model Hierarchy:

> models/streamline/bronze/core
    - bronze__streamline_blocks.sql
    - bronze__streamline_fr_blocks_v1.sql
    - bronze__streamline_fr_blocks_v2.sql
    - bronze__streamline_fr_blocks.sql
    - ...
    - bronze__streamline_fr_transactions.sql

> models/streamline/silver/core
    >> Realtime
        - streamline__blocks_transactions_realtime.sql
        - ...
    >> History
        - streamline__blocks_transactions_history.sql
        - ...
    >> Complete
        - streamline__complete_blocks.sql
        - ...
        - streamline__complete_transactions.sql
    >> Retry
        - _missing_txs.sql
        - ...
    - streamline__blocks.sql
    - streamline__get_chainhead.sql

> models/streamline/silver
    - _block_lookback.sql
    - ...