General: DeFi: DEX Liquidity Pools - FlipsideCrypto/fsc-evm GitHub Wiki

Building and Deploying DEX Liquidity Pool Models

Decentralized Exchanges or DEXs, are permissionless marketplaces where tokens can be bought or sold, without a centralized intermediary. For EVMs, activity occurs on Liquidity Pools, which are smart contracts comprised of token pairs that enable automated market making and token swaps. This guide will detail how to 1) Research DEX protocols and 2) Build the Silver/Gold DBT model layers for Liquidity Pools.

Research Phase

Please reference the Research Phase section in How to Research and Curate EVM Protocols for official documentation on the research process. The following information will be relevant to DEX Liquidity Pools.

Summary:

In order to properly build a liquidity pool model for a DEX protocol, you need to identify the factory contract(s) and the pairCreated (or similar) event topic 0.
- Factory Contract: Responsible for deploying new liquidity pool contracts. Typically there is only one factory contract per DEX protocol version. In some protocols, the factory contract may also be responsible for setting and updating trading fees for the pools.
- Pair Created Event: Emitted when a new liquidity pool is generated on the factory contract. This may be emitted after the createPair() function or similar, is called on the factory contract.
  - Example Pair Created Event Parameters:
    - token0: Address of the first token in the pair
    - token1: Address of the second token in the pair
    - pool: Address of the newly created pair contract (liquidity pool)

Best Practices, Tips & Tricks:

Depending on the protocol, some liquidity pools may represent more than two tokens. Examples include Balancer and Curve, please reference these models to see how these are handled differently than a simple liquidity pool model with a pair of tokens.

Silver Layer Modeling

This guide outlines the process of creating the Silver Layer DBT models for tracking DEX liquidity pool creation events or traces on EVM-compatible blockchains, along with the applicable DBT tests for maintaining data integrity. This includes both Protocol Specific and Curated Complete models.

Please reference the Silver Layer Modeling section in How to Research and Curate EVM Protocols for more details on the modeling process. The following information is related directly to Liquidity Pools only.

1. Protocol Specific Models

Summary:

Built with silver.logs and/or silver.traces, the Silver models are designed to capture essential information about newly created liquidity pools, including token pairs, pool addresses, creation block_number, tx_hash, block_timestamp and other relevant values. The pool_address, token0 and token1 are the most important columns to extract, as these will be crucial for building the subsequent DEX Swaps models.

Best Practices, Tips & Tricks:

The DEX models are some of the first curated models deployed by Flipside, and therefore may have non-standardized structures or naming conventions. Over time, these models may be adapted to new standards.
In rare cases, models may require contract reads to collect the tokens and pool addresses, if they cannot be parsed onchain through event logs or traces. Please reference the guide on Contract Reads for more details.
Certain liquidity pool contracts are only deployed via creation traces, rather than creation events. This is common for protocols such as Balancer and Curve.
- These can be queried by filtering for TYPE ILIKE 'create%' on silver.traces, where the from_address is the deployer contract and to_address is the liquidity pool. Note that it's likely there are multiple from_address / deployer contracts to query for.

Implementation Steps, Variables & Notes:

Set up the model configuration
- Use incremental materialization with a delete+insert strategy.
- Define a unique key (typically the pool_address) to identify each record.
- Add relevant tags for model categorization (typically tagged as curated) so that the model may be picked up by the relevant scheduled workflows.

{{ config(
    materialized = 'incremental',
    incremental_strategy = 'delete+insert',
    unique_key = "pool_address",
    tags = ['curated']
) }}

Create a CTE for pool creation events
- Filter logs for specific factory contract addresses and event topics.
- Manually parse event data to extract relevant information (tokens, pool address, etc.).
  - For incremental runs, filter data based on the latest _inserted_timestamp, with a 12 hour lookback.

WITH pool_creation AS (
SELECT
      block_number,
      block_timestamp,
      tx_hash,
      event_index,
      origin_function_signature,
      origin_from_address,
      origin_to_address,
      contract_address AS factory_address,
      regexp_substr_all(SUBSTR(DATA, 3, len(DATA)), '.{64}') AS segmented_data,
      CONCAT('0x', SUBSTR(segmented_data [0] :: STRING, 25, 40)) AS pool_address,
      CONCAT('0x', SUBSTR(topics [1] :: STRING, 27, 40)) AS token0,
      CONCAT('0x', SUBSTR(topics [2] :: STRING, 27, 40)) AS token1,
      TRY_TO_NUMBER(utils.udf_hex_to_int(
          segmented_data [1] :: STRING
      )) AS pool_id,
      _log_id,
      _inserted_timestamp
FROM {{ ref ('silver__logs') }}
WHERE
     contract_address IN (
       '0xb076b06f669e682609fb4a8c6646d2619717be4b', --v1 factory
       '0x43ec799eadd63848443e2347c49f5f52e8fe0f6f' --v2 factory
     )
     AND topics [0] :: STRING = '0x0d3648bd0f6ba80134a33ba9275ac585d9d315f0ad8355cddefde31afa28d0e9' --pairCreated
     AND tx_status = 'SUCCESS'
{% if is_incremental() %}
AND _inserted_timestamp >= (
    SELECT
        MAX(_inserted_timestamp) - INTERVAL '12 hours'
    FROM
        {{ this }}
)
{% endif %}
)

Add the final SELECT statement
- Select all relevant fields from the pool_creation CTE.
- If necessary, add a qualify clause on pool_address to handle potential duplicates, keeping the most recent entries based on _inserted_timestamp.

SELECT
    block_number,
    block_timestamp,
    tx_hash,
    event_index,
    origin_function_signature,
    origin_from_address,
    origin_to_address,
    factory_address,
    token0,
    token1,
    pool_address,
    pool_id,
    _log_id,
    _inserted_timestamp
FROM
    pool_creation 
qualify(ROW_NUMBER() over(PARTITION BY pool_address
ORDER BY
    _inserted_timestamp DESC)) = 1

Create the .yml file and add table/column level tests
- At minimum tests should include, but are not limited to, the following;
  - Uniqueness on 1+ columns (typically pool_address but may vary by model)
  - not_null tests on token0, token1, pool_address

version: 2
models:
  - name: silver_dex__fraxswap_pools
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - POOL_ADDRESS
    columns:
      - name: POOL_ADDRESS
        tests:
          - not_null
      - name: TOKEN0
        tests:
          - not_null
      - name: TOKEN1
        tests:
          - not_null

Examples, References & Sources:

Example models
- Example using creation event logs: ethereum.silver_dex.fraxswap_pools.sql
- Example .yml file for tests: ethereum.silver_dex.fraxswap_pools.yml
- Example using creation traces: ethereum.silver_dex.hashflow_pools

2. Complete Models

Summary:

The complete_dex_liquidity_pools model aggregates all protocol specific liquidity pool models together into one, allowing centralized joins on silver.contracts for token metadata, streamlined logic and standardized columns. This includes platform names, versions, pool names, symbols, decimals, fees (where applicable), and columns for all individual tokens that make up each liquidity pool.

Best Practices, Tips & Tricks:

Protocol Versions: If there are multiple versions of a protocol, coming from separate models, each should have it's own CTE.
- The name for the platform column should include both the protocol and version separated by a -(dash). You do not need to include, v1, but instead begin incrementing at v2 and beyond.
- The version column should represent a version specific to the CTE/model being referenced. For example, if there are multiple CTEs/models required for Pancakeswap v2, you would add pancakeswap-v2 as the platform and v2-amm as the version in the first CTE, then pancakeswap-v2 as the platform and v2-mm as the version in the next CTE. This is required because the delete+insert incremental strategy uses both platform and version to identify new incremental records.
When adding new Protocols/CTEs/models to the complete model, please pay attention to the number of columns being pulled through, the order of those columns and the data type/format and names of those columns, as this will materially impact the output of the model.
- More specifically, pay attention to how pool_name is handled in the complete_lps CTE. Unique logic is typically required for DEXs built off the uniswap v3 forks, for example, as well as the liquidity pool models built using creation traces (rather than creation event logs), such as balancer and curve.

Implementation Steps, Variables & Notes:

Set up the model configuration
- Use incremental materialization with a delete+insert strategy.
- Define a unique key to identify each record that the incremental strategy should apply to. The unique key defined in the config will be different than the unique column / unique row in the table. This is because the model must delete+insert on a broader key to properly handle block reorg scenarios. Please see the Block Reorg guide for more information. We typically use ['block_number','platform','version'] in the complete liquidity pools model because it allows us to reload blocks, platforms and versions in their entirety, rather than reloading by transaction or event and potentially missing, loading or deleting incorrect records.
- Cluster by columns with low cardinality, such as block_timestamp::DATE and platform, to improve query performance.
- Add Search Optimization (SO) in the post_hook with EQUALITY and SUBSTRING. Please only add SO on non-number columns, where applicable.
- Add relevant tags for model categorization (typically tagged as ['curated','reorg','heal']) so that the model may be picked up by the relevant scheduled workflows.

{{ config(
  materialized = 'incremental',
  incremental_strategy = 'delete+insert',
  unique_key = ['block_number','platform','version'],
  cluster_by = ['block_timestamp::DATE','platform'],
  post_hook = "ALTER TABLE {{ this }} ADD SEARCH OPTIMIZATION ON EQUALITY(tx_hash, contract_address, pool_address, pool_name, tokens, symbols), SUBSTRING(pool_address, pool_name, tokens, symbols)",
  tags = ['curated','reorg','heal']
) }}

Create a CTE for contracts
- Pull through the contract_address, token_symbol and token_decimal columns from silver__contracts. These will be joined into various CTEs in the model to get the token metadata columns.
Create CTEs for each Protocol Specific model/version, where applicable.
- Select all columns from the source table
- Use standardized column names, as each of these CTEs will be unioned together towards the end of the model.
  - Add columns with NULL values in cases where the column exists in other Protocol Specific CTEs.
    - Liquidity Pools may columns such as fee, tick_spacing and up to eight (8) different tokens paired together.
- Hardcode the platform and version, if not done already.
- For incremental runs, filter data based on the latest _inserted_timestamp, with the LOOKBACK variable, defaulted to 4 hours.
  - The name used to reference `var('HEAL_MODELS') should match the name of the CTE.

...
balancer AS (
  SELECT
    block_number,
    block_timestamp,
    tx_hash,
    contract_address,
    pool_address,
    pool_name,
    NULL AS fee,
    NULL AS tick_spacing,
    token0,
    token1,
    token2,
    token3,
    token4,
    token5,
    token6,
    token7,
    'balancer' AS platform,
    'v1' AS version,
    _log_id AS _id,
    _inserted_timestamp
  FROM
    {{ ref('silver_dex__balancer_pools') }}

{% if is_incremental() and 'balancer' not in var('HEAL_MODELS') %}
WHERE
  _inserted_timestamp >= (
    SELECT
      MAX(_inserted_timestamp) - INTERVAL '{{ var("LOOKBACK", "4 hours") }}'
    FROM
      {{ this }}
  )
{% endif %}
),
...

Union all Protocol Specific CTEs together with SELECT * and UNION ALL

...
all_pools AS (
  SELECT
    *
  FROM
    dodo_v1
  UNION ALL
  SELECT
    *
  FROM
    dodo_v2
  UNION ALL
  SELECT
    *
  FROM
    frax
  UNION ALL
...

Create the complete CTE based on the unioned data, and apply transformation logic.
- Apply any and all transformation logic required to properly produce the values needed for the different protocols, using as much standardized logic as possible.
- This is where you will join the contracts CTE to include applicable token metadata values.
Add the heal_model CTE with the HEAL_MODEL var. For for information, please see the Incremental Heal Logic guide.
- For Liquidity Pools, we typically need to ensure that late-arriving decimals are properly handled.
Create the final SELECT statement.
- Add the id column using dbt_utils.generate_surrogate_key and include the column that represents one unique row.
- Add inserted_timestamp and modified_timestamp using SYSDATE() to represent the timestamp where new rows are inserted or modified in the model.
- Add the _invocation_id column, which outputs a UUID generated for each newly inserted row. For more information, please see DBT Docs: invocation_id.

  {{ dbt_utils.generate_surrogate_key(
    ['pool_address']
  ) }} AS complete_dex_liquidity_pools_id,
  SYSDATE() AS inserted_timestamp,
  SYSDATE() AS modified_timestamp,
  '{{ invocation_id }}' AS _invocation_id

Create the .yml file and add table/column level tests
- At minimum tests should include, but are not limited to, the following;
  - Uniqueness on 1+ columns (typically _id but may vary by model)
  - not_null tests on contract_address, pool_address and platform

Examples, References & Sources:

Gold Layer Modeling

Please reference the Gold Layer Modeling section in How to Research and Curate EVM Protocols for more details on the modeling process. The following information is related directly to Liquidity Pools only.

Summary:

This guide outlines the process of creating the Gold Layer DBT models for tracking DEX liquidity pool creation events or traces on EVM-compatible blockchains, along with the applicable table and column descriptions for transparency and documentation. The Gold model will select directly from the complete model, and will be surfaced to all users and clients.

Implementation Steps, Variables & Notes:

Set up the model configuration
- Use view materialization.
- Add persist_docs, with relation and columns set to `true.
- Add database_tags to the meta section in the config. These should include the names of every PROTOCOL (platform) in the model, along with one word values that define the PURPOSE (category) of the model.

{{ config(
    materialized = 'view',
    persist_docs ={ "relation": true,
    "columns": true },
    meta={
    'database_tags':{
        'table': {
            'PROTOCOL': 'BALANCER, CURVE, DODO, FRAXSWAP, KYBERSWAP, MAVERICK, PANCAKESWAP, SHIBASWAP, SUSHISWAP, UNISWAP, TRADER JOE, VERSE',
            'PURPOSE': 'DEX, LIQUIDITY, POOLS, LP, SWAPS',
            }
        }
    }
) }}

SELECT all relevant columns to pull through from complete_dex_liqudity_pools and surface to users in the Gold layer.
- Adjust the column names so that they are intuitive for a user analyzing what the table represents, which in this case, is a dimensional view on liquidity pool creations.
Create the .yml file and add table/column level descriptions.