General: DeFi: Dex Swaps - FlipsideCrypto/fsc-evm GitHub Wiki

Building and Deploying DEX Swaps Models

Decentralized Exchanges or DEXs, are permissionless marketplaces where tokens can be bought or sold, without a centralized intermediary. For EVMs, activity occurs on Liquidity Pools, which are smart contracts comprised of token pairs that enable automated market making and token swaps. This guide will detail how to 1) Research DEX protocols and 2) Build the Silver/Gold DBT model layers for DEX Swaps.

Research Phase

Please reference the Research Phase section in How to Research and Curate EVM Protocols for official documentation on the research process. The following information will be relevant to DEX Swaps.

Summary:

In order to properly build a Swap model for a DEX protocol, you'll need to identify the Liquidity Pools for the protocol. Please see the DEX Liquidity Pools guide for more information on researching and building those models.

Best Practices, Tips & Tricks:

Swap events generally occur directly on the liquidity pool address/contract.
- In cases where a swap does not occur on a liquidity pool, we may be able to parse swap events via the router contract for the protocol.
The majority of DEX Swap protocols are based on Uniswap deployments and share the same contract structure, emitted events and event topics. For this reason, we may use other DEX Swaps models as a starting point for researching new protocols.
Depending on the protocol, the token/amount in/out may be labeled within the emitted events as 0 or 1. While 0 typically indicates in and 1 out, there may be other logic that is required to determine the proper swap flow. Please see ethereum.silver_dex.fraxswap_swaps for an example (e.g. DEX protocols based on the uniswap v2 deployment).
- While swap events are similarly structured between the majority of decentralized exchanges, certain protocols may refer to token and amount in/out differently (e.g. token0 vs tokenA vs fromToken).
In rare cases, certain protocols (such as Curve) may not have the required values emitted in events. You may need to reference transfers tables or traces to successfully parse all relevant columns that make up a swap.

Silver Layer Modeling

This guide outlines the process of creating the Silver Layer DBT models for tracking DEX Swap events or traces on EVM-compatible blockchains, along with the applicable DBT tests for maintaining data integrity. This includes both Protocol Specific and Curated Complete models.

Please reference the Silver Layer Modeling section in How to Research and Curate EVM Protocols for more details on the modeling process. The following information is related directly to DEX Swaps only.

1. Protocol Specific Models

Summary:

Built with silver.logs and/or silver.traces, the Silver models are designed to capture essential information about each and every token swap that occurs on the DEXs liquidity pools, including token and amount in/out, sender address, recipient address, and other relevant values. The token_in, token_out, amount_in, and amount_out are the most important columns to extract, as these will be crucial for analyzing each swap.

Best Practices, Tips & Tricks:

The DEX models are some of the first curated models deployed by Flipside, and therefore may have non-standardized structures or naming conventions. Over time, these models may be adapted to new standards.
Some protocols may have different deployed versions or topic_0 / events for swaps on same or different contract addresses. If the structure of the event is different, we typically deploy multiple models for each version (e.g. silver_dex.kyberswap_v1_dynamic_swaps, silver_dex.kyberswap_v1_static_swaps and silver_dex.kyberswap_v2_elastic_swaps.

Implementation Steps, Variables & Notes:

Set up the model configuration
- Use incremental materialization with a delete+insert strategy.
- Define a unique key to identify each record that the incremental strategy should apply to. The unique key defined in the config will be different than the unique column / unique row in the table. This is because the model must delete+insert on a broader key to properly handle block reorg scenarios. Please see the Block Reorg guide for more information. We typically use block_number in swap models because it allows us to reload blocks in their entirety, rather than reloading by transaction or event and potentially missing, loading or deleting incorrect records.
- Cluster by columns with low cardinality, such as block_timestamp::DATE, to improve query performance.
- Add relevant tags for model categorization (typically tagged as curated and reorg) so that the model may be picked up by the relevant scheduled workflows.

{{ config(
    materialized = 'incremental',
    incremental_strategy = 'delete+insert',
    unique_key = "block_number",
    cluster_by = ['block_timestamp::DATE'],
    tags = ['curated','reorg']
) }}

Create a CTE for liquidity pools
- Each swap typically occurs on a liquidity pool contract, so we pull these through without incremental logic to inner join in subsequent CTEs.
- While creating a separate CTE for pools is technically unnecessary, having it stand alone generally improves model readability.

WITH pools AS (

    SELECT
        pool_address,
        token0,
        token1
    FROM
        {{ ref('silver_dex__fraxswap_pools') }}
),

Create a CTE for Swap events
- Manually parse event data to extract relevant information (token and amount in/out, sender address, recipient address, etc.).
  - For incremental runs, filter data based on the latest _inserted_timestamp, with a 12 hour lookback.
Add the final SELECT statement
- Select all relevant fields from the Swaps events CTE.
- If necessary, add WHERE token_in <> token_out to handle potential situations where a token is swapped for the same token, which may incorrectly lead to an overstatement of voluming or inconsistent reporting of true swap activity.
- Adjust the token and amount 0/A/From or 1/B/To as needed to get the final token and amount in/out column values.
- Manually add Swap as event_name and the protocol name as the platform, where necessary.
Create the .yml file and add table/column level tests
- At minimum tests should include, but are not limited to, the following;
  - Uniqueness on 1+ columns (typically _log_id but may vary by model)
  - not_null tests on token_in, token_out, sender and tx_to

version: 2
models:
  - name: silver_dex__fraxswap_swaps
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - _LOG_ID
    columns:
      - name: BLOCK_NUMBER
      ...

Examples, References & Sources:

2. Complete Models

Summary:

The complete_dex_swaps model aggregates all protocol specific DEX swaps models together into one, allowing centralized joins on silver.contracts and ez_prices_hourly for token metadata, hourly price data, streamlined logic and standardized columns. This includes platform names, versions, USD amounts, symbols, decimals and other relevant columns for all swap events that occur on DEX liquidity pools.

Best Practices, Tips & Tricks:

Protocol Versions: If there are multiple versions of a protocol, coming from separate models, each should have it's own CTE.
- The name for the platform column should include both the protocol and version separated by a -(dash). You do not need to include, v1, but instead begin incrementing at v2 and beyond.
- The version column should represent a version specific to the CTE/model being referenced. For example, if there are multiple CTEs/models required for Pancakeswap v2, you would add pancakeswap-v2 as the platform and v2-amm as the version in the first CTE, then pancakeswap-v2 as the platform and v2-mm as the version in the next CTE. This is required because the delete+insert incremental strategy uses both platform and version to identify new incremental records.
When adding new Protocols/CTEs/models to the complete model, please pay attention to the number of columns being pulled through, the order of those columns and the data type/format and names of those columns, as this will materially impact the output of the model.
- More specifically, pay attention to how pool_name is handled in the complete_dex_liquidity_pools model that gets joined in here or if pool_name is NULL/missing entirely. Unique logic is typically required for DEXs that are not built off liquidity pool models and instead use events emitted through swap router contracts.

Implementation Steps, Variables & Notes:

Set up the model configuration
- Use incremental materialization with a delete+insert strategy.
- Define a unique key to identify each record that the incremental strategy should apply to. The unique key defined in the config will be different than the unique column / unique row in the table. This is because the model must delete+insert on a broader key to properly handle block reorg scenarios. Please see the Block Reorg guide for more information. We typically use ['block_number','platform','version'] in the complete swaps model because it allows us to reload blocks, platforms and versions in their entirety, rather than reloading by transaction or event and potentially missing, loading or deleting incorrect records.
- Cluster by columns with low cardinality, such as block_timestamp::DATE and platform, to improve query performance.
- Add Search Optimization (SO) in the post_hook with EQUALITY and SUBSTRING. Please only add SO on non-number columns, where applicable.
- Add relevant tags for model categorization (typically tagged as ['curated','reorg','heal']) so that the model may be picked up by the relevant scheduled workflows.

{{ config(
  materialized = 'incremental',
  incremental_strategy = 'delete+insert',
  unique_key = ['block_number','platform','version'],
  cluster_by = ['block_timestamp::DATE','platform'],
  post_hook = "ALTER TABLE {{ this }} ADD SEARCH OPTIMIZATION ON EQUALITY(tx_hash, origin_function_signature, origin_from_address, origin_to_address, contract_address, pool_name, event_name, sender, tx_to, token_in, token_out, symbol_in, symbol_out), SUBSTRING(origin_function_signature, pool_name, event_name, sender, tx_to, token_in, token_out, symbol_in, symbol_out)",
  tags = ['curated','reorg','heal']
) }}

Create CTEs for each Protocol Specific model/version, where applicable.
- Select all columns from the source table
- Use standardized column names, as each of these CTEs will be unioned together towards the end of the model.
  - Add columns with NULL values in cases where the column exists in other Protocol Specific CTEs.
    - Columns for tokens and amounts may require additional minor transformations, specifically in the case of uniswap v3 type DEXs.
- Hardcode the platform and version, if not done already.
- For incremental runs, filter data based on the latest _inserted_timestamp, with the LOOKBACK variable, defaulted to 4 hours.
  - The name used to reference `var('HEAL_MODELS') should match the name of the CTE.

...
verse AS (
  SELECT
    block_number,
    block_timestamp,
    tx_hash,
    origin_function_signature,
    origin_from_address,
    origin_to_address,
    contract_address,
    event_name,
    amount_in_unadj,
    amount_out_unadj,
    token_in,
    token_out,
    sender,
    tx_to,
    event_index,
    platform,
    'v1' AS version,
    _log_id,
    _inserted_timestamp
  FROM
    {{ ref('silver_dex__verse_swaps') }}

{% if is_incremental() and 'verse' not in var('HEAL_MODELS') %}
WHERE
  _inserted_timestamp >= (
    SELECT
      MAX(_inserted_timestamp) - INTERVAL '{{ var("LOOKBACK", "4 hours") }}'
    FROM
      {{ this }}
  )
{% endif %}
),
univ3 AS (
  SELECT
    block_number,
    block_timestamp,
    tx_hash,
    origin_function_signature,
    origin_from_address,
    origin_to_address,
    pool_address AS contract_address,
    'Swap' AS event_name,
    CASE
      WHEN amount0_unadj > 0 THEN ABS(amount0_unadj)
      ELSE ABS(amount1_unadj)
    END AS amount_in_unadj,
    CASE
      WHEN amount0_unadj < 0 THEN ABS(amount0_unadj)
      ELSE ABS(amount1_unadj)
    END AS amount_out_unadj,
    CASE
      WHEN amount0_unadj > 0 THEN token0_address
      ELSE token1_address
    END AS token_in,
    CASE
      WHEN amount0_unadj < 0 THEN token0_address
      ELSE token1_address
    END AS token_out,
    sender,
    recipient AS tx_to,
    event_index,
    'uniswap-v3' AS platform,
    'v3' AS version,
    _log_id,
    _inserted_timestamp
  FROM
    {{ ref('silver__univ3_swaps') }}

{% if is_incremental() and 'univ3' not in var('HEAL_MODELS') %}
WHERE
  _inserted_timestamp >= (
    SELECT
      MAX(_inserted_timestamp) - INTERVAL '{{ var("LOOKBACK", "4 hours") }}'
    FROM
      {{ this }}
  )
{% endif %}
),
...

Union all Protocol Specific CTEs together with SELECT * and UNION ALL

...
all_dex AS (
...
  UNION ALL
  SELECT
    *
  FROM
    dodo_v1
  UNION ALL
  SELECT
    *
  FROM
    dodo_v2
  UNION ALL
  SELECT
    *
  FROM
    fraxswap
  UNION ALL
...

Create the complete CTE based on the unioned data, and apply transformation logic.
- Apply any and all transformation logic required to properly produce the values needed for the different protocols, using as much standardized logic as possible.
- This is where you will join silver.contracts and price.ez_prices_hourly to include applicable USD swap pricing and token metadata values.
Add the heal_model CTE with the HEAL_MODEL var. For for information, please see the Incremental Heal Logic guide.
- For DEX Swaps, we typically need to ensure that late-arriving decimals and USD amounts are properly handled.
Create the final SELECT statement.
- Add the id column using dbt_utils.generate_surrogate_key and include the column that represents one unique row.
- Add inserted_timestamp and modified_timestamp using SYSDATE() to represent the timestamp where new rows are inserted or modified in the model.
- Add the _invocation_id column, which outputs a UUID generated for each newly inserted row. For more information, please see DBT Docs: invocation_id.

  {{ dbt_utils.generate_surrogate_key(
    ['tx_hash','event_index']
  ) }} AS complete_dex_swaps_id,
  SYSDATE() AS inserted_timestamp,
  SYSDATE() AS modified_timestamp,
  '{{ invocation_id }}' AS _invocation_id

Create the .yml file and add table/column level tests
- At minimum tests should include, but are not limited to, the following;
  - Uniqueness on 1+ columns (typically _log_id but may vary by model)
  - not_null tests on token_in, token_out, sender and tx_to

Examples, References & Sources:

Gold Layer Modeling

Please reference the Gold Layer Modeling section in How to Research and Curate EVM Protocols for more details on the modeling process. The following information is related directly to DEX Swaps only.

Summary:

This guide outlines the process of creating the Gold Layer DBT models for tracking DEX Swaps events, transfers or traces on EVM-compatible blockchains, along with the applicable table and column descriptions for transparency and documentation. The Gold model will select directly from the complete model, and will be surfaced to all users and clients.

Implementation Steps, Variables & Notes:

Set up the model configuration
- Use view materialization.
- Add persist_docs, with relation and columns set to `true.
- Add database_tags to the meta section in the config. These should include the names of every PROTOCOL (platform) in the model, along with one word values that define the PURPOSE (category) of the model.

{{ config(
    materialized = 'view',
    persist_docs ={ "relation": true,
    "columns": true },
    meta ={ 'database_tags':{ 'table':{ 'PROTOCOL': 'SUSHI, UNISWAP, CURVE, SYNTHETIX, BALANCER, DODO, FRAX, HASHFLOW, KYBERSWAP, MAVERICK, PANCAKESWAP, SHIBASWAP, TRADER JOE, VERSE',
    'PURPOSE': 'DEX, SWAPS' } } }
) }}

SELECT all relevant columns to pull through from complete_dex_swaps and surface to users in the Gold layer.
- Adjust the column names so that they are intuitive for a user analyzing what the table represents, which in this case, is an EZ view on DEX Swap events and related USD values.
- Apply custom logic to NULL USD amounts in cases where the USD value on one side of the swap is NULL (e.g. missing prices) or the variance between the USD swap values is greater than 75% (e.g. low-liquidity swap). This does not apply to swaps on the wrapped version of the chain's NATIVE token.
  - We do this to manage low liquidity swaps and tokens where prices may be extremely volatile, so that we don’t overstate volumes
  - We exclude native tokens (WETH, WBNB, WAVAX etc.) from being NULL in cases where one of the swap doesn’t have a price, as the Wrapped Native token has more reliable pricing.
- Apply the ROUND function, and round to the nearest 2 decimals to emulate USD denominations.
Create the .yml file and add table/column level descriptions.