dbt Tests - iff133/first GitHub Wiki

Testing

dbt models are only useful if they're correct
dbt makes it easy to define automated tests on your models
After writing these tests once, you can quickly run them when developing locally, or in production on an ongoing basis. By testing your models, you can build confidence in their correctness and avoid costly regressions down the line.
dbt provides two different mechanisms for data validation: schema tests and custom data tests

Schema Tests

Schema tests are assertions that a model's schema adheres to basic rules: referential integrity, uniqueness, and nullity, for instance
Modern data warehouses don't enforce these rules, but they're still incredibly useful tools for reasoning about and validating data in a database

Writing Schema Tests

Schema tests are declared in a schema.yml file.
dbt provides four types of schema tests out of the box. Custom schema tests can also be created.
TIP: If you find yourself writing repeating blocks of YAML for your tests, use anchor labels to stay DRY (https://stackoverflow.com/questions/8466223/reuse-a-block-of-code-in-yaml)

Built-in tests

Not null

This test validates that there are no null values present in a column

version: 2
models:
  - name: people
    columns:
      - name: id
        tests:
          - not_null
              
      - name: account_id
        tests:
          - not_null
              
      - name: status
        tests:
          - not_null

Unique

This test validates that there are no duplicate values present in a field

version: 2
models:
  - name: people
    columns:
      - name: id
        tests:
          - unique

Relationships

This test validates that all of the records in a child table have a corresponding record in a parent table. This property is referred to as "referential integrity"
The following example tests that every person's account_id maps back to a valid account

version: 2
models:
  - name: people
    columns:
      - name: account_id
        tests:
          - relationships:
              to: ref('accounts')
              field: id

The to argument accepts a Relation – this means you can pass it a ref to a model (e.g. ref('accounts')), or a source (e.g. source('payroll', 'accounts'))

Accepted values

This test validates that all of the values in a column are present in a supplied list of values. If any values other than those provided in the list are present, then the test will fail
The accepted_values test supports an optional quote parameter which by default will single-quote the list of accepted values in the test query. To test non-strings (like integers or boolean values) explicitly set the quote config to false

version: 2
models:
  - name: people
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'cancelled']
              
      - name: status_id
        tests:
          - accepted_values:
              values: [1, 2]
              quote: false

Putting it all together

The tests above were shown in separate files for clarity, but in practice, they should all be combined into a single file. This combined schema.yml file would look like this:

version: 2
models:
  - name: people
    columns:
      - name: id
        tests:
          - not_null
          - unique
      - name: account_id
        tests:
          - not_null:
              severity: warn
          - relationships:
              to: ref('accounts')
              field: id
              severity: error
            
      - name: status
        tests:
          - not_null
          - accepted_values:
              values: ['active', 'cancelled']
              severity: warn

Testing complicated logic

Testing Expressions

Sometimes, a constraint only holds for a combination of columns. For example, maybe the combination of user_id and date is unique, but neither column is unique in isolation
To test expressions like this, specify the tests directive at the model level, and use a column_name key to define the expression, as shown below.

version: 2
models:
  - name: people
    tests:
      - unique:
          column_name: "concat(date_day, user_id)"
      
      - not_null:
          column_name: "coalesce(status, new_status)"
      
      - accepted_values:
          column_name: "coalesce(status, new_status)"
          values: ['active', 'cancelled']
        
      - relationships:
          column_name: "coalesce(account_id, parent_account_id)"
          to: ref('accounts')
          field: id

It is recommended that users specify tests for as many constraints as can be reasonably identified in their database. This may result in a large number of total tests, but schema.yml makes it fast to create and modify these tests. The presence of tests like these can significantly increase the confidence in underlying data consistency in a database.

Custom data tests

Not all error conditions can be expressed in a schema test. For this reason, dbt provides a mechanism for testing arbitrary assertions about your data.
These data tests are sql SELECT statements that return 0 rows on success, or more than 0 rows on failure.
A typical data test might look like:

tests/assert_lt_5_pct_cookie_ids_are_null.sql
-- If >= 5% of cookie_ids are null, then the test returns 1 row (failure).
-- If < 5% of cookie_ids are null, then the test returns 0 rows (success)
with calc as (
    select
        case
            when cookie_id is null then 1
            else 0
        end as cookie_is_null
  
    from {{ ref('events') }}
  
),
agg as (
  
    select
        sum(cookie_is_null)::float / nullif(count(*), 0) as pct_null
    from calc
)
select *
from agg
where pct_null >= 0.05

Another example would be asserting that a particular column on a certain row is what you expect:

tests/assert_payment_has_expected_amount.sql
with data as (
  
  select *
  from {{ ref('payment_items') }}
  where payment_id = 777
  
),
validation as (
  select *
  from data
  where amount != 120
)
select * from validation

Custom data tests and model selection syntax To run custom data tests for a particular model, use the model selection syntax: $ dbt test --data --models my_model
Note that dependencies between custom data tests and models are defined by using the ref function in your data test, e.g. from {{ ref('my_model') }}
Enabling custom data tests: To enable data tests, add the test-paths config to your dbt_project.yml file:

dbt_project.yml
test-paths: ["tests"]        # look for *.sql files in the "tests" directory

Test Severity

The "severity" of a test can be configured by supplying the severity configuration option in the test specification. The severity option can be one of warn or error
If warn is supplied, then dbt will log a warning for any failing tests, but the test will still be considered passing. This configuration is useful for tests in which a failure does not imply that action is required
If a severity level is not provided, then tests are run with the error severity level. The severity config can be applied to any test type, including custom schema tests

Schema test severity

The following example exercises both warn and error level severities in a schema.yml file

models/schema.yml
version: 2
models:
  - name: people
    columns:
      - name: email
        tests:
          - not_null:
              severity: warn
          - unique:
              severity: error

Custom data test severity

The severity for a custom data test can be configured using the config function:

tests/my_test.sql
{{ config(severity='warn') }}
select *
from ....

Running Tests

Schema tests and custom data tests can be run with the dbt test command as shown below

dbt test           # run schema + data tests
dbt test --schema  # run only schema tests
dbt test --data    # run only data tests

Custom Schema Tests

https://discourse.getdbt.com/t/examples-of-custom-schema-tests/181

References

https://docs.getdbt.com/docs/building-a-dbt-project/testing-and-documentation/testing/
https://medium.com/@josh.temple/automated-testing-in-the-modern-data-warehouse-d5a251a866af