DBT - davidkhala/ETL GitHub Wiki

DBT

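dbt lets analysts and engineers transform their data by simply writing SQL select statements; dbt turns each statement into a table or view in the warehouse.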
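To make the "just write a select" idea concrete, here is a minimal sketch in plain Python with SQLite. It is not how dbt is implemented; it only illustrates wrapping a select statement in DDL so that it becomes a queryable view. The table `raw_orders`, the model name `stg_orders`, and the helper `materialize_view` are hypothetical names chosen for illustration.

```python
import sqlite3


def materialize_view(conn, model_name, select_sql):
    """Wrap a plain SELECT in CREATE VIEW, roughly what a view materialization does."""
    conn.execute(f'DROP VIEW IF EXISTS "{model_name}"')
    conn.execute(f'CREATE VIEW "{model_name}" AS {select_sql}')


conn = sqlite3.connect(":memory:")

# Hypothetical raw table, standing in for data loaded by an Extract and Load tool.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 1250), (2, 400)])

# The "model": nothing but a select statement.
STG_ORDERS = "SELECT id, amount_cents / 100.0 AS amount FROM raw_orders"

materialize_view(conn, "stg_orders", STG_ORDERS)
rows = conn.execute("SELECT id, amount FROM stg_orders ORDER BY id").fetchall()
```

The author of the model only supplies the select; creating and replacing the view is handled by the tool.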

DBT project

At a minimum, all a project needs is the dbt_project.yml project configuration file.

DBT Resource

Resource | Description
--- | ---
models | Each model lives in a single file and contains logic that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation.
snapshots | A way to capture the state of your mutable tables so you can refer to it later.
seeds | CSV files with static data that you can load into your data platform with dbt.
tests | SQL queries that you can write to test the models and resources in your project.
macros | Blocks of code that you can reuse multiple times.
docs | Docs for your project that you can build.
sources | A way to name and describe the data loaded into your warehouse by your Extract and Load tools.
exposures | A way to define and describe a downstream use of your project.
metrics | A way for you to define metrics for your project.
analysis | A way to organize analytical SQL queries in your project such as the general ledger from your QuickBooks.

dbt snapshot

A dbt snapshot records changes to a mutable table over time, compensating for source systems that do not keep history themselves.

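  • the snapshot table gains metadata columns, including dbt_valid_from and dbt_valid_to, which bound the period during which each row version was current (dbt_valid_to is null for the current version)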
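The role of dbt_valid_from and dbt_valid_to can be sketched with a small Type-2 history table. This is a simplified illustration in plain Python with SQLite, not dbt's snapshot implementation: the tables `orders` and `orders_snapshot`, the `status` column, and the `snapshot` helper are hypothetical, and change detection here simply compares one column.

```python
import sqlite3


def snapshot(conn, now):
    """One snapshot run: close changed row versions, then open new ones."""
    # Close the current version of every row whose source value has changed.
    conn.execute(
        """
        UPDATE orders_snapshot
        SET dbt_valid_to = ?
        WHERE dbt_valid_to IS NULL
          AND EXISTS (
              SELECT 1 FROM orders o
              WHERE o.id = orders_snapshot.id
                AND o.status <> orders_snapshot.status
          )
        """,
        (now,),
    )
    # Open a new version for every source row that has no current version.
    conn.execute(
        """
        INSERT INTO orders_snapshot (id, status, dbt_valid_from, dbt_valid_to)
        SELECT o.id, o.status, ?, NULL
        FROM orders o
        WHERE NOT EXISTS (
            SELECT 1 FROM orders_snapshot s
            WHERE s.id = o.id AND s.dbt_valid_to IS NULL
        )
        """,
        (now,),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE orders_snapshot "
    "(id INTEGER, status TEXT, dbt_valid_from TEXT, dbt_valid_to TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "pending"), (2, "pending")])

snapshot(conn, "2024-01-01")  # first run: every row gets an open version
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
snapshot(conn, "2024-01-02")  # second run: order 1 gets a closed and an open version

history = conn.execute(
    "SELECT id, status, dbt_valid_from, dbt_valid_to "
    "FROM orders_snapshot ORDER BY id, dbt_valid_from"
).fetchall()
```

After the second run, order 1 has two versions (the older one closed on 2024-01-02) and order 2 still has a single open version. A status that changed and changed back between two runs would leave no trace, which is why snapshots need to run frequently to be useful.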

Blogs

Use case

  • source data systems are not built to store historical data
  • DBT snapshots are only useful if you run them frequently
  • You need data enrichment in the output model.
  • Type-2 Slowly Changing Dimension

dbt seed

Use case

When

  • Dimension table: A list of mappings of country codes to country names
  • A list of test emails to exclude from analysis
  • A list of employee account IDs
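As a rough picture of what loading a seed amounts to, the sketch below reads a small static CSV and creates a lookup table from it, then joins it to another table. It uses plain Python with SQLite rather than dbt itself; the CSV content, the table names, and the `load_seed` helper are hypothetical, and all columns are left untyped for brevity.

```python
import csv
import io
import sqlite3

# Hypothetical seed content; in a dbt project this would be a CSV file checked in with the project.
COUNTRY_CODES_CSV = """country_code,country_name
US,United States
FR,France
"""


def load_seed(conn, table, csv_text):
    """Recreate `table` from CSV text; the header row supplies the column names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    return len(data)


conn = sqlite3.connect(":memory:")
loaded = load_seed(conn, "country_codes", COUNTRY_CODES_CSV)

# Hypothetical fact table that uses the seed as a dimension lookup.
conn.execute("CREATE TABLE customers (name TEXT, country_code TEXT)")
conn.execute("INSERT INTO customers VALUES ('Ada', 'FR')")
joined = conn.execute(
    "SELECT c.name, s.country_name FROM customers c "
    "JOIN country_codes s ON s.country_code = c.country_code"
).fetchall()
```

Because the table is dropped and rebuilt from the file on every load, this pattern suits small, slowly changing reference data, and is a poor fit for the raw exports and sensitive data covered under "When not".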

When not

  • Loading raw data that has been exported to CSVs
  • Any kind of production data containing sensitive information, for example personally identifiable information (PII) and passwords.

dbt test
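A test in dbt is a SQL query that selects the rows violating an assertion, and it passes when the query returns no rows. The sketch below mimics that convention in plain Python with SQLite; it is an illustration only, and the `payments` table, the negative-amount rule, and the `run_test` helper are hypothetical.

```python
import sqlite3


def run_test(conn, failing_rows_sql):
    """Run a test query; the test passes when it returns no rows."""
    failures = conn.execute(failing_rows_sql).fetchall()
    return len(failures) == 0, failures


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?, ?)", [(1, 10.0), (2, -5.0)])

# Hypothetical assertion: no payment should have a negative amount.
NEGATIVE_AMOUNTS = "SELECT id, amount FROM payments WHERE amount < 0"

passed, failures = run_test(conn, NEGATIVE_AMOUNTS)
```

Here the test fails and `failures` holds the offending row, which is what makes this style of test easy to debug: the query result is the list of records to investigate.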

DBT cloud

Ecosystem

Other refs