DBT - davidkhala/ETL GitHub Wiki
DBT
dbt lets data teams transform their data by simply writing select statements.
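As a rough illustration of the "just a select statement" idea, the sketch below uses Python's built-in sqlite3 module instead of dbt itself. The table raw_orders, the view customer_orders and the sample rows are all hypothetical. Wrapping the select in CREATE VIEW only approximates what a view materialization amounts to; it is not dbt's actual implementation.

```python
import sqlite3

# Hypothetical raw table; with dbt this data would already sit in the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 10, 15.0), (3, 11, 40.0)],
)

# The "model" is nothing more than a select statement.
MODEL_SQL = """
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM raw_orders
GROUP BY customer_id
"""

# Assumption: a view materialization boils down to wrapping the select like this.
conn.execute(f"CREATE VIEW customer_orders AS {MODEL_SQL}")

rows = conn.execute(
    "SELECT customer_id, order_count, total_amount "
    "FROM customer_orders ORDER BY customer_id"
).fetchall()
print(rows)
```

The author of the model only writes the select. The surrounding DDL is the part a tool like dbt takes care of.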
DBT project
At a minimum, all a project needs is the dbt_project.yml project configuration file.
DBT Resource
Resource | Description |
---|---|
models | Each model lives in a single file and contains logic that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation. |
snapshots | A way to capture the state of your mutable tables so you can refer to it later. |
seeds | CSV files with static data that you can load into your data platform with dbt. |
tests | SQL queries that you can write to test the models and resources in your project. |
macros | Blocks of code that you can reuse multiple times. |
docs | Docs for your project that you can build. |
sources | A way to name and describe the data loaded into your warehouse by your Extract and Load tools. |
exposures | A way to define and describe a downstream use of your project. |
metrics | A way for you to define metrics for your project. |
analysis | A way to organize analytical SQL queries in your project such as the general ledger from your QuickBooks. |
dbt snapshot
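A dbt snapshot records changes to a mutable table over time, which compensates for source systems that do not keep history themselves.
- the resulting snapshot table will have 2 new columns, dbt_valid_from and dbt_valid_to, marking the period during which each row version was valid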
Blogs
Use case
- source data systems are not built to store historical data
- DBT snapshots are only useful if you run them frequently
- You need data enrichment in the output model.
- Type-2 Slowly Changing Dimension
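The Type-2 behaviour behind dbt_valid_from and dbt_valid_to can be sketched in plain Python with sqlite3. This is a simplified simulation, not dbt's implementation: the tables orders and orders_snapshot, the helper run_snapshot and the date strings are hypothetical, and changes are detected by comparing column values directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical mutable source table and its snapshot table.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE orders_snapshot ("
    "id INTEGER, status TEXT, dbt_valid_from TEXT, dbt_valid_to TEXT)"
)


def run_snapshot(conn, now):
    """Compare the source with the currently valid snapshot rows and record changes."""
    current = dict(conn.execute("SELECT id, status FROM orders"))
    open_rows = dict(
        conn.execute(
            "SELECT id, status FROM orders_snapshot WHERE dbt_valid_to IS NULL"
        )
    )
    for key, status in current.items():
        if open_rows.get(key) == status:
            continue  # unchanged since the previous run
        # Close the previously valid version, if there is one.
        conn.execute(
            "UPDATE orders_snapshot SET dbt_valid_to = ? "
            "WHERE id = ? AND dbt_valid_to IS NULL",
            (now, key),
        )
        # Open a new version that is valid from this run onwards.
        conn.execute(
            "INSERT INTO orders_snapshot VALUES (?, ?, ?, NULL)",
            (key, status, now),
        )


conn.execute("INSERT INTO orders VALUES (1, 'pending')")
run_snapshot(conn, "2024-01-01")
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
run_snapshot(conn, "2024-01-02")
run_snapshot(conn, "2024-01-03")  # nothing changed, so no new row

history = conn.execute(
    "SELECT id, status, dbt_valid_from, dbt_valid_to "
    "FROM orders_snapshot ORDER BY dbt_valid_from"
).fetchall()
for row in history:
    print(row)
```

The row that is still valid keeps an empty dbt_valid_to. The sketch also shows why run frequency matters: if the status had changed twice between two runs, only the last value would have been captured.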
dbt seed
Use case
When
- Dimension table: A list of mappings of country codes to country names
- A list of test emails to exclude from analysis
- A list of employee account IDs
When not
- Loading raw data that has been exported to CSVs
- Any kind of production data containing sensitive information, for example personally identifiable information (PII) and passwords.
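The "test emails to exclude" case can be pictured with a small sketch, again in plain Python with sqlite3 and not in dbt. The CSV content, the tables signups and test_emails and the sample rows are all hypothetical. In a real project, dbt seed would load the CSV into the data platform and a model would do the join.

```python
import csv
import io
import sqlite3

# Hypothetical seed content: a small, static, non-sensitive list.
SEED_CSV = "email\ntest1@example.com\nqa@example.com\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (email TEXT, plan TEXT)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [
        ("alice@example.org", "pro"),
        ("test1@example.com", "free"),
        ("bob@example.org", "free"),
    ],
)

# Stand-in for loading the seed: read the CSV into a lookup table.
conn.execute("CREATE TABLE test_emails (email TEXT)")
reader = csv.DictReader(io.StringIO(SEED_CSV))
conn.executemany(
    "INSERT INTO test_emails VALUES (?)",
    [(record["email"],) for record in reader],
)

# Downstream query: keep only signups that are not in the seed list.
real_signups = conn.execute(
    """
    SELECT s.email, s.plan
    FROM signups AS s
    LEFT JOIN test_emails AS t ON t.email = s.email
    WHERE t.email IS NULL
    ORDER BY s.email
    """
).fetchall()
print(real_signups)
```

Because the list is short and changes rarely, it can live in version control next to the models that use it.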