Non Historized Satellite - ScalefreeCOM/datavault4dbt GitHub Wiki
This macro creates a non-historized satellite that should be materialized as an incremental table. It should be applied 'on top' of the staging layer, and is either connected to a Hub or a Link. Besides the missing hashdiff, a non-historized satellite applies the same loading logic as a regular version 0 satellite. Each satellite can only be loaded by one source model, since we typically recommend a satellite split by source system.
Features:
- High-Perfomance loading of non-historized satellite data
Parameters | Data Type | Required | Default Value | Explanation |
---|---|---|---|---|
parent_hashkey | string | mandatory | - | Name of the hashkey column inside the stage of the object that this satellite is attached to. |
src_payload | string | list of strings | mandatory | - | A list of all the descriptive attributes that should be included in this satellite. |
source_model | string | mandatory | - | Name of the underlying staging model, must be available inside dbt as a model. |
source_is_single_batch | boolean | optional | False | See below for explanation. |
src_ldts | string | optional | datavault4dbt.ldts_alias | Name of the ldts column inside the source models. Needs to use the same column name as defined as alias inside the staging model. |
src_rsrc | string | optional | datavault4dbt.rsrc_alias | Name of the rsrc column inside the source models. Needs to use the same column name as defined as alias inside the staging model. |
{{ config(materialized='incremental') }}
{%- set yaml_metadata -%}
parent_hashkey: 'hk_creditcard_transactions_nl'
src_payload:
- invoice_address
- vendor_name
source_model: 'stage_creditcard_transactions'
{%- endset -%}
{%- set metadata_dict = fromyaml(yaml_metadata) -%}
{{ datavault4dbt.nh_sat(parent_hashkey=metadata_dict.get('parent_hashkey'),
src_payload=metadata_dict.get('src_payload'),
source_model=metadata_dict.get('source_model'),
src_ldts=metadata_dict.get('src_ldts'),
src_rsrc=metadata_dict.get('src_rsrc')) }}
-
parent_hashkey:
- hk_creditcard_transactions_nl: The satellite would be attached to the non-historized link creditcard_transaction, which has the column 'hk_creditcard_transactions_nl' as a hashkey column.
-
src_payload:
- ['invoice_address','vendor_name']: This satellite would hold the columns 'invoice_address' and 'vendor_name' out of the underlying staging area.
-
source_model:
- stage_creditcard_transactions: This satellite is loaded out of the stage for creditcard_transactions
To safely disable the deduplication, you have to verify that the underlying staging model of the NH Sat only holds one row per Link Hashkey.
We highly recommend setting up a uniqueness test within a yml file, covering the hashkey column of the staging model.
Once this is guaranteed, you just add the following parameter to your macro call:
source_is_single_batch=true
If this is set to true, this QUALIFY statement:
earliest_hk_over_all_sources AS (
{# Deduplicate the unionized records again to only insert the earliest one. #}
SELECT
lcte.*
FROM {{ ns.last_cte }} AS lcte
QUALIFY ROW_NUMBER() OVER (PARTITION BY {{ link_hashkey }} ORDER BY {{ src_ldts }}) = 1
{%- set ns.last_cte = 'earliest_hk_over_all_sources' -%}
),
will not be activated!
To contiously ensure data validity within your NH Sat, we also recommend setting up a uniquess test across the Link Hashkey on your NH Sat.