Reference Table - ScalefreeCOM/datavault4dbt GitHub Wiki

The purpose of a reference Table is to bring back together a refernce Hub with its surrounding reference Satellites. The end-result will be a table/view with all the reference keys and all descriptive attributes (selectable).

Data-history related, this can be done in three different ways, all supported by datavault4dbt:

  • latest data: Only the latest descriptive state per reference key(s) is loaded into a reference table. Minimum disk space (or computing power when using views) required.
  • fully historized: The entire history of the descriptive data per reference key(s) is loaded into the reference table. Depending on the change-frequency in the data, this can consume a lot of disk space (or computing power when using views).
  • snapshot driven: Similar to a PIT, a pre-defined set of snapshot dates is used to get the valid state of descriptive data for each snapshot. To use this historization method, a snapshot v1 view is required.
Parameters Data Type Required Default Value Explanation
ref_hub string mandatory - Name of the underlying reference Hub.
ref_satellites list | dictionary mandatory - Name(s) of all reference Satellites that are connected to the reference Hub and should be used for this reference Table. Either define as a list of satellites, or define as a dictionary. When defining asa dictionary, the parameters 'include' or 'exclude' can be used to select only a subset of columns for each satellite. When using 'include' only the specified columns will be selected from that satellite. When using 'exclude' all columns except the ones specified will be selected from the satellite. Both parameters can not be combined for a single satellite.
historized ['full', 'latest','snapshot'] optional 'latest' Controls how the data in the reference table should be historized. The three allowed values are 'full', 'latest' and 'snapshot'. For details what each means see above. When selecting 'snapshot', the additional parameter 'snapshot_relation' (see next table line) must be defined.
snapshot_relation (when choosing 'snapshot') string optional None Name of the dbt model for the snapshot v1 view. Must already be available in the dbt project
snapshot_trigger_column (when choosing 'snapshot') string optional datavault4dbt.snapshot_trigger_column Name of the trigger column inside the snapshot_relation.
src_ldts string optional datavault4dbt.ldts_alias Name of the ldts column inside the source models. Needs to use the same column name as defined as alias inside the staging model.
src_rsrc string optional datavault4dbt.rsrc_alias Name of the rsrc column inside the source models. Needs to use the same column name as defined as alias inside the staging model.

Example 1 (Latest Data)

{{ config(schema='core', materialized='view') }}

{%- set yaml_metadata -%}
ref_hub: 'nation_rh'
ref_satellites: 
    nation_rs1:
        include:
            - N_NAME
historized: 'latest'
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{{ datavault4dbt.ref_table(ref_hub=metadata_dict.get('ref_hub'),
                    ref_satellites=metadata_dict.get('ref_satellites'),
                    historized=metadata_dict.get('historized'),
                    snapshot_relation=metadata_dict.get('snapshot_relation')) }}

Description

With this example, a reference table for the reference Hub 'nation_rh' is created. It will only hold the latest set of descriptive data per refernce key(s).
  • ref_hub:
    • nation_rh: This reference table will use the reference Hub for Nation as a base,
  • ref_satellites:
    • nation_rs: Only one satellite is used for this reference table. Since the parameter 'include' is specified, only the descriptive attribute 'N_NAME' will be loaded into the reference Table
  • historized:
    • latest: The reference table is configured to only hold the latest descriptive data per refernce key(s). Therefore it is not required to define 'snapshot_relation' and 'snapshot_trigger_column'.

Compiled SQL

Click Me!
WITH 

dates AS (

SELECT MAX(ldts) as ldts FROM (SELECT distinct 
        ldts
    FROM datavault4dbt_demo.core_Core.nation_rs1
    WHERE ldts != TO_TIMESTAMP('8888-12-31T23:59:59', 'YYYY-MM-DDTHH24:MI:SS')
    
    )


),

ref_table AS (

    SELECT
    
        h.N_NATIONKEY,
        ld.ldts,
        h.rsrc,
        s_1.N_NAME 

    FROM datavault4dbt_demo.core_Core.nation_rh h
    
    FULL OUTER JOIN dates ld
        ON 1 = 1  

    LEFT JOIN datavault4dbt_demo.core_Core.nation_rs1 s_1
        ON h.N_NATIONKEY = s_1.N_NATIONKEY
        AND  ld.ldts BETWEEN s_1.ldts AND s_1.ledts
    
    

    WHERE h.ldts <= ld.ldts

) 

SELECT * FROM ref_table

Example 2 (Fully Historized)

{{ config(schema='core', materialized='view') }}

{%- set yaml_metadata -%}
ref_hub: 'nation_rh'
ref_satellites: 
    nation_rs1:
        exclude:
            - N_NAME
historized: 'full'
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{{ datavault4dbt.ref_table(ref_hub=metadata_dict.get('ref_hub'),
                    ref_satellites=metadata_dict.get('ref_satellites'),
                    historized=metadata_dict.get('historized'),
                    snapshot_relation=metadata_dict.get('snapshot_relation')) }}

Description

With this example, a reference table for the reference Hub 'nation_rh' is created. It will hold all states of descriptive data.
  • ref_hub:
    • nation_rh: This reference table will use the reference Hub for Nation as a base,
  • ref_satellites:
    • nation_rs: Only one satellite is used for this reference table. Since the parameter 'exclude' is defined, only the descriptive columns of the satellite, that do not match the name 'N_NAME' will be loaded into the reference table.
  • historized:
    • latest: The reference table is configured hold all the states of descriptive data. Therefore it is not required to define 'snapshot_relation' and 'snapshot_trigger_column'.

Compiled SQL

Click Me!
WITH 

dates AS (

SELECT distinct ldts FROM (SELECT distinct 
        ldts
    FROM datavault4dbt_demo.core_Core.nation_rs1
    WHERE ldts != TO_TIMESTAMP('8888-12-31T23:59:59', 'YYYY-MM-DDTHH24:MI:SS')
    
    )


),

ref_table AS (

    SELECT
    
        h.N_NATIONKEY,
        ld.ldts,
        h.rsrc,
        s_1.HD_NATION_RS,
        s_1.N_COMMENT,
        s_1.N_REGIONKEY 

    FROM datavault4dbt_demo.core_Core.nation_rh h
    
    FULL OUTER JOIN dates ld
        ON 1 = 1  

    LEFT JOIN datavault4dbt_demo.core_Core.nation_rs1 s_1
        ON h.N_NATIONKEY = s_1.N_NATIONKEY
        AND  ld.ldts BETWEEN s_1.ldts AND s_1.ledts
    
    

    WHERE h.ldts <= ld.ldts

) 

SELECT * FROM ref_table    

Example 3 (Snapshot Based)

{{ config(schema='core', materialized='incremental') }}

{%- set yaml_metadata -%}
ref_hub: 'nation_rh'
ref_satellites: 
    - nation_rs1
historized: 'snapshot'
snapshot_relation: 'snap_v1'
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{{ datavault4dbt.ref_table(ref_hub=metadata_dict.get('ref_hub'),
                    ref_satellites=metadata_dict.get('ref_satellites'),
                    historized=metadata_dict.get('historized'),
                    snapshot_relation=metadata_dict.get('snapshot_relation')) }}

Description

With this example, a snaphot-based reference table for the reference hub "nation_rh" is created.
  • ref_hub:
    • nation_rh: This reference table will use the reference Hub for Nation as a base,
  • ref_satellites:
    • nation_rs: Only one satellite is used for this reference table. Since the parameters "include" or "exclude" are not defined for this satellite, all descriptive columns of this satellites will end up in the reference table.
  • historized:
    • snapshot: The reference table is configured to be snapshot based historized. That requires a snapshot relation, which is set next.
  • snapshot_relation:
    • snap_v1: The dbt model that holds the Snapshot v1 View is called 'snap_v1'. This must be set when historization is set to 'snapshot. The snapshot model must already exists within the dbt project.
  • snapshot_trigger_column:
    • not set: Since this parameter is not set, the global variable "datavault4dbt.snapshot_trigger_column" will be used for activating and deactivating specific snapshots within the snapshot relation.

Compiled SQL

Click Me!
WITH 

dates AS (

SELECT 
        sdts
    FROM (
        
        SELECT 
            sdts
        FROM datavault4dbt_demo.core_Control.snap_v1
        WHERE is_active
    )),

ref_table AS (

    SELECT
    
        h.N_NATIONKEY,
        ld.sdts,
        h.rsrc,
        s_1.HD_NATION_RS,
        s_1.N_COMMENT,
        s_1.N_NAME,
        s_1.N_REGIONKEY 

    FROM datavault4dbt_demo.core_Core.nation_rh h
    
    FULL OUTER JOIN dates ld
        ON 1 = 1  

    LEFT JOIN datavault4dbt_demo.core_Core.nation_rs1 s_1
        ON h.N_NATIONKEY = s_1.N_NATIONKEY
        AND  ld.sdts BETWEEN s_1.ldts AND s_1.ledts
    
    

    WHERE h.ldts <= ld.sdts

) 

SELECT * FROM ref_table
⚠️ **GitHub.com Fallback** ⚠️