PIT - ScalefreeCOM/datavault4dbt GitHub Wiki

This macro creates a PIT table to gather snapshot based information of one hub and its surrounding satellites. For this macro to work, a snapshot table is required, that has a trigger column to identify which snapshots to include in the PIT table. The easiest way to create such a snapshot table is to use the control_snap macros provided by this package.

Features:

  • Tracks the active satellite entries for each entry in a Hub for each snapshot
  • Strongly improves performance if upstream queries requires many JOIN operations
  • Creates a unique dimension key to optimize loading performance of incremental loads
  • Allows to insert a static string as record source column, matching business vault definition of a record source
Parameters Data Type Required Default Value Explanation
pit_type string optional None String to insert into the 'pit_type' column. Has to be prefixed by "!". Allows for future implementations of other PIT variants, like T-PITs etc. Can be set freely, something like 'PIT' could be the default.
tracked_entity string mandatory - Name of the tracked Hub entity. Must be available as a model inside the dbt project.
hashkey string mandatory - The name of the hashkey column inside the previously refered Hub entity.
sat_names list of strings mandatory - A list of all the satellites that should be included in this PIT table. Can only be satellites that are attached to the tracked Hub, and should typically include all those satellites. You should always refer here to the version 1 satellites, since those hold the load-end-date. The macro currently supports regular satellites and nh-satellites.
snapshot_relation string mandatory - The name of the snapshot relation. It needs to be available as a model inside this dbt project.
snapshot_trigger_column string important None The name of the column inside the previously mentioned snapshot relation, that is boolean and identifies the snapshots that should be included in the PIT table.
dimension_key string mandatory - The desired name of the dimension key inside the PIT table. Should follow some naming conventions. Recommended is the name of the hashkey with a '_d' suffix.
ldts string optional datavault4dbt.ldts_alias Name of the ldts column inside all source models. Needs to use the same column name as defined as alias inside the staging model.
custom_rsrc string optional None A custom string that should be inserted into the 'rsrc' column inside the PIT table. Since a PIT table is a business vault entity, the technical record source is no longer used here.
ledts string optional datavault4dbt.ledts_alias Name of the load-end-date column inside the satellites.
sdts string optional datavault4dbt.sdts_alias Name of the snapshot date timestamp column inside the snapshot table. set here.

Example 1

{{ config(materialized='incremental',
        post_hook="{{ datavault4dbt.clean_up_pit('control_snap_v1') }}") }}

{%- set yaml_metadata -%}
pit_type: '!Regular PIT'
tracked_entity: 'account_h'
hashkey: 'hk_account_h'
sat_names:
    - account_lroc_p_s
    - account_lroc_n_s
    - account_hroc_p_s
    - account_hroc_n_s
snapshot_relation: 'control_snap_v1'
snapshot_trigger_column: 'is_active'
dimension_key: 'hk_account_d'
custom_rsrc: 'PIT table for SAP/Accounts. For more information see our Wiki!'
{%- endset -%}    

{%- set metadata_dict = fromyaml(yaml_metadata) -%}

{{ datavault4dbt.pit(pit_type=metadata_dict.get('pit_type'),
                                tracked_entity=metadata_dict.get('tracked_entity'),
                                hashkey=metadata_dict.get('hashkey'),
                                sat_names=metadata_dict.get('sat_names'),
                                snapshot_relation=metadata_dict.get('snapshot_relation'),
                                snapshot_trigger_column=metadata_dict.get('snapshot_trigger_column'),
                                dimension_key=metadata_dict.get('dimension_key'),
                                custom_rsrc=metadata_dict.get('custom_rsrc')) }}

Description

With this example, a PIT Table is created. In line three of this example, the post hook "clean_up_pit" is used. For further information about the hook, click on the following link: Hook Clean Up PITs
  • pit_type:
    • !Regular PIT: PIT type is set to 'Regular PIT'. Optional.
  • tracked_entity:
    • account_h: This PIT table tracks the Hub Account.
  • hashkey:
    • hk_account_h: The name of the hashkey column ('hk_account_h') inside the previously refered Hub entity ('account_h').
  • sat_names:
    • ['account_lroc_p_s','account_lroc_n_s','account_hroc_p_s','account_hroc_n_s']: This four satellites are inclueded in the PIT table.
  • snapshot_relation:
    • control_snap_v1: The name of the snapshot relation.
  • snapshot_trigger_column:
    • is_active: The name of the column inside the previously mentioned snapshot relation that is boolean and identifies the snapshots that should be included in the PIT table.
  • dimension_key:
    • hk_account_d: The desired name of the dimension key inside the PIT table.
  • custom_rsrc:
    • PIT table for SAP/Accounts.: A custom string that should be inserted into the 'rsrc' column inside the PIT table. Optional.
⚠️ **GitHub.com Fallback** ⚠️