Prejoining - ScalefreeCOM/datavault4dbt GitHub Wiki

Defines information about information that needs to be prejoined. Most commonly used to create links, when the source data does not hold the Business Key, but the technical key of the referred object. The values of the dict are the aliases you want to give the prejoined columns. Typically, but not always, this should be the same as the name of the prejoined column inside the prejoined entity. For each prejoined column a few things need to be defined inside another dictionary now. 'src_name' holding the name of the source of the prejoined entity, as defined in the .yml file. 'src_table' holds the name of the prejoined table, as defined inside the .yml file. 'bk' Holds the name of the business key column inside the prejoined table. 'this_column_name' holds the name of the column inside the original source data, that refers to the prejoined table. 'ref_column_name' holds the name of the column, that is refered by 'this_column_name' inside the prejoined table.

Prejoining is used to enrich source data by attributes from other database objects. In general, it should only be used when the source data does not hold the Business Key, but the technical Key of an object.

Configuring Prejoins

Within one Stage model, users can setup extraction of one to many columns from other database objects. Per column, one key-value pair of a dictionary needs to be defined:

prejoined_columns:
    <col_alias_1>:
        ref_model: <name_of_other_dbt_model>
        bk: <name_of_column_to_be_selected>
        this_column_name: <name_of_col_in_this_object>
        ref_column_name: <name_of_col_in_ref_object>
    <col_alias_2>:
        src_name: <name_of_dbt_source>
        src_table: <name_of_table_within_dbt_source>
        bk: <name_of_column_to_be_selected>
        this_column_name: 
            - <name_of_col_1_in_this_object>
            - <name_of_col_2_in_this_object>
        ref_column_name: 
            - <name_of_col_1_in_ref_object>
            - <name_of_col_2_in_ref_object>

This configuration will roughly translate to the following SQL part:

SELECT 
    src.*,
    pj_1.<name_of_column_to_be_selected> AS <col_alias_1>,
    pj_2.<name_of_column_to_be_selected> AS <col_alias_2>
FROM  {{ this }} src
LEFT JOIN {{ ref(<name_of_other_dbt_model>) }} pj_1
    ON src.<name_of_col_in_this_object> = pj_1.<name_of_col_in_ref_object>
LEFT JOIN {{ source(<name_of_dbt_source>, <name_of_table_within_dbt_source>) }} pj_2
    ON src.<name_of_col_1_in_this_object> = pj_2.<name_of_col_1_in_ref_object>
    AND src.<name_of_col_2_in_this_object> = pj_2.<name_of_col_2_in_ref_object>

Example of a definition of prejoined_columns parameter in an example stage:

prejoined_columns:
    businessid:
        ref_model: 'business_raw'
        bk: 'ID'
        this_column_name: 'ContractId'
        ref_column_name: 'ContractId'
    contractnumber:
        src_name: 'source_data'
        src_table: 'contract'
        bk: 'contractnumber'
        this_column_name: 'ContractId'
        ref_column_name: 'Id'
    master_account_key:
        src_name: 'source_data'
        src_table: 'account'
        bk: 'account_key'
        this_column_name: 'master_account_id'
        ref_column_name: 'Id'

The configuration includes the following parameters per prejoined column:

Parameters Data Type Explanation
ref_model string Name of the other dbt model that should be referred. Either this, or the parameters 'src_name' and 'src_table' must be defined.
src_name string If a dbt source should be prejoined, use this parameter to set the name of the source, as defined in the sources section of a yml file. Must be used together with 'src_table', and instead of 'ref_model'.
src_table string If a dbt source should be prejoined, use this parameter to set the table of the source 'src_name', as defined in the sources section of a yml file. Must be used together with 'src_name', and instead of 'ref_model'.
bk string Name of the column inside the referred object that should be extracted.
this_column_name string / list Specifies one or more columns within the source model of this stage, that should be used as the JOIN condition of the Prejoin. Can be multiple columns, but must match the number of columns defined in 'ref_column_name'.
ref_column_name string / list Specifies one or more columns within the referenced object of this prejoin, that should be used as the JOIN condition of the Prejoin. Can be multiple columns, but must match the number of columns defined in 'this_column_name'.
(optional) operator string Only used when multiple columns are defined for 'this_col_name' and 'ref_col_name'. Influences which logical operator is used to combine multiple JOIN conditions. Default is 'AND', use only if other operator is desired.
⚠️ **GitHub.com Fallback** ⚠️