How to Create a Hub and Satellite Quickly and Efficiently - OptimalBI/optimal-data-engine-mssql GitHub Wiki

The following is a sample stored procedure, which will do all the necessary work to create a Hub and Satellite combination with minimal effort and fiddling with Configuration.

In addition, the procedure can be set to add a Satellite to an existing Hub.

The stored procedure requires a Stage Table to exist, which it uses to obtain the necessary metadata for creating all Configuration details.

The script can be found at GitHub. It is a part of the ODE Admin project which is supposed to be installed as an extra database on the same server where ODE Config is located. If you don't want to install the project, replace $(ConfigDatabase) in code with the ODE Config database name before execution.

It expects a Schedule and a suitable Source System to exist in your Configuration. All the preparation steps are described here. This procedure will add source table to the schedule. If the source table has dependencies on other Data Vault tables to be executed first, hierarchy should be added manually.

Further, it will create a Release for the Satellite (and if there is one, the Hub), to simplify releasing the newly created Configuration to Test / Production.

Procedure outputs every step of the execution. It includes hub and satellite keys, list of columns, schedule record etc. These may be required in case of further modifications to configured objects. Otherwise most of the output could be ignored.

Note that the script will create all the necessary Configuration, but it doesn’t create the Hub and Satellite table(s) in the Vault.

The last part of the script produces a set of useful scripts which can be used to:

  • Create the Tables in the Data Vault
  • Populate the tables by manual one-off load (Note that you can't execute this script in case if you use SSISPackage as a source type, see below)
  • Select from the tables to check that they have been created and populated as expected.

Code example:

USE [ODE_Admin]
GO

DECLARE @HubKeyNames [dbo].[dv_column_list]
INSERT INTO @HubKeyNames VALUES
('CustomerID')
DECLARE @DerivedColumnsList [dbo].[dv_column_matching_list]

EXECUTE [dbo].[ODE_hub_sat_config] 
   @SatelliteOnly = 'N'
  ,@SprintDate = 20180329
  ,@ReleaseReference = 'US-304'
  ,@ReleaseSource = 'Jira'
  ,@StageDatabase = 'ODE_Stage'
  ,@StageSchema = 'Stage'
  ,@StageTable = 'Sales_CustomerAddress'
  ,@StageSourceType = 'BespokeProc'
  ,@StageLoadType = 'Full'
  ,@StagePassLoadTypeToProc = 0
  ,@SourceDataFilter = NULL
  ,@HubName = 'Customer'
  ,@SatelliteName = 'Sales_CustomerAddress'
  ,@VaultName = 'ODE_Vault'
  ,@FullScheduleName = 'Full_load'
  ,@IncrementScheduleName = NULL
  ,@HubKeyNames = @HubKeyNames
  ,@SourceSystemName = 'AdvWorks'
  ,@SouceTableSchema = 'Sales'
  ,@SourceTableName = 'CustomerAddress'
  ,@SSISPackageName = NULL
  ,@DerivedColumnsList = @DerivedColumnsList
GO

Provide the following parameters to the procedure execution:

  • SatelliteOnly - only acceptable values are 'Y' and 'N'. When set to "N", the script will create a Hub and Satellite combination. "Y" will cause the script to create a Satellite and hook it up to the specified Hub.
  • SprintDate - The Start Date of the current Sprint in Integer yyyymmdd. Script will create a new release sequence number based on that. In case if this release numbering system doesn't work for you, provide any other integer number.
  • ReleaseReference - Put the User Story and/or Task numbers for the Satellite which you are building.
  • ReleaseSource - system the release reference number refers to e.g. Jira, Rally
  • StageDatabase - the name of the Stage Database which holds the table on which the Load will be modelled. This Stage table needs to exist already.
  • StageSchema - the schema name of the Stage table
  • StageTable - the table name. Note that in case of Bespoke procedure, its name should match the stage table name with the "usp_" prefix.
  • StageSourceType - the type of the process that loads the data into the stage table. The only acceptable values are: 'BespokeProc' - in case if the stage table is populated by the stored procedure that ODE needs to execute to populate the stage table; 'ExternalStage' - in case if some other ETL tool populates the stage table and ODE can rely on it (also you can use it in case if you use the stage view); 'LeftRightComparison' - used for automated reconciliation; 'SSISPackage' - in case if ODE needs to execute SSIS package to populate the stage table.
  • StageLoadType - the type of data load. The only acceptable values are: 'Full' - in case if the satellite will always be populated by a full load process; 'Delta' - in case if data will be loaded in portions, e.g. log type data (note that this type of load doesn't support deletions); 'MSSQLcdc' - if the source is a generated in a special way SSIS package that gets the data from the MS SQL CDC data source; 'ODEcdc' - use this type if you use ODE Vault satellite functions as a source, e.g. to load the data between Raw and Business vaults in incremental mode.
  • StagePassLoadTypeToProc - whether the Stage stored procedure takes the load type parameter. 0 = Don't Pass it on, 1 = Pass the parameter value "Delta" or "Full" to the stage Procedure. You can implement the different code behaviour depending on the type of load. E.g. you can use it if you implement ODE CDC type of load. Only applicable to BespokeProc source type.
  • SourceDataFilter - provide a SQL statement of the conditions that need to be applied to the source dataset. This code will be added to the WHERE statement when SSIS package is generated. Only applicable to SSISPackage source type. In case of stored procedure this field could be used for the documentation purpose.
  • HubName - For completely Raw Hub Sat combinations, you can leave this column as null. The Script will create a Hub using the same name as the source table. For Business hubs, specify the name of the Hub of the Ensemble, which you are adding to. Note that there's no need to add prefix "h_".
  • SatelliteName - name of the satellite. Note that there's no need to add prefix "s_"
  • VaultName - the name of the vault where the Hub and Satellite will be created.
  • FullScheduleName - put a Full schedule name, e.g. 'Full_Load' This schedule needs to exist prior to running this script. Otherwise leave NULL if job doesn't require to be scheduled.
  • IncrementScheduleName - put an Incremental schedule name, e.g. 'Increment_Load' This schedule needs to exist prior to running this script. Otherwise leave NULL if job doesn't require to be scheduled.
  • HubKeyNames - the list of columns which are used for hub key. All the columns should be provided in case of multipart business key.
  • SourceSystemName - The name of the source system as at the table dv_source_system. If the source system is new for this ensemble, it should be created manually first.
  • SouceTableSchema - The schema of the source table. In case of SSIS package source, this should be a schema of the source CDC function.
  • SourceTableName - The source table name. In case of SSIS package source, put the table name, SSIS package logic adds required suffixes and prefixes to get to the CDC function automatically.
  • SSISPackageName - Only required if source type is SSIS package. Leave NULL in case of any other type.
  • DerivedColumnsList - The list of derived columns, i.e. columns that don't exist in source, but require to be created on the way to ODE. For example, a part of multi-part hub key that represents a source system. Mainly used for SSIS package source type. Leave it blank if there is no use for it.
⚠️ **GitHub.com Fallback** ⚠️