How To Reconcile Tables Using ODE v4 - OptimalBI/optimal-data-engine-mssql GitHub Wiki

One of the new features of ODE version 4 is built-in reconciliation. This feature can be used to compare any two objects in the Data Vault including staging tables. So you can compare two satellites, or stage table and satellite, or two links, or any other combination if it makes sense to you. Moreover, this reconciliation could be scheduled, so data will be reconciled on every load.

Use stored procedure [dv_integrity].[dv_LeftrightComparison_AdHoc] to perform an one-off reconciliation.

Parameters:

  • @left_object_name – left table name
  • @left_object_schema – left table database schema
  • @left_object_database – left table database name
  • @left_object_type – left table type; could be satellite, hub, link or staging table
  • @left_sat_pit – if left table is a satellite, you can set up a point in time for the dataset. Leave NULL for current records or in case if it’s any other type of table
  • @right_object_name – right table name
  • @right_object_schema – right table database schema
  • @right_object_database – right table database name
  • @right_object_type – right table type; could be satellite, hub, link or staging table
  • @right_sat_pit – if right table is a satellite, you can set up a point in time for the dataset. Leave NULL for current records or in case if it’s any other type of table
  • @output_database – provide a database name if you want to record results into a table
  • @output_schema – provide a database schema if you want to record results into a table
  • @output_name – provide a table name if you want to record results into a table
  • @select_into – set to 1 in case if the results table defined above does not exist, so ODE will create it; otherwise 0 *@match_key – integer number to be passed as it is to the results table. You can use this field for the scheduled loads, e.g. increment by 1 with every load. Otherwise leave it 0 for ad hoc reconciliation *@payload_columns – provide a list of matching columns from the left and right tables

In this example we will compare two satellites. Let’s say we get Person details from two source systems, Sales and Payments:

SELECT [BusinessEntityID]
,[FirstName]
,[MiddleName]
,[LastName]
,[Suffix]
,SYSDATETIMEOFFSET() as [dv_stage_datetime]
INTO [stage].[Sales_Person]
FROM [AdventureWorks].[Person].[Person]
GO
 
SELECT [BusinessEntityID]
,[FirstName]
,[MiddleName]
,[LastName] AS [Surname]
,CASE WHEN [FirstName] = 'Bob' THEN 'Jr' ELSE [Suffix] END [Suffix]
,SYSDATETIMEOFFSET() as [dv_stage_datetime]
INTO [stage].[Payments_Person]
FROM [AdventureWorks].[Person].[Person]
GO

Use helper scripts to create Person hub and two satellites.

Run the stored procedure [dv_integrity].[dv_LeftrightComparison_AdHoc]:

DECLARE @payload_columns [dbo].[dv_column_matching_list]
INSERT @payload_columns VALUES
('BusinessEntityID', 'BusinessEntityID'),
('FirstName',        'FirstName'),
('MiddleName',       'MiddleName'),
('LastName',         'Surname'),
('Suffix',           'Suffix')
 
EXECUTE [dv_integrity].[dv_LeftrightComparison_AdHoc]
--Describe object on the left side
   @left_object_name      = 'Sales_Person'
  ,@left_object_schema    = 'sat'
  ,@left_object_database  = 'ODE_Vault'
  ,@left_object_type      = 'sat' -- Possible values are 'sat', 'lnk', 'hub', 'stg'
  ,@left_sat_pit          = NULL -- Current state of data
--Describe object on the right side
  ,@right_object_name     = 'Payments_Person'
  ,@right_object_schema   = 'sat'
  ,@right_object_database = 'ODE_Vault'
  ,@right_object_type     = 'sat' -- Possible values are 'sat', 'lnk', 'hub', 'stg'
  ,@right_sat_pit         = NULL -- Current state of data
--Describe output
  ,@output_database       = NULL -- OR e.g. 'Reconciliation'
  ,@output_schema         = NULL -- OR e.g. 'vault'
  ,@output_name           = NULL -- OR e.g. 'PersonMatch'
  ,@select_into           = 0 -- OR 1 if output table needs to be created
  ,@match_key             = 0
  ,@payload_columns       = @payload_columns

The output is 22 records as there were 11 Bobs in AdventureWorks.

If you want this reconciliation to be repeatable process, you can schedule it. Use ODE_object_match_config stored procedure. Note that this stored procedure is not included in core ODE configuration, it is one of the configuration support scripts. You can create this stored procedure in Admin database or use it as a standalone script. Provide this stored procedure with same parameters as above. This procedure have settings for enabling columnstore indexes and table compression, use them to get the best results from your SQL Server instance. This procedure produces scripts to create a hub and satellite for your reconciliation task. Execute them to create new objects and run the one-off data load for checking if everything is configured right.

It is probably a good idea to put all the reconciliation tasks to the separate Data Vault, e.g. ODE_Testing_Vault. Also you can create separate ODE data load schedule just for reconciliation and execute it after the full Data Vault data load. Use new satellite as a source for reconciliation report using your favourite reporting tool.