How To Implement Delta Load - OptimalBI/optimal-data-engine-mssql GitHub Wiki

This feature works in ODE versions starting from 2.2.

ODE supports two types of data load, Full and Delta. Full load checks all the data changes for a table, including deletions. Delta load includes inserts and updates only. It makes sense to limit data for the delta load by recent changes only.

This article describes a solution for enabling two types of load for one table to run on different schedules. E.g. run the delta daily during the week to apply recent changes, and run full load on the weekend to pick up record deletions.

First enable two load types for one source table with a global setting, i.e. it will be enabled for all tables in the Data Vault. It doesn’t mean all tables must be scheduled twice, this just enables a parameter of the stage stored procedure. Be careful if you already have source tables configured with stored procedures. They’ll require changes to be applied when switching this setting on and off.

USE [ODE_Config]
GO
 
UPDATE [dbo].[dv_defaults]
SET [default_varchar] = 'Y' --set to 'N' if you want to switch this function off
WHERE [default_type] = 'Global'
    AND [default_subtype] = 'StageDeltaSwitch'

From now on any staging stored procedures should have one input parameter of string type. Scheduler will pass a load type with this parameter when running a table load. If you want to enable both types of load, use this parameter in your source procedure’s logic. Otherwise just ignore it. Below is an example of the staging stored procedure which loads table ‘Customer’ from AdwentureWorks database to ODE_Stage. We assume that [ModifiedDate] is a reliable field to trace delta changes.

USE [ODE_Stage]
GO
 
CREATE PROCEDURE [stage].[usp_Sales_Customer]
    @Load_Type varchar(50)
AS
BEGIN
 
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = object_id(N'[Stage].[Sales_Customer]') AND type IN (N'U'))
    DROP TABLE [Stage].[Sales_Customer]
 
SELECT [CustomerID]
      ,[PersonID]
      ,[StoreID]
      ,[TerritoryID]
      ,[AccountNumber]
      ,[RowGuid]
      ,[ModifiedDate]
INTO [Stage].[Sales_Customer]
FROM [AdventureWorks].[Sales].[Customer]
WHERE [ModifiedDate] >= CASE WHEN @Load_Type = 'Delta' --Values are Delta and Full
                THEN DATEADD(HOUR, -25, GETDATE()) --load changes for the last 25 hours for Delta
                ELSE '1900-01-01' END
END
GO

Stage table [Stage].[Sales_Customer] is a source for Data Vault hub h_Sales_Customer and satellite s_Sales_Customer. Follow the instructions if you don’t know how to create them.

Now create two schedules, one for each type of load. [Read more about task scheduling feature here}(https://github.com/OptimalBI/optimal-data-engine-mssql/wiki/Task-Scheduling).

USE [ODE_Config]
GO
EXECUTE [dv_scheduler].[dv_schedule_insert]
   @schedule_name          = 'Full_Load'
  ,@schedule_description   = 'Runs all Loads which only need to be refreshed weekly'
  ,@schedule_frequency     = 'Weekly on Sunday'
  ,@release_number         = 2016081002
GO
EXECUTE [dv_scheduler].[dv_schedule_insert]
   @schedule_name          = 'Delta_Load'
  ,@schedule_description   = 'Runs all Loads which need to be refreshed daily'
  ,@schedule_frequency     = 'Daily'
  ,@release_number         = 2016081002
GO

Add the source table to both of these schedules with different settings:

EXECUTE [dv_scheduler].[dv_schedule_source_table_insert]
   @schedule_name   = 'Full_Load'
  ,@source_system_name  = 'AdventureWorks'
  ,@source_table_schema = 'Stage'
  ,@source_table_name   = 'Sales_Customer'
  ,@source_table_load_type = 'Full'
  ,@priority    = 'Low'
  ,@queue   = '001'
  ,@release_number  = 2016081002
GO
EXECUTE [dv_scheduler].[dv_schedule_source_table_insert]
   @schedule_name   = 'Delta_Load'
  ,@source_system_name  = 'AdventureWorks'
  ,@source_table_schema = 'Stage'
  ,@source_table_name   = 'Sales_Customer'
  ,@source_table_load_type = 'Delta'
  ,@priority    = 'Low'
  ,@queue   = '001'
  ,@release_number  = 2016081002
GO

Run a Delta_Load schedule to load the data modified within the last 25 hours; or run a Full_Load to load all changes including deletions.

EXEC [dv_scheduler].[dv_process_schedule] 'Delta_run'

As both loads use same staging table, make sure their schedules don’t intersect in time to avoid a data loss.