Persistent Staging Area with ODE - OptimalBI/optimal-data-engine-mssql GitHub Wiki
In ODE version 5 we have added SSIS (MS SQL Server Integration Services) data loads and CDC (Change Data Capture) loads. This functionality is enough to build a Persistent Staging Area (PSA) with ODE. A PSA is a copy of the source database in which every change to every record is persisted. The currently supported source platforms are MS SQL Server and Oracle. CDC mode is only available for MS SQL Server, and built-in CDC must be enabled on the data source.
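If built-in CDC is not yet enabled on your source, it can be switched on with the standard SQL Server procedures. This is a sketch only; the database, schema and table names below are placeholders matching the examples later in this article:

```sql
-- Run in the source database (requires sysadmin).
USE CRMDB01;
GO
EXEC sys.sp_cdc_enable_db;
GO
-- Enable CDC per table; repeat for every table you want in the PSA.
EXEC sys.sp_cdc_enable_table
      @source_schema = N'Sales'
    , @source_name   = N'Customers'
    , @role_name     = NULL  -- no gating role; consider tightening this in production
GO
```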
As the source database is persisted with the same table structure, most of the configuration can be done automatically. It makes sense to store every source database in a separate PSA database; ODE is designed to control multiple Vaults. Stage and Vault database creation is described in this article. Databases created with that script have all the settings required for SSIS execution, and the table of defined structure added by the script is required to store SSIS execution parameters. You'll also need to install the ODE_Admin database with all of its procedures on the same server as the Config database; many of these procedures are used in the subsequent steps.

The Stage database name is "ODE_" + source system name + "_Stage" by default. This default is used in the PSA automation scripts, so if you choose a different name for your stage database, don't forget to change it in those scripts.
The PSA automation scripts extract source database metadata via a Linked Server, so this object is required for the initial configuration. We didn't choose this approach for the data loads themselves (those run via SSIS, for performance reasons), but it is a perfect fit for metadata extraction. The Linked Server name should match the source system name. The Linked Server login should have admin permissions so it can read metadata from the system tables; it can therefore be different from the login used for the daily data load, which only needs read access to the data. You can delete the Linked Server after the source is configured. Check the Microsoft documentation on Linked Servers for more instructions.
EXEC master.dbo.sp_addlinkedserver
@server = 'CRM' -- Linked server name
, @srvproduct = '' -- For Oracle data source the value is "Oracle"
, @provider = 'SQLNCLI' --Default SQLNCLI for SQL Server provider, use MSDASQL for Oracle data source connected via ODBC
, @datasrc = 'ABCXXXDB01' -- Source server name
, @catalog = 'CRMDB01' -- Source database name
GO
EXEC master.dbo.sp_addlinkedsrvlogin
@rmtsrvname='CRM' -- Linked server name
,@useself = 'False'
,@locallogin = NULL
,@rmtuser = 'User1'
,@rmtpassword = 'Password123'
GO
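Once created, the linked server can be sanity-checked before running the configuration procedures. The sketch below assumes a SQL Server source and uses the server and database names from the example above:

```sql
-- Verify the connection is alive.
EXEC sp_testlinkedserver N'CRM';
GO
-- Confirm the login can read source metadata via four-part naming.
SELECT TOP (10) name
FROM CRM.CRMDB01.sys.tables
ORDER BY name;
GO
```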
Execute the following ODE_Admin procedure:
EXECUTE [Admin].[ODE_Configure_Objects_For_New_Source]
@source_system_name = 'CRM' --Should be the same as the Linked server name
,@source_database_name = 'CRMDB01'
,@source_connection_string = 'Data Source=ABCXXXDB01;User ID=User1;Initial Catalog=CRMDB01;Provider=SQLNCLI11.1;Persist Security Info=True;Connect Timeout=30;'
,@source_connection_password = 'Password123'
,@source_database_type = 'MSSQLServer' -- Another available option is "Oracle"
,@package_folder = 'PSA_ODE' -- Root folder name in SSIS catalog for PSA load packages
,@SprintDate = 20170510
,@ReleaseReference = 'US028'
,@ReleaseSource = 'Jira'
GO
Provide this ODE_Admin procedure with the linked server name and the database schema you are interested in, and it will return a list of tables in a format that can be fed into the next step. It is recommended to load tables from one schema at a time to prevent clashes if tables in different schemas share the same name. This script works for both MS SQL Server and Oracle linked servers.
EXECUTE [Admin].[ODE_Get_List_Of_Tables_From_Source]
@source_system_name = 'CRM'
,@included_schemas = 'Sales'
GO
Copy the output list from the results panel:
The list of tables generated in the previous step should be provided as one of the parameters to the following ODE_Admin procedure. In my example I have removed "___Maintenance" and the test table "T1" from copying into the PSA.

All tables must have either a primary key or a unique index, otherwise the automated process won't be able to detect a key. If not all tables in the source database have primary keys, but some have unique indexes instead, configure them in separate batches, i.e. execute the following procedure once for tables of each type. Multipart keys are acceptable. If a source table has neither a primary key nor a unique index, but there is a field or a group of fields that identifies a record uniquely, you can configure such a table manually. (Tables with no unique key whatsoever can't be part of replication, but why would you design a database like that in the first place?)
Before you execute the script, switch SSMS output to file (Ctrl + Shift + F). If you have many tables to configure, there will be too many lines of output to be shown on the screen.
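To decide which tables go into the "Primary" batch and which into the "Unique" batch, you can inspect the source's system catalog. This is a sketch for a SQL Server source, run against the source database; the schema name is the one from the examples:

```sql
-- Classify tables by the kind of key the automated process can detect.
SELECT s.name AS schema_name
     , t.name AS table_name
     , CASE
         WHEN EXISTS (SELECT 1 FROM sys.indexes i
                      WHERE i.object_id = t.object_id AND i.is_primary_key = 1)
           THEN 'Primary'
         WHEN EXISTS (SELECT 1 FROM sys.indexes i
                      WHERE i.object_id = t.object_id AND i.is_unique = 1)
           THEN 'Unique'
         ELSE 'None - configure manually'
       END AS key_detection_type
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE s.name = 'Sales'
ORDER BY t.name;
```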
DECLARE @SourceTables [dbo].[dv_table_list]
INSERT INTO @SourceTables VALUES
('Customers')
,('Order')
,('ReferenceData')
,('Staff')
EXECUTE [Admin].[ODE_Configure_Source_Table_List]
@KeyDetectionType = 'Primary' -- Use when source tables have valid primary keys; use "Unique" when tables have a unique index instead
,@SourceSystemName = 'CRM'
,@SourceSchema = 'Sales'
,@SourceTables = @SourceTables
,@StageDatabase = 'ODE_CRM_Stage'
,@StageSchema = 'stage'
,@StageLoadType = 'MSSQLcdc' -- Use Full or Delta for Oracle data sources or if CDC is disabled
,@VaultDatabase = 'ODE_CRM_Vault'
,@SprintDate = 20170510
,@ReleaseReference = 'US028'
,@ReleaseSource = 'Jira'
GO
If succeeded, the txt file looks like this:
Please take note of the release number that was created as highlighted.
In case of an error, scroll down the file to find the issue. Each table is configured in its own transaction.
When adding an Oracle source, a collection of behind-the-scenes functions and procedures maps the data types from their original Oracle forms to SQL Server equivalents.
The current mapping is:
| Oracle Datatype | SQL Server Equivalent | New Size | New Precision | New Scale |
|---|---|---|---|---|
| CHAR | NCHAR | [Original]*2 | - | - |
| DATE | DATETIME2 | 7 | 23 | 3 |
| FLOAT | VARCHAR | 22 | - | - |
| LONG | VARCHAR | 22 | - | - |
| NCHAR | NVARCHAR | [Original]*2 | - | - |
| NUMBER | VARCHAR | 22 | - | - |
| NVARCHAR2 | NVARCHAR | [Original]*2 | - | - |
| VARCHAR2 | NVARCHAR | [Original]*2 | - | - |
EXECUTE [Admin].[ODE_Build_Vault_Object_Create_Statements]
@ReleaseNumber = '2017051004'
,@Rebuild = 'N'
GO
Copy all the statements from the query output and execute them. This creates the physical tables in the Vault and Stage databases.
This process is described here; follow the steps starting from Create new BIML project. Add the Incremental schedule execution to the SQL Agent job so that data is updated regularly.

It can be a good idea to use the PSA as a data source for your ODE Data Vault: the latter can easily load the data in ODE CDC mode. You'll need to create CDC functions on the PSA, which is just another ODE_Admin procedure. From there it is a normal modelling process on the ODE DV side; you may need to use the SSIS functionality again if your DV is located on a different server from the PSA.
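As an illustration of the scheduling step, a recurring schedule can be attached to an existing SQL Agent job with the standard msdb procedure. The job name below is a placeholder; use the name of the job your BIML project created:

```sql
USE msdb;
GO
-- Attach a schedule to the incremental load job: repeat every 30 minutes, all day.
EXEC dbo.sp_add_jobschedule
      @job_name             = N'ODE_CRM_Incremental_Load'  -- placeholder job name
    , @name                 = N'Every 30 minutes'
    , @freq_type            = 4        -- daily
    , @freq_interval        = 1        -- every day
    , @freq_subday_type     = 4        -- repeat in minutes
    , @freq_subday_interval = 30
    , @active_start_time    = 000000;  -- starting from midnight
GO
```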