Persistent Staging Area with ODE - OptimalBI/optimal-data-engine-mssql GitHub Wiki
In ODE version 5 we have added SSIS (MS SQL Server Integration Services) data loads and CDC (Change Data Capture) loads. This functionality is enough to build a Persistent Staging Area (PSA) with ODE. A PSA is a copy of the source database in which every change to every record is persisted. The currently supported source platforms are MS SQL Server and Oracle. CDC mode is only available for MS SQL Server, and built-in CDC must be enabled on the data source.
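If built-in CDC is not yet enabled on your source, it can be switched on with the standard SQL Server procedures. This is a sketch only; the database, schema and table names below are placeholders matching the examples later in this article:

```sql
-- Run in the source database (requires sysadmin).
USE CRMDB01;
GO
EXEC sys.sp_cdc_enable_db;
GO
-- Enable CDC per table; repeat for every table you want in the PSA.
EXEC sys.sp_cdc_enable_table
      @source_schema = N'Sales'
    , @source_name   = N'Customers'
    , @role_name     = NULL  -- no gating role; consider tightening this in production
GO
```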
As the source database is persisted with the same table structure, most of the configuration can be done automatically. It makes sense to store every source database in a separate PSA database; ODE is designed to control multiple Vaults. Stage and Vault database creation is described in this article. Databases created with that script have all the settings required for SSIS execution, and the table of defined structure added by the script is required to store SSIS execution parameters. You'll also need to install the ODE_Admin database with all of its procedures on the same server as the Config database; many of these procedures are used in the subsequent steps.

The Stage database name is "ODE_" + source system name + "_Stage" by default. This default is used in the PSA automation scripts, so if you choose a different name for your stage database, don't forget to change it in those scripts.
The PSA automation scripts extract source database metadata via a Linked Server, so this object is required for the initial configuration. We didn't choose this approach for the data loads themselves (those run via SSIS, for performance reasons), but it is a perfect fit for metadata extraction. The Linked Server name should match the source system name. The Linked Server login should have admin permissions so it can read metadata from the system tables; it can therefore be different from the login used for the daily data load, which only needs read access to the data. You can delete the Linked Server after the source is configured. Check the Microsoft documentation on Linked Servers for more instructions.
EXEC master.dbo.sp_addlinkedserver
@server = 'CRM' -- Linked server name
, @srvproduct = '' -- For Oracle data source the value is "Oracle"
, @provider = 'SQLNCLI' --Default SQLNCLI for SQL Server provider, use MSDASQL for Oracle data source connected via ODBC
, @datasrc = 'ABCXXXDB01' -- Source server name
, @catalog = 'CRMDB01' -- Source database name
GO
EXEC master.dbo.sp_addlinkedsrvlogin
@rmtsrvname='CRM' -- Linked server name
,@useself = 'False'
,@locallogin = NULL
,@rmtuser = 'User1'
,@rmtpassword = 'Password123'
GO
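Once created, the linked server can be sanity-checked before running the configuration procedures. The sketch below assumes a SQL Server source and uses the server and database names from the example above:

```sql
-- Verify the connection is alive.
EXEC sp_testlinkedserver N'CRM';
GO
-- Confirm the login can read source metadata via four-part naming.
SELECT TOP (10) name
FROM CRM.CRMDB01.sys.tables
ORDER BY name;
GO
```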
Execute the following ODE_Admin procedure:
EXECUTE [Admin].[ODE_Configure_Objects_For_New_Source]
@source_system_name = 'CRM' --Should be the same as the Linked server name
,@source_database_name = 'CRMDB01'
,@source_connection_string = 'Data Source=ABCXXXDB01;User ID=User1;Initial Catalog=CRMDB01;Provider=SQLNCLI11.1;Persist Security Info=True;Connect Timeout=30;'
,@source_connection_password = 'Password123'
,@source_database_type = 'MSSQLServer' -- Another available option is "Oracle"
,@package_folder = 'PSA_ODE' -- Root folder name in SSIS catalog for PSA load packages
,@SprintDate = 20170510
,@ReleaseReference = 'US028'
,@ReleaseSource = 'Jira'
GO
Provide this ODE_Admin procedure with the linked server name and the database schema you are interested in, and it will return a list of tables in a format that can be fed into the next step. It is recommended to load tables from one schema at a time to prevent clashes if tables in different schemas share the same name. This script works for both MS SQL Server and Oracle linked servers.
EXECUTE [Admin].[ODE_Get_List_Of_Tables_From_Source]
@source_system_name = 'CRM'
,@included_schemas = 'Sales'
GO
Copy the output list from the results panel:
The list of tables generated in the previous step should be provided as one of the parameters to the following ODE_Admin procedure. In my example I have removed "___Maintenance" and the test table "T1" from copying into the PSA.

All tables must have either a primary key or a unique index, otherwise the automated process won't be able to detect a key. If not all tables in the source database have primary keys, but some have unique indexes instead, configure them in separate batches, i.e. execute the following procedure once for tables of each type. Multipart keys are acceptable. If a source table has neither a primary key nor a unique index, but there is a field or a group of fields that identifies a record uniquely, you can configure such a table manually. (Tables with no unique key whatsoever can't be part of replication, but why would you design a database like that in the first place?)
Before you execute the script, switch SSMS output to file (Ctrl + Shift + F). If you have many tables to configure, there will be too many lines of output to be shown on the screen.
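To decide which tables go into the "Primary" batch and which into the "Unique" batch, you can inspect the source's system catalog. This is a sketch for a SQL Server source, run against the source database; the schema name is the one from the examples:

```sql
-- Classify tables by the kind of key the automated process can detect.
SELECT s.name AS schema_name
     , t.name AS table_name
     , CASE
         WHEN EXISTS (SELECT 1 FROM sys.indexes i
                      WHERE i.object_id = t.object_id AND i.is_primary_key = 1)
           THEN 'Primary'
         WHEN EXISTS (SELECT 1 FROM sys.indexes i
                      WHERE i.object_id = t.object_id AND i.is_unique = 1)
           THEN 'Unique'
         ELSE 'None - configure manually'
       END AS key_detection_type
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE s.name = 'Sales'
ORDER BY t.name;
```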
DECLARE @SourceTables [dbo].[dv_table_list]
INSERT INTO @SourceTables VALUES
('Customers')
,('Order')
,('ReferenceData')
,('Staff')
EXECUTE [Admin].[ODE_Configure_Source_Table_List]
@KeyDetectionType = 'Primary' -- Use when source tables have valid primary keys; use "Unique" when tables have a unique index instead
,@SourceSystemName = 'CRM'
,@SourceSchema = 'Sales'
,@SourceTables = @SourceTables
,@StageDatabase = 'ODE_CRM_Stage'
,@StageSchema = 'stage'
,@StageLoadType = 'MSSQLcdc' -- Use Full or Delta for Oracle data sources or if CDC is disabled
,@VaultDatabase = 'ODE_CRM_Vault'
,@SprintDate = 20170510
,@ReleaseReference = 'US028'
,@ReleaseSource = 'Jira'
GO
If succeeded, the txt file looks like this:
Please take note of the release number that was created as highlighted.
In case of an error, scroll down the file to find the issue. Each table is configured in its own transaction.
When adding an Oracle source, a collection of behind-the-scenes functions and procedures maps the data types from their original Oracle forms to SQL Server equivalents.
The current mapping is:
| Oracle Datatype | SQL Server Equivalent | New Size | New Precision | New Scale |
|---|---|---|---|---|
| CHAR | NCHAR | [Original]*2 | - | - |
| DATE | DATETIME2 | 7 | 23 | 3 |
| FLOAT | VARCHAR | 22 | - | - |
| LONG | VARCHAR | 22 | - | - |
| NCHAR | NVARCHAR | [Original]*2 | - | - |
| NUMBER | VARCHAR | 22 | - | - |
| NVARCHAR2 | NVARCHAR | [Original]*2 | - | - |
| VARCHAR2 | NVARCHAR | [Original]*2 | - | - |
EXECUTE [Admin].[ODE_Build_Vault_Object_Create_Statements]
@ReleaseNumber = '2017051004'
,@Rebuild = 'N'
GO
Copy all the statements from the query output and execute them. This creates the physical tables in the Vault and Stage databases.
This process is described here; follow the steps starting from Create new BIML project. Add the Incremental schedule execution to the SQL Agent job so that data is updated regularly.

It can be a good idea to use the PSA as a data source for your ODE Data Vault: the latter can easily load the data in ODE CDC mode. You'll need to create CDC functions on the PSA, which is just another ODE_Admin procedure. From there it is a normal modelling process on the ODE DV side; you may need to use the SSIS functionality again if your DV is located on a different server from the PSA.
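As an illustration of the scheduling step, a recurring schedule can be attached to an existing SQL Agent job with the standard msdb procedure. The job name below is a placeholder; use the name of the job your BIML project created:

```sql
USE msdb;
GO
-- Attach a schedule to the incremental load job: repeat every 30 minutes, all day.
EXEC dbo.sp_add_jobschedule
      @job_name             = N'ODE_CRM_Incremental_Load'  -- placeholder job name
    , @name                 = N'Every 30 minutes'
    , @freq_type            = 4        -- daily
    , @freq_interval        = 1        -- every day
    , @freq_subday_type     = 4        -- repeat in minutes
    , @freq_subday_interval = 30
    , @active_start_time    = 000000;  -- starting from midnight
GO
```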