Preparation for Data Vault Modeling - OptimalBI/optimal-data-engine-mssql GitHub Wiki

Pre-requisite for creating Data Vault ensembles and links is an ODE Config database installed (refer here for ODE installation instructions), stage and vault databases created and have an audit table in there (refer here for the Data Vault database installation instructions)

Before you start creating your first hub and satellite with ODE, some preparatory configuration steps are required. These steps are not required to be set up too often, only for the new data sources.

First, create a release number. Each piece of configuration in ODE has a release number for easy promotion between environments.

EXECUTE [dv_release].[dv_release_master_insert] 
   @release_number = 20170123  -- This is an integer number. We recommend using the date format
  ,@release_description = 'Configure new data source' -- Provide some description for a piece of functionality
  ,@reference_number = 'US10' -- Reference to your task documentation system. Leave NULL if you are not using any
  ,@reference_source = 'Jira'  -- Name of your task documentation system
GO

Next step is only required if you are planning to use SSIS functionality for bringing data from another server. If you are using another tool for data acquisition that replicates data to the server where ODE is installed, skip this step. Provide two connection for ODE to know the location and credentials for the source system and the stage database, even if the latter located on the same server where ODE is installed. Note that connection information is not a subject of release. You will need to insert connection details with the same name into another copy of ODE when perform a release.

EXECUTE [dbo].[dv_connection_insert] 
   @connection_name = 'AdventureWorks'  -- Give a meaningful name for connection
  ,@connection_string = 'Provider=SQLNCLI11;Data Source=ABCDB01;Initial Catalog=AdventureWorks;Integrated Security=SSPI;'
  ,@connection_password = 'Password123' -- Provide a password if you are using SQL logon. In case of windows authentication leave NULL
  ,@connection_db_type = 'MSSqlServer' -- Provide a source database type. Currently ODE only works with Oracle and MS SQL Server
GO 
EXECUTE [dbo].[dv_connection_insert] 
   @connection_name = 'AdventureWorks_Stage'  -- Give a meaningful name for connection
  ,@connection_string = 'Provider=SQLNCLI11;Data Source=ABCDEVDB01;Initial Catalog=AdventureWorks_Stage;Integrated Security=SSPI;'
  ,@connection_password = NULL -- Provide a password if you are using SQL logon. In case of windows authentication leave NULL
  ,@connection_db_type = 'MSSqlServer'
GO 

Then configure a new source system.

EXECUTE [dbo].[dv_source_system_insert] 
   @source_system_name = 'AdvWorks' -- Give a meaningful name for source system
  ,@source_database_name = 'AdventureWorks'  -- The actual source database name
  ,@package_folder = 'Vault_External_Source' -- SSIS folder name in the Integration services catalog. Leave NULL if not using SSIS
  ,@package_project = 'DV_AdventureWorks' -- SSIS project name. Leave NULL if not using SSIS
  ,@project_connection_name = 'AdventureWorks' -- Connection name configured on previous stage. Leave NULL if not suing SSIS
  ,@is_retired = 0
  ,@release_number = 20170123
GO 

Add stage database you've created to the configuraton, so that ODE knows it exists. This procedure returns a surrogate key of the configured stage database, copy it.

 EXECUTE [dbo].[dv_stage_database_insert] 
   @stage_database_name = 'ODE_Stage' -- Stage database name you have crated
  ,@stage_connection_name = 'AdventureWorks_Stage' -- Provide a stage connection created above. Leave NULL if you are not using the SSIS functionality
  ,@is_retired = 0
  ,@release_number = 20170123
GO 

Add stage schema to the configutation. If you have created a stage database using the recommended method, your schema name is "Stage".

EXECUTE [dbo].[dv_stage_schema_insert] 
   @stage_database_key = 11  -- Paste a stage database key copied on previous step
  ,@stage_schema_name = 'Stage'
  ,@is_retired = 0
  ,@release_number = 20170123
GO 

The last step is configuing a schedule. Schedule is a list of table to be loaded within the lot. To load the data from source to the data vault ensembles, you just need to start a schedule, all the tables in the list will be processed. Modelling helper scripts allow you to add new table to two schedules, Full and Incremental. Full type of schedule execution loads the current state of data into the data vault structures. After that you can execute incremental loads to load just the changes to the dataset happened after the full load has been performed. Note that incremental logic is not implemented automatically in case if you are not using SSIS functionality. Once schedule is created, it could be added to the SQL Agent job for scheduled start. This article provides a full description of scheduling mechanism.

EXECUTE [dv_scheduler].[dv_schedule_insert] 
   @schedule_name          = 'Full_Load'
  ,@schedule_description   = 'Runs all Loads which only need to be refreshed daily'
  ,@schedule_frequency     = 'Daily'
  ,@release_number         = 20170123
GO
EXECUTE [dv_scheduler].[dv_schedule_insert] 
   @schedule_name          = 'Delta_Load'
  ,@schedule_description   = 'Runs all Loads which need to be refreshed hourly'
  ,@schedule_frequency     = 'Hourly'
  ,@release_number         = 20170123
GO  

Now everything is ready for starting modelling the data vault ensembles. Follow this link to get to the next step of ensemble modelling.