How To Monitor and Troubleshoot Data Loads - OptimalBI/optimal-data-engine-mssql GitHub Wiki

How to Monitor?

You can set up a scheduled email to be sent to an administrator or to a group of administrators responsible for monitoring the Data Vault loads. ODE sends an email on procedure dv_scheduler.dv_report_email_manifest_progress execution. SQL Server database mail profile needs to be set up to enable emailing from SQL Server. Email procedure could be scheduled for execution via the SQL Server Agent.

Procedure parameters are:

  • @vault_offset_in_days - how many days back report should cover. Default is 1 day, i.e. last 24 hours
  • @vault_recipients - list of email addresses separated by semicolon
  • @vault_profile_name - SQL Server database mail profile name
  • @vault_output_results - if set up to 1, procedure execution results will be shown on screen
  • @vault_email_results - if set up to 1, procedure execution results will be sent via email
  • @vault_top - the maximum number of records to show in the report. Default is 50. E.g. if there were more than 50 executions within the timeframe report covers, only top 50 will be reported.
  • @vault_html_string - output parameter of HTML that is sent via email.
The simplest example:
EXECUTE  [dv_scheduler].[dv_report_email_manifest_progress] 
   @vault_recipients = '[email protected];[email protected]'
  ,@vault_profile_name = 'SQL Server Database Mail Profile'
  ,@vault_output_results = 0
  ,@vault_email_results = 1

The email generated by ODE contains the following sections:

  • ODE scheduled executions started within the timeframe the report covers. Unfinished or failed DV loads are highlighted. Administrators could check daily if executions duration is within the expected timeframe.
  • ODE tasks not completed within the timeframe the report covers. If one or more executions covered above have unfinished tasks, they are listed in this section. Task could be attributed to the correct execution by the name and the run key. Individual tasks started manually by developers withing the report's timeframe are not included in this section.
  • ODE errors within the timeframe the report covers. This section includes any errors happened within this timeframe, including those which were not caused by scheduled executions. Report includes an exception description.
  • ODE duplicates removed within the timeframe the report covers. Satellite settings allow to set up duplicates threshold. If there are multiple records with the same key on the load, but number of these records per key is equal to or less than the threshold, ODE will load only one record per key, but the fact of duplicates removal is captured in this report. If threshold is set to 0 or number of duplicates is greater than the threshold number, DV load will fail.
That's how email looks like if daily load run fine:

As mentioned above, the same procedure could be used for generating the report output to the query screen. Set parameter @vault_output_results to 1 and @vault_email_results to 0 for the one-off execution.

How to troubleshoot?

If there were unfinished tasks in email, you can re-check the current status of unfinished tasks from the email by executing the following query. Use the run key for the scheduled execution specified in the email.

SELECT * 
FROM [dv_scheduler].[vw_manifest_status] 
WHERE run_key = 37
AND run_manifest_status != 'Completed'

All the tasks which are in progress, queued or scheduled should finish execution eventually. Delay in execution could be caused by insufficient server resources. In case of error task execution fails and none of the dependent tasks are processed within this load. You can find how to check processes and queues in this article under the "Monitoring the Run" section.

In case of error, you can find details in the ODE_Config.log4.Exception table.

SELECT *
FROM [ODE_Config].[log4].[Exception]
ORDER BY SystemDate DESC

This table provides error context, error message generated by SQL Server or by ODE, stored procedure where the error happened, date, user and execution stack as well. These fields are also available in the email.

In most cases Exceptions table provides enough information for fixing the bug that caused a failure. If an error message is not helpful, there is a way of logging each step of execution. In the table [ODE_Config].[log4].[JournalControl] set OnOffSwitch to "ON" against the procedure that caused the error. Next execution will record every step when this procedure is executed. Here you can find how to restart tasks under the "Dealing with Failed Tasks" section. Table [ODE_Config].[log4].[Journal] stores messages generated by ODE procedures. Table [ODE_Config].[log4].[JournalDetail] stores all the SQL queries executed during the procedure's execution. These two tables could be joined via the JournalID field.

You can leave Journal Control set to "ON" after the problem is identified and fixed, so that if another problem happens next time, you will have a track of everything happened including errors. Error details could be found in Journal by ExceptionID which is recorded in the Exceptions table.

⚠️ **GitHub.com Fallback** ⚠️