Configuration - vmware/versatile-data-kit GitHub Wiki

Data Job Configuration

To see how to configure a local Data Job run (for example, how to set credentials or ingestion endpoints), run:

vdk config-help

config.ini

This file sets up the Data Job's owner team, execution schedule and notifications. It follows the Python ConfigParser format.

This is an example config file:

[owner]
team = my-team

[job]
schedule_cron = 30 22 * * *
python_requirements_file = other-requirements.txt

[contacts]
notified_on_job_failure_user_error = [email protected], [email protected], [email protected]
notified_on_job_failure_platform_error = [email protected]
notified_on_job_success =
notified_on_job_deploy = [email protected]
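Since the file follows the ConfigParser format, it can be read with Python's standard `configparser` module. The snippet below is a minimal sketch, not how VDK itself loads the file: the helper name `load_job_config` is hypothetical, and the sample text is a trimmed copy of the example above with the contact lists left empty.

```python
import configparser

# Trimmed copy of the example config; contact lists are left empty on purpose.
SAMPLE = """
[owner]
team = my-team

[job]
schedule_cron = 30 22 * * *
python_requirements_file = other-requirements.txt

[contacts]
notified_on_job_success =
"""


def load_job_config(text: str) -> configparser.ConfigParser:
    """Parse config.ini content (hypothetical helper, not part of VDK)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


config = load_job_config(SAMPLE)
team = config.get("owner", "team")
schedule = config.get("job", "schedule_cron")
on_success = config.get("contacts", "notified_on_job_success")

# An option written as "key =" is read back as an empty string.
print(team, "|", schedule, "|", repr(on_success))
```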

team

Specifies the team that owns the Data Job. The value is case-sensitive and must be an actual team name. Generally, it is auto-populated when the job is created. It is primarily for information purposes; changing it has no effect.

schedule_cron

The cron scheduling format is described here: https://en.wikipedia.org/wiki/Cron. The cron expression is evaluated in UTC. If a new job run is due while the previous run has not finished yet, the previous execution is left to finish and the new execution is skipped.
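To make the UTC evaluation concrete, here is a small sketch that computes the next run for the example expression `30 22 * * *`. It is an illustration only: the function `next_daily_run` is hypothetical, it handles just the simple daily form `M H * * *` with numeric minute and hour, and it does not reflect how the Control Service schedules jobs.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(cron: str, now: datetime) -> datetime:
    """Next run in UTC for a cron of the form 'M H * * *' (sketch only)."""
    minute, hour, day_of_month, month, day_of_week = cron.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily schedules")
    # Evaluate the expression in UTC, as the schedule_cron property does.
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(
        hour=int(hour), minute=int(minute), second=0, microsecond=0
    )
    # If today's slot has already passed, the next run is tomorrow.
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)
next_run = next_daily_run("30 22 * * *", now)
print(next_run.isoformat())
```

The `now` argument must be a timezone-aware datetime so that the conversion to UTC is unambiguous.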

[contacts]

Each property in this section takes a semicolon-separated list of email addresses that are notified by email on a given condition. You can also provide an email address linked to your Slack account in order to receive Slack messages. To generate a Slack-linked email address, follow the steps here.
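As an illustration of the semicolon-separated format, the sketch below splits a property value into individual addresses. The helper `parse_contacts` and the `example.com` addresses are hypothetical, and the exact trimming rules applied by VDK are an assumption.

```python
from typing import List


def parse_contacts(value: str) -> List[str]:
    """Split a semicolon-separated contact list, dropping blanks (sketch only)."""
    return [item.strip() for item in value.split(";") if item.strip()]


# Hypothetical addresses, used for illustration only.
recipients = parse_contacts("first@example.com; second@example.com;")
print(recipients)
```

An empty value, such as `notified_on_job_success =` in the example config, yields an empty list under this sketch.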

notified_on_job_failure_user_error

Semicolon-separated list of email addresses to be notified on job execution failure caused by a problem in user code or user configuration. For example, the job contains an SQL script with a syntax error.

notified_on_job_failure_platform_error

Semicolon-separated list of email addresses to be notified on job execution failure caused by a platform problem, including job execution delays.

notified_on_job_success

Semicolon-separated list of email addresses to be notified on job execution success.

notified_on_job_deploy

Semicolon-separated list of email addresses to be notified of the job deployment outcome. Note: if this file is malformed (its structure does not follow the ConfigParser format), no email notification is sent to the recipients specified here.
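Because a malformed file means deployment notifications cannot be sent, it may help to check the file locally before deploying. The following is a minimal sketch using Python's standard `configparser`; the helper `is_well_formed` is hypothetical, and the checks the Control Service itself performs may differ.

```python
import configparser


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a ConfigParser file (sketch only)."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error:
        # Covers missing section headers, duplicate options and parsing errors.
        return False
    return True


good = "[contacts]\nnotified_on_job_deploy =\n"
bad = "notified_on_job_deploy =\n"  # option appears before any section header

print(is_well_formed(good), is_well_formed(bad))
```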

enable_attempt_notifications

(If enabled by Control Service operators) Flag to enable or disable the email notifications sent to the recipients listed above for each Data Job run attempt. The default value is True.

enable_execution_notifications

(If enabled by Control Service operators) Flag to enable or disable email notifications per Data Job execution and execution delays. The default value is True.

See Dictionary for the difference between Job Attempt and Job Execution.
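The default of True for both flags can be modelled with `getboolean` and a fallback value. This is a sketch under assumptions: the helper `notification_flags` is hypothetical, and placing the flags in the `[contacts]` section is assumed here rather than stated by this page.

```python
import configparser


def notification_flags(text: str, section: str = "contacts") -> dict:
    """Read both notification flags, defaulting to True (sketch only)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        # fallback=True mirrors the documented default when the flag is absent.
        "attempt": parser.getboolean(
            section, "enable_attempt_notifications", fallback=True
        ),
        "execution": parser.getboolean(
            section, "enable_execution_notifications", fallback=True
        ),
    }


# Assumption: the flags live under [contacts].
flags = notification_flags("[contacts]\nenable_attempt_notifications = False\n")
print(flags)
```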

notification_delay_period_minutes

(If enabled by Control Service operators)

Specifies the time interval (in minutes) that job execution is allowed to be delayed from its scheduled time before a notification email is sent. These emails are sent to the addresses configured in the notified_on_job_failure_platform_error property. The default value is 240 (i.e. 4 hours).
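The delay rule can be sketched as a simple time comparison. The function `is_delay_notification_due` is hypothetical, and treating a delay of exactly the configured period as still allowed is an assumption, since the boundary behaviour is not specified here.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_DELAY_MINUTES = 240  # documented default, i.e. 4 hours


def is_delay_notification_due(
    scheduled: datetime,
    now: datetime,
    delay_minutes: int = DEFAULT_DELAY_MINUTES,
) -> bool:
    """True once the job is delayed past the allowed period (sketch only)."""
    # Assumption: a delay exactly equal to the period does not yet trigger an email.
    return now - scheduled > timedelta(minutes=delay_minutes)


scheduled = datetime(2024, 1, 1, 22, 30, tzinfo=timezone.utc)
late = datetime(2024, 1, 2, 2, 31, tzinfo=timezone.utc)   # 241 minutes later
early = datetime(2024, 1, 2, 1, 0, tzinfo=timezone.utc)   # 150 minutes later

print(is_delay_notification_due(scheduled, late))
print(is_delay_notification_due(scheduled, early))
```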
