ExampleHome - vmware/versatile-data-kit GitHub Wiki

Versatile Data Kit (VDK) is a data framework that enables Data Engineers to

🧑‍💻 develop,
▶️ run,
📊 and manage data workloads, aka data jobs

What Problem Does Versatile Data Kit Solve?

  • Ingest data from different sources.
  • Use Python/SQL and VDK templates to transform data.
  • Package, version, and deploy data applications while dealing with credentials, retries, and reconnects.
  • Provide built-in monitoring and smart notification capabilities.
  • Track code and data modifications for quicker troubleshooting and version rollback.
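As a rough illustration of the first two points, the sketch below shows what a single Python step of a data job might look like. It assumes the `job_input` object that VDK passes to a step's `run` function exposes `send_object_for_ingestion` and `execute_query`; the table names, the sample rows, and the `FakeJobInput` stand-in are hypothetical and exist only so the snippet runs without a VDK installation.

```python
# Hypothetical step file inside a data job directory, e.g. 10_ingest_and_transform.py.
# VDK is expected to call run(job_input) for each Python step of a job.


def run(job_input):
    # Illustrative rows only; a real job would read them from a source system.
    rows = [
        {"id": 1, "country": "BG", "amount": 10.5},
        {"id": 2, "country": "US", "amount": 7.0},
    ]
    for row in rows:
        # Queue each row for ingestion into a (hypothetical) destination table.
        job_input.send_object_for_ingestion(
            payload=row, destination_table="sales_raw"
        )

    # Transform with SQL. Shown in the same step for brevity; a real job
    # would typically place the transformation in a later step.
    job_input.execute_query(
        "CREATE TABLE IF NOT EXISTS sales_by_country AS "
        "SELECT country, SUM(amount) AS total FROM sales_raw GROUP BY country"
    )


class FakeJobInput:
    """Stand-in (not part of VDK) that records calls made by run()."""

    def __init__(self):
        self.ingested = []
        self.queries = []

    def send_object_for_ingestion(self, payload, destination_table, **kwargs):
        self.ingested.append((destination_table, payload))

    def execute_query(self, query):
        self.queries.append(query)


if __name__ == "__main__":
    fake = FakeJobInput()
    run(fake)
    print(len(fake.ingested), "rows queued,", len(fake.queries), "query issued")
```

With a real installation, the step would live in a job directory and be executed by the vdk command line interface rather than called directly.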

See our introduction blog post

Quickstart

Getting started with VDK SDK

All getting-started guides work in Google Colab (link) or with any installation of VDK. To run the examples locally, install quickstart-vdk:

pip install quickstart-vdk

This installs the core VDK packages and the vdk command line interface, which you can use to run jobs in your local shell environment. Then run

vdk dev-studio --start

to start a local notebook server and follow the instructions there.

  • Extract data with VDK Ingester
  • Process data with SQL using a VDK Managed Connection
  • Create a star schema with VDK Templates
  • Extract data incrementally with VDK Ingester and Properties
  • Trace your SQL provenance by installing the VDK lineage plugin
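The incremental extraction guide listed above pairs the Ingester with Properties. One common way to do that is a watermark pattern, sketched below under assumptions: `job_input` is assumed to offer `get_property`, `get_all_properties`, `set_all_properties`, and `send_object_for_ingestion`; the property name `last_ingested_id`, the `numbers` table, the in-memory source, and the `FakePropertiesJobInput` stand-in are all hypothetical. This is not necessarily how the linked guide implements it.

```python
def run(job_input):
    # Read the watermark saved by a previous run; default to 0 on the first run.
    last_id = int(job_input.get_property("last_ingested_id", 0))

    # Illustrative in-memory source; a real job would query an API or database.
    source_rows = [{"id": i, "value": i * i} for i in range(1, 6)]
    new_rows = [row for row in source_rows if row["id"] > last_id]

    for row in new_rows:
        job_input.send_object_for_ingestion(
            payload=row, destination_table="numbers"
        )

    if new_rows:
        # Persist the new watermark while keeping any other stored properties.
        props = dict(job_input.get_all_properties())
        props["last_ingested_id"] = max(row["id"] for row in new_rows)
        job_input.set_all_properties(props)


class FakePropertiesJobInput:
    """Stand-in (not part of VDK) with in-memory properties and ingestion."""

    def __init__(self, properties=None):
        self.properties = dict(properties or {})
        self.ingested = []

    def get_property(self, name, default_value=None):
        return self.properties.get(name, default_value)

    def get_all_properties(self):
        return dict(self.properties)

    def set_all_properties(self, properties):
        self.properties = dict(properties)

    def send_object_for_ingestion(self, payload, destination_table, **kwargs):
        self.ingested.append(payload)


if __name__ == "__main__":
    fake = FakePropertiesJobInput({"last_ingested_id": 3})
    run(fake)  # only rows with id > 3 are sent
    run(fake)  # a second run finds nothing new
    print([row["id"] for row in fake.ingested], fake.properties)
```

Storing the watermark only after rows were found keeps a run with no new data from overwriting the saved state.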

Getting started with VDK Control Service

  • Install VDK Server
  • Deploy Job
  • Rollback Job to latest stable version
  • Schedule Job
  • Monitor Job with Operations UI

Installation

➡️ See the Installation page for more details.

Create and run data jobs locally

pip install quickstart-vdk

This installs the core vdk packages and the vdk command line interface. You can use them to run jobs in your local shell environment.

See also the Getting Started section of the wiki

Main Concepts

➡️ See the Interfaces page for more details.
