Overview - vmware/versatile-data-kit GitHub Wiki
What Problem Does Versatile Data Kit Solve?
- Ingest data from different sources.
- Use Python/SQL and VDK templates to transform data.
- Package, version, and deploy data applications while dealing with credentials, retries, and reconnects.
- Provide built-in monitoring and smart notification capabilities.
- Track code and data modifications for quicker troubleshooting and version rollback.
Versatile Data Kit delivers these capabilities by running parameterized SQL queries and Python scripts automatically on top of Kubernetes infrastructure, packaged as Data Ingestion and Data Processing Jobs.
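To make the idea of a job built from a Python ingestion step and a parameterized SQL processing step more concrete, here is a minimal stand-alone sketch. It does not use the VDK SDK or its API. Python's built-in `sqlite3` module stands in for the target database, and the names `ingest_step`, `transform_step`, `run_job`, the table names, and the `min_amount` parameter are all hypothetical, chosen only for illustration.

```python
import sqlite3


def ingest_step(conn, rows):
    """Ingestion step (illustrative): load raw records into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders (id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform_step(conn, params):
    """Processing step (illustrative): run a parameterized SQL query."""
    conn.execute("DROP TABLE IF EXISTS orders_by_country")
    conn.execute("CREATE TABLE orders_by_country (country TEXT, total REAL)")
    # :min_amount is bound from the params dict when the query runs.
    conn.execute(
        "INSERT INTO orders_by_country "
        "SELECT country, SUM(amount) FROM raw_orders "
        "WHERE amount >= :min_amount GROUP BY country",
        params,
    )


def run_job(conn, rows, params):
    """Run the steps in a fixed order, the way a job runs its steps in sequence."""
    ingest_step(conn, rows)
    transform_step(conn, params)
    conn.commit()
    return conn.execute(
        "SELECT country, total FROM orders_by_country ORDER BY country"
    ).fetchall()


# In-memory database used purely as a stand-in for a real data warehouse.
conn = sqlite3.connect(":memory:")
sample_rows = [(1, 10.0, "BG"), (2, 5.0, "BG"), (3, 20.0, "US")]
result = run_job(conn, sample_rows, {"min_amount": 6.0})
print(result)
```

In a real Data Job, the SDK would supply the database connection, parameters, and ingestion target, and the Control Service would handle scheduling and deployment. This sketch only shows the shape of the work.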
Versatile Data Kit consists of two main components:
- VDK SDK provides automation for developing your data workflows with Python and/or SQL.
- VDK Control Service provides automation for deploying your data workflows.
This documentation is split along these two components so you can find the information you need: the SDK sections are for data developers who build Data Jobs, and the Control Service sections are for operators who run the Control Service.
Next Sections:
Install VDK SDK
➡️ if you want to get started ingesting or transforming data. 🔄
Install Control Service
➡️ if you want to deploy the data jobs you have developed, then operate and monitor them using the Operations UI. 🚀
Install VDK Plugins
➡️ if you want to extend the functionality of the Versatile Data Kit SDK. 🧩