Data Sources and Targets - vmware/versatile-data-kit GitHub Wiki

Versatile Data Kit (VDK) integrates data sources and targets through plugins. Its plugin framework also supports querying multiple databases.

Data Source Plugin

Warning: This plugin is still considered pre-alpha. It may change substantially or even be discarded.

The data-sources project is a plugin for the Versatile Data Kit (VDK). It aims to simplify data ingestion from multiple sources by offering a single, unified API. Whether you're dealing with databases, REST APIs, or other forms of data, this project allows you to manage them all consistently. This is crucial for building scalable and maintainable data pipelines.
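
To illustrate what a single, unified API for heterogeneous sources can look like, here is a minimal sketch in plain Python. It is not the data-sources plugin's actual API: the names `DataSource`, `InMemorySource`, `CsvTextSource`, and `ingest` are hypothetical and exist only for this example. The idea it shows is that every source exposes the same `read()` method, so the ingestion code does not need to know where the records come from.

```python
import csv
import io
from typing import Any, Dict, Iterable, Iterator, List, Protocol


class DataSource(Protocol):
    """Hypothetical interface: every source yields records as plain dicts."""

    def read(self) -> Iterator[Dict[str, Any]]:
        ...


class InMemorySource:
    """Stands in for a database query result (hypothetical)."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def read(self) -> Iterator[Dict[str, Any]]:
        yield from self._rows


class CsvTextSource:
    """Stands in for a file or HTTP response that returns CSV text (hypothetical)."""

    def __init__(self, text: str) -> None:
        self._text = text

    def read(self) -> Iterator[Dict[str, Any]]:
        # csv.DictReader uses the first line as column names.
        yield from csv.DictReader(io.StringIO(self._text))


def ingest(sources: Iterable[DataSource]) -> List[Dict[str, Any]]:
    """Collect records from every source through the same read() call."""
    collected: List[Dict[str, Any]] = []
    for source in sources:
        collected.extend(source.read())
    return collected


records = ingest([
    InMemorySource([{"id": 1, "name": "alice"}]),
    CsvTextSource("id,name\n2,bob\n"),
])
```

Adding another kind of source in this sketch only requires a new class with a `read()` method; `ingest` stays unchanged, which is the maintainability benefit described above.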

Singer.io

A relevant integration for data sources is vdk-singer, which integrates with Singer.io and provides connectors for many APIs. It adds a separate command for listing the available sources: vdk singer --list

Available connections are listed under Singer Taps.

Taps extract data from any source and write it to a standard stream in a JSON-based format:

  • Amazon S3 CSV
  • LinkedIn/Google/Facebook Ads
  • eBay
  • GitHub
  • Google Analytics
  • Google Sheets
  • Jira
  • MySQL
  • Oracle
  • PostgreSQL
  • SFTP
  • Salesforce
  • Slack
  • Trello
  • Zoom
  • ...
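
The JSON-based stream that taps write can be sketched as follows. In the Singer specification a tap emits one JSON message per line, and each message has a `type` of `SCHEMA`, `RECORD`, or `STATE`. The snippet below is a minimal illustration in plain Python, not part of vdk-singer: the helper `collect_records` and the sample messages are made up for this example.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def collect_records(lines: Iterable[str]) -> Dict[str, List[Dict[str, Any]]]:
    """Group RECORD payloads by stream name; SCHEMA and STATE messages are skipped."""
    by_stream: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between messages
        message = json.loads(line)
        if message.get("type") == "RECORD":
            by_stream[message["stream"]].append(message["record"])
    return dict(by_stream)


# Invented sample of what a tap might print to standard output.
sample_output = [
    '{"type": "SCHEMA", "stream": "users", '
    '"schema": {"properties": {"id": {"type": "integer"}}}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 2}}',
    '{"type": "STATE", "value": {"users": 2}}',
]

grouped = collect_records(sample_output)
```

A real consumer would typically also use the `SCHEMA` message to validate records and persist the `STATE` value so the next run can resume where the previous one stopped; both are omitted here for brevity.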

VDK Targets:

  • File
  • HTTP
  • Huggingface
  • PostgreSQL
  • Impala
  • Trino
  • Greenplum
  • SQLite
  • Oracle
  • DuckDB
  • ...

All supported plugins are listed in the projects/vdk-plugins directory of the vmware/versatile-data-kit repository.

Open a new ticket if your source/target is missing.

You can always develop your own plugins and contribute them to our repository so other people can reuse your code.
