Ideas for Examples with VDK - vmware/versatile-data-kit GitHub Wiki

This page is a result of 1899. It provides a list of ideas for writing VDK usage examples. The ideas are taken from other existing tools, Reddit, and brainstorming.

Idea	Link to an existing example	Comments
End-to-end data pipeline with VDK	Airflow e2e GoodReads Example
Data Engineering project in 20 minutes	Scrape Real-Estate Properties With Spark, Delta Lake, S3
VDK tutorial for beginners - Full course - build your pipelines in 2h (or x time)		Good for a YouTube video playlist
Full DE lifecycle project with Architecture diagram + tests + template	Good source	Good blogs, well received on Reddit, amazing examples. The website is a great inspiration. We could ask him to write for us
Collection and integration of data from numerous sources	Streaming example	Various data sources such as APIs, CSVs, web pages, JSON, etc.
Setting up your data infrastructure		Inspired by a Reddit post by a blogger describing what people are asking for
Developing Production pipelines with VDK		Setting up Staging / Production
Data engineering project with a free template: VDK + PostgreSQL (or anything else) + AWS	This is a perfect example of a good blog post
Batch processing full data pipeline - AWS S3 data lake + VDK	Airflow and the data movie reviews
Near real-time data project. A cron job pulls the data every 5 minutes.	Bitcoin exchange data from CoinCap API
Data orchestration - ingestion, scheduling and setting dependencies - DAGs
How to build a mature VDK project from scratch
How to scale with VDK		I assume that VDK is the best tool for scaling and large projects, so we could promote it in this direction
Automated testing with VDK
Create a local data pipeline project - VDK SDK
Versioning data/data pipelines
Monitoring/troubleshooting data pipelines
Web scraping with VDK (is it possible?)
Scrape Stock and Twitter Data Using VDK, Kafka, and Spark	Example - Scrape Stock and Twitter Data Using VDK, Kafka, and Spark
Analyzing GitHub repos for comments and questions
Analyzing Stack Overflow data
Analyzing GitHub - spaces vs. tabs	Example
Scraping job portals
Scraping Reddit		We need to make one to track how often VDK is mentioned on Reddit
World Happiness Report	Example
Pollution in the United States is done; maybe we can do pollution in the EU, and even per country	US pollution
YouTube video stats	Dataset of a daily record of the top trending YouTube videos and inspiration for projects	This dataset needs cleaning. It can be used for: sentiment analysis; categorizing YouTube videos based on their comments and statistics; analyzing what factors affect how popular a YouTube video will be; statistical analysis over time.
Stock sentiment—i.e. how people are feeling about a stock	Example	Project sounds really cool
Inflation tracking - calculating the inflation rate	Example
Real estate price change		This could be really interesting to track right now
Movie recommendations with Azure using Spark SQL	Example
Other ideas	28 project ideas; almost all are amazing
NLP projects	Ideas	I don't know if it makes sense to do these with VDK, but it's an amazing source of inspiration for interesting projects