Ideas for Examples with VDK - vmware/versatile-data-kit GitHub Wiki

This page is a result of 1899. It provides a list of ideas for writing VDK usage examples. The ideas are taken from other existing tools, from Reddit, and from brainstorming.

| Idea | Link to an existing example | Comments | Assignee |
| --- | --- | --- | --- |
| End-to-end data pipeline with VDK | Airflow e2e GoodReads Example | | |
| Data Engineering project in 20 minutes | Scrape Real-Estate Properties | With Spark, Delta Lake, S3 | |
| VDK tutorial for beginners - Full course - build your pipelines in 2h (or x time) | | Good for a YT video playlist | |
| Full DE lifecycle project with architecture diagram + tests + template | Good source | Good blogs, well perceived on Reddit, amazing examples. The website is a great inspiration. We could ask him to write for us | |
| Collection and integration of data from numerous sources | Streaming example | Various data sources like APIs, CSVs, webpages, JSON, etc. | |
| Setting up your data infrastructure | | Inspired by a Reddit post by a blogger describing what people are asking for | |
| Developing production pipelines with VDK | | Setting up Staging / Production | |
| Data engineering project with free template: VDK + PostgreSQL (anything else) + AWS | | This is a perfect example of a good blog post | |
| Batch processing full data pipeline - AWS S3 data lake + VDK | | Airflow and the movie reviews data | |
| Near real-time data project | | Cron pulls the data every 5 minutes. Bitcoin exchange data from the CoinCap API | |
| Data orchestration - ingestion, scheduling and setting dependencies - DAGs | | | |
| How to build a mature VDK project from scratch | | | |
| How to scale with VDK | | I assume that VDK is the best tool for scaling and large projects, so we could promote in this direction | |
| Automated testing with VDK | | | |
| Create local data pipelines project - VDK SDK | | | |
| Versioning data/data pipelines | | | |
| Monitoring/troubleshooting data pipelines | | | |
| Web scraping with VDK (is it possible?) | | | |
| Scrape Stock and Twitter Data Using VDK, Kafka, and Spark | Example - Scrape Stock and Twitter Data Using VDK, Kafka, and Spark | | |
| Analyzing GitHub repos for comments and questions | | | |
| Analyzing Stack Overflow data | | | |
| Analyzing GitHub - spaces vs. tabs | Example | | |
| Scraping job portals | | | |
| Scraping Reddit | | We need to make one to track how often VDK is mentioned on Reddit | |
| World Happiness Report | Example | | |
| Pollution in the United States is done; maybe we can do pollution in the EU and even per country | US pollution | | |
| YouTube video stats | Dataset of a daily record of the top trending YouTube videos and inspiration for projects | This dataset needs cleaning. It can be used for: sentiment analysis; categorizing YouTube videos based on their comments and statistics; analyzing what factors affect how popular a YouTube video will be; statistical analysis over time | |
| Stock sentiment, i.e. how people are feeling about a stock | Example | Project sounds really cool | |
| Inflation tracking - calculating the inflation rate | Example | | |
| Real estate price change | | Could be really interesting to track now | |
| Movie recommendations with Azure using Spark SQL | Example | | |
| Other ideas | 28 project ideas | Almost all are amazing | |
| NLP projects | Ideas | Not sure if it makes sense to do with VDK, but it's an amazing source of inspiration for interesting projects | |
