Ideas for Examples with VDK - vmware/versatile-data-kit GitHub Wiki
This page is a result of 1899. It provides a list of ideas for VDK usage examples. The ideas are taken from other existing tools, from Reddit, and from brainstorming.
Idea | Link to an existing example | Comments | Assignee |
---|---|---|---|
End-to-end data pipeline with VDK | Airflow e2e GoodReads Example | ||
Data Engineering project in 20 minutes | Scrape Real-Estate Properties With Spark, Delta Lake, S3 | ||
VDK tutorial for beginners - Full course - build your pipelines in 2h (or x time) | Good for YT video playlist | ||
Full DE lifecycle project with Architecture diagram + tests + template | Good source | Good blogs, well received on Reddit, amazing examples. The website is a great inspiration. We could ask him to write for us | |
Collection and integration of data from numerous sources | Streaming example | Various data sources such as APIs, CSVs, web pages, JSON, etc. | |
Setting up your data infrastructure | Inspired by a Reddit post by a blogger describing what people are asking for | ||
Developing Production pipelines with VDK | Setting up Staging / Production | ||
Data engineering project with free template: VDK + PostgreSQL (anything else) + AWS | This is a perfect example of a good blogpost | ||
Batch processing full data pipeline - AWS S3 data lake + VDK | Airflow and the data movie reviews | ||
Near real-time data project: a cron job pulls the data every 5 minutes | Bitcoin exchange data from CoinCap API | ||
Data orchestration - ingestion, scheduling and setting dependencies - DAGs | |||
How to build a mature VDK project from scratch | |||
How to scale with VDK | I assume that VDK is the best tool for scaling and large projects, so we could promote it in this direction | ||
Automated testing with VDK | |||
Create a local data pipelines project - VDK SDK | |||
Versioning data/data pipelines | |||
Monitoring/troubleshooting data pipelines | |||
Web scraping with VDK (is it possible?) | |||
Scrape Stock and Twitter Data Using VDK, Kafka, and Spark | Example - Scrape Stock and Twitter Data Using VDK, Kafka, and Spark | ||
Analyzing GitHub repos for comments and questions | |||
Analyzing Stack Overflow data | |||
Analyzing GitHub - spaces vs. tabs | Example | ||
Scraping job portals | |||
Scraping Reddit | We need to make one to track how often VDK is mentioned on Reddit | ||
World Happiness Report | Example | ||
Pollution in the United States is done; maybe we can do pollution in the EU and even per country | US pollution | ||
YouTube video stats | Dataset of a daily record of the top trending YouTube videos and inspiration for projects | This dataset needs cleaning. It can be used for: sentiment analysis; categorizing YouTube videos based on their comments and statistics; analyzing what factors affect how popular a YouTube video will be; statistical analysis over time | |
Stock sentiment, i.e. how people are feeling about a stock | Example | Project sounds really cool | |
Inflation tracking - calculating the inflation rate | Example | ||
Real estate price change | This could be really interesting to track right now | ||
Movie recommendations with Azure using Spark SQL | Example | ||
Other ideas | 28 project ideas, almost all of which are amazing | ||
NLP projects | Ideas | I don't know if it makes sense to do this with VDK, but it's an amazing source of inspiration for interesting projects | |