Ideas for Examples with VDK - vmware/versatile-data-kit GitHub Wiki
This page is the result of 1899. It provides a list of ideas for VDK usage examples. The ideas are taken from other existing tools, Reddit, and brainstorming.
| Idea | Link to an existing example | Comments | Assignee |
|---|---|---|---|
| End-to-end data pipeline with VDK | Airflow e2e GoodReads Example | ||
| Data Engineering project in 20 minutes | Scrape Real-Estate Properties With Spark, Delta Lake, S3 | | |
| VDK tutorial for beginners: full course, build your pipelines in 2 hours (or another time frame) | | Good for a YouTube video playlist | |
| Full DE lifecycle project with architecture diagram + tests + template | Good source | Good blogs, well received on Reddit, with amazing examples. The website is a great inspiration. We could ask the author to write for us | |
| Collection and integration of data from numerous sources | Streaming example | Various data sources, such as APIs, CSV files, web pages, and JSON | |
| Setting up your data infrastructure | Inspired by a Reddit post by a blogger describing what people are asking for | ||
| Developing production pipelines with VDK | Setting up Staging / Production | | |
| Data engineering project with a free template: VDK + PostgreSQL (or anything else) + AWS | This is a perfect example of a good blog post | | |
| Full batch-processing data pipeline: AWS S3 data lake + VDK | Airflow and the data movie reviews | | |
| Near real-time data project, with cron pulling the data every 5 minutes | Bitcoin exchange data from CoinCap API | | |
| Data orchestration - ingestion, scheduling and setting dependencies - DAGs | |||
| How to build a mature VDK project from scratch | |||
| How to scale with VDK | | I assume VDK is the best tool for scaling and large projects, so we could promote it in this direction | |
| Automated testing with VDK | |||
| Create a local data pipeline project with the VDK SDK | | | |
| Versioning data/data pipelines | |||
| Monitoring/troubleshooting data pipelines | |||
| Web scraping with VDK (is it possible?) | |||
| Scrape Stock and Twitter Data Using VDK, Kafka, and Spark | Example - Scrape Stock and Twitter Data Using VDK, Kafka, and Spark | ||
| Analyzing GitHub repos for comments and questions | |||
| Analyzing Stack Overflow data | |||
| Analyzing GitHub: spaces vs. tabs | Example | | |
| Scraping job portals | | | |
| Scraping Reddit | | We need to build one to track how often VDK is mentioned on Reddit | |
| World Happiness Report | Example | ||
| Pollution in the United States is done; maybe we can do pollution in the EU, and even per country | US pollution | | |
| YouTube video stats | Dataset of a daily record of the top trending YouTube videos and inspiration for projects | This dataset needs cleaning. It can be used for sentiment analysis, categorizing YouTube videos based on their comments and statistics, analyzing which factors affect how popular a YouTube video will be, and statistical analysis over time | |
| Stock sentiment, i.e. how people feel about a stock | Example | The project sounds really cool | |
| Inflation tracking: calculating the inflation rate | Example | | |
| Real estate price changes | | Could be really interesting to track right now | |
| Movie recommendations with Azure using Spark SQL | Example | ||
| Other ideas | 28 project ideas | Almost all are amazing | |
| NLP projects | Ideas | Not sure it makes sense to do with VDK, but it's an amazing source of inspiration for interesting projects | |