Work in Progress - jmadison222/knowledge GitHub Wiki

| Home |


1. Work in Progress

These are candidate ideas that might grow into topic pages, or they might not. For now I’m just compiling thoughts.

  • Pick One Sample Database and Build It Until You Retire. Get really good at something like TPC-H, and build it in all new technologies. Learning new technologies is best done with a business problem. You can emulate having a business problem if you use an industry benchmark. Because I do data work, I want to emulate a data problem. I picked the TPC-H benchmark since it’s sufficiently complex yet still not too big. I automated it to an extreme. Then any time I wanted to learn a new database or database-like technology, I ported it to that technology. I first did it on Oracle, then ported it to Hive, Snowflake, and Postgres. I also ported the data layer to S3 under Snowflake as well as Parquet and Avro under Hive. And I’m sure a few other permutations I’m forgetting. By getting really good at one benchmark then porting it to any new technology, you can learn a great deal about that technology quickly.

  • Hacking Code is 10 Times Easier Than Enterprise Code. Working code runs in an application, and that application in an enterprise.

  • Data Science Is More Gambling Than Math. When all the number crunching is done, there is still a lot of intuition and speculation.

  • Why the Open Source Model Fails Within a Company. In the industry you get just the best, in your company, you get it all.

  • The Data Governance Scaling Fallacy. Why enterprise data governance doesn’t work like it did on your favorite team.


⚠️ **GitHub.com Fallback** ⚠️