Work in Progress - jmadison222/knowledge GitHub Wiki

1. Work in Progress

These are candidate ideas that might grow into topic pages, or they might not. For now I’m just compiling thoughts.

Pick One Sample Database and Build It Until You Retire. Get really good at something like TPC-H, and build it in all new technologies. Learning new technologies is best done with a business problem. You can emulate having a business problem if you use an industry benchmark. Because I do data work, I want to emulate a data problem. I picked the TPC-H benchmark since it’s sufficiently complex yet still not too big. I automated it to an extreme. Then any time I wanted to learn a new database or database-like technology, I ported it to that technology. I first did it on Oracle, then ported it to Hive, Snowflake, and Postgres. I also ported the data layer to S3 under Snowflake as well as Parquet and Avro under Hive. And I’m sure a few other permutations I’m forgetting. By getting really good at one benchmark then porting it to any new technology, you can learn a great deal about that technology quickly.
Hacking Code is 10 Times Easier Than Enterprise Code. Working code runs in an application, and that application in an enterprise.
Data Science Is More Gambling Than Math. When all the number crunching is done, there is still a lot of intuition and speculation.
Why the Open Source Model Fails Within a Company. In the industry you get just the best, in your company, you get it all.
The Data Governance Scaling Fallacy. Why enterprise data governance doesn’t work like it did on your favorite team.