Work in Progress - jmadison222/knowledge GitHub Wiki
| Home |
These are candidate ideas that might grow into topic pages, or they might not. For now I’m just compiling thoughts.
-
Pick One Sample Database and Build It Until You Retire. Get really good at something like TPC-H, and build it in all new technologies. Learning new technologies is best done with a business problem. You can emulate having a business problem if you use an industry benchmark. Because I do data work, I want to emulate a data problem. I picked the TPC-H benchmark since it’s sufficiently complex yet still not too big. I automated it to an extreme. Then any time I wanted to learn a new database or database-like technology, I ported it to that technology. I first did it on Oracle, then ported it to Hive, Snowflake, and Postgres. I also ported the data layer to S3 under Snowflake as well as Parquet and Avro under Hive. And I’m sure a few other permutations I’m forgetting. By getting really good at one benchmark then porting it to any new technology, you can learn a great deal about that technology quickly.
-
Hacking Code is 10 Times Easier Than Enterprise Code. Working code runs in an application, and that application in an enterprise.
-
Data Science Is More Gambling Than Math. When all the number crunching is done, there is still a lot of intuition and speculation.
-
Why the Open Source Model Fails Within a Company. In the industry you get just the best, in your company, you get it all.
-
The Data Governance Scaling Fallacy. Why enterprise data governance doesn’t work like it did on your favorite team.