Big Data Architecture - fcrimins/fcrimins.github.io GitHub Wiki

Why not just use TensorFlow for everything?
- Dask creates its own execution graphs, but why is this necessary when TF already has them?
- In particular, TF can even read its input directly from files. If so, why not just write the data to files and start the TF graph there?
- .tfrecords file format: all records for an entire training/validation/test set can be written, one after another, to a single file. See example here (which also includes good example usage of argparse and tf.app).
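To make the "many records in one file" idea concrete, here is a minimal pure-Python sketch of the record framing that TFRecord files use, as described in TensorFlow's TFRecord documentation. Each record is a little-endian uint64 length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload, appended back to back. It deliberately avoids importing TensorFlow. The function names (`write_records`, `read_records`, `masked_crc`) are hypothetical, and the payloads are raw bytes rather than the serialized `tf.train.Example` protos a real pipeline would store. In practice you would use `tf.io.TFRecordWriter` and `tf.data.TFRecordDataset`.

```python
import os
import struct
import tempfile
from typing import Iterable, Iterator, List


def _make_crc32c_table() -> List[int]:
    # CRC-32C (Castagnoli), reflected polynomial 0x82F63B78, as used by TFRecord.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path: str, records: Iterable[bytes]) -> None:
    # Every record is framed and appended to the same file, one after another.
    with open(path, "wb") as f:
        for rec in records:
            header = struct.pack("<Q", len(rec))            # uint64 length
            f.write(header)
            f.write(struct.pack("<I", masked_crc(header)))  # masked CRC of length
            f.write(rec)                                    # payload bytes
            f.write(struct.pack("<I", masked_crc(rec)))     # masked CRC of payload


def read_records(path: str) -> Iterator[bytes]:
    # Stream records back in order, verifying both checksums of each frame.
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            (length_crc,) = struct.unpack("<I", f.read(4))
            if length_crc != masked_crc(header):
                raise ValueError("corrupt length header")
            data = f.read(length)
            (data_crc,) = struct.unpack("<I", f.read(4))
            if data_crc != masked_crc(data):
                raise ValueError("corrupt record payload")
            yield data


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "train.tfrecords")
    write_records(path, [b"example-0", b"example-1", b"example-2"])
    print(list(read_records(path)))
```

Because reading is a sequential scan of one file, a whole split can be streamed without loading it into memory. If the framing above matches the documented format, files written this way should also be readable by `tf.data.TFRecordDataset`, though that is an untested assumption here.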
Dask
- Out-of-core functional, NumPy-array, and DataFrame computation, promoted by @jakevdp, so it must be good.
Xray + Dask: Out-of-Core, Labeled Arrays in Python
- Xray seems to have a clunky interface.
- And doesn't Dask have the same functionality?

Good YouTube talk on the history of, and differences between, relational databases (SQL) -> semi-structured stores -> document stores (NoSQL), with a description of Hadoop (an architecture paradigm) along the way.