Big Data Architecture - fcrimins/fcrimins.github.io GitHub Wiki
Out-of-core data options (4/21/17)
Why not just use TensorFlow for everything?
Dask creates its own execution graphs, but why is this necessary when TF already has them?
In particular, TF can even read its input directly from files. If that is the case, why not just write the data out to files and start the TF graph from there?
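To make the comparison concrete, here is a minimal sketch of what a Dask-style execution graph looks like. It is a plain dict mapping keys to tasks. A task is a tuple whose first element is a callable and whose remaining elements are either literal arguments or keys of other tasks. The small recursive evaluator (`compute`) and the `load_chunk` helper are hypothetical illustrations, not Dask's scheduler. The point is that, as with a TF graph, nothing runs until a result is requested, and each chunk can be loaded and reduced independently. That independence is what makes out-of-core processing possible.

```python
from operator import add


def load_chunk(i):
    """Hypothetical stand-in for reading chunk i of an out-of-core dataset from disk."""
    return list(range(i * 4, (i + 1) * 4))


def chunk_sum(values):
    """Reduce one chunk independently of the others."""
    return sum(values)


def is_task(node):
    # Following Dask's graph spec, a task is a tuple whose first element is callable.
    return isinstance(node, tuple) and len(node) > 0 and callable(node[0])


def compute(graph, key, cache=None):
    """Evaluate `key` lazily, computing each dependency at most once (simplified: only string keys)."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    node = graph[key]
    if is_task(node):
        func, *args = node
        # Arguments that name other keys in the graph are replaced by their computed values.
        resolved = [
            compute(graph, a, cache) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        result = func(*resolved)
    else:
        result = node  # a literal value stored directly in the graph
    cache[key] = result
    return result


# Building the graph does no work; it only describes the computation.
graph = {
    "chunk-0": (load_chunk, 0),
    "chunk-1": (load_chunk, 1),
    "sum-0": (chunk_sum, "chunk-0"),
    "sum-1": (chunk_sum, "chunk-1"),
    "total": (add, "sum-0", "sum-1"),
}

print(compute(graph, "total"))  # → 28
```

The same dict shape should also be accepted by Dask's own schedulers (for example `dask.threaded.get(graph, "total")`), which add parallelism and memory management on top of this basic evaluation order.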
.tfrecords file format: all records for an entire training/validation/test set are intended to be written to a single file. See example here (which also includes good example usage of argparse and tf.app.).
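For reference, a `.tfrecords` file is just a sequence of length-prefixed, checksummed records. That is why a whole split can live in one file and be streamed back sequentially. The sketch below writes and reads that framing in pure Python. It assumes the layout described in the TensorFlow TFRecord documentation: a little-endian uint64 length, a masked CRC-32C of the length bytes, the payload, then a masked CRC-32C of the payload. The function names are hypothetical. The payloads here are raw bytes; in real TF code they would typically be serialized `tf.train.Example` protos written with `tf.io.TFRecordWriter` (`tf.python_io.TFRecordWriter` in the 1.x API).

```python
import io
import struct

_CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial used by CRC-32C


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (slow but simple; real implementations use lookup tables)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc >> 1) ^ _CRC32C_POLY) if (crc & 1) else (crc >> 1)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate the CRC right by 15 bits and add a constant, as the TFRecord format specifies."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_records(stream, records):
    """Append each payload (bytes) to one stream/file using TFRecord framing."""
    for data in records:
        header = struct.pack("<Q", len(data))
        stream.write(header)
        stream.write(struct.pack("<I", masked_crc(header)))
        stream.write(data)
        stream.write(struct.pack("<I", masked_crc(data)))


def read_records(stream):
    """Yield payloads sequentially, verifying both checksums of every record."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of file
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", stream.read(4))
        if length_crc != masked_crc(header):
            raise ValueError("corrupt length checksum")
        data = stream.read(length)
        (data_crc,) = struct.unpack("<I", stream.read(4))
        if data_crc != masked_crc(data):
            raise ValueError("corrupt data checksum")
        yield data


# An in-memory buffer stands in for a file opened with open(path, "wb") / open(path, "rb").
buf = io.BytesIO()
write_records(buf, [b"example-0", b"example-1", b"example-2"])
buf.seek(0)
print(list(read_records(buf)))  # → [b'example-0', b'example-1', b'example-2']
```

Each record carries 16 bytes of framing overhead (8 for the length, 4 for each checksum). Reading is therefore strictly sequential, which suits streaming a whole split from one file.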
Good YouTube talk on the history of, and differences between, relational DBs (SQL) -> semi-structured data -> document stores (NoSQL), describing Hadoop (an architecture paradigm) along the way.