Data - fcrimins/fcrimins.github.io GitHub Wiki
3 Million Instacart Orders, Open Sourced
Google's One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling (3/24/17)
- per here
NLTK has its own datasets (3/20/17)
- downloaded here: ~/nltk_data
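To see which NLTK packages are already in that directory, a small standard-library sketch like the one below can help. It assumes the usual NLTK layout of one sub-folder per category (for example `corpora` or `tokenizers`), each holding package folders and/or `.zip` archives. The function name `list_nltk_packages` is made up for this note and is not part of NLTK.

```python
from pathlib import Path


def list_nltk_packages(data_dir=Path.home() / "nltk_data"):
    """Return {category: [package names]} for an NLTK data directory.

    Hypothetical helper (not an NLTK API). Assumes the layout
    <data_dir>/<category>/<package> or <data_dir>/<category>/<package>.zip.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        # Nothing downloaded yet, or the data lives somewhere else.
        return {}

    packages = {}
    for category in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        names = set()
        for entry in category.iterdir():
            name = entry.name
            # Count an archive and its unpacked folder as one package.
            if name.endswith(".zip"):
                name = name[: -len(".zip")]
            names.add(name)
        packages[category.name] = sorted(names)
    return packages


if __name__ == "__main__":
    for category, names in list_nltk_packages().items():
        print(f"{category}: {', '.join(names)}")
```

If the data was downloaded to a different location, pass that path as `data_dir`.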
First billion characters from Wikipedia
UCI Machine Learning Datasets
The top bestsellers of 1916
- But what are the bestsellers from 1916 with sales normalized by year after publication?
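One possible reading of that question is "total copies sold divided by the number of years since publication". The sketch below ranks titles that way. The field names, the helper names, and the two placeholder records are all invented for illustration and are not real sales figures.

```python
def sales_per_year(total_sales, publication_year, as_of_year):
    """Average copies sold per year since publication (assumed definition)."""
    years = as_of_year - publication_year
    if years <= 0:
        raise ValueError("as_of_year must be later than publication_year")
    return total_sales / years


def rank_by_normalized_sales(books, as_of_year):
    """Sort book records by sales per year, highest first."""
    return sorted(
        books,
        key=lambda b: sales_per_year(
            b["total_sales"], b["publication_year"], as_of_year
        ),
        reverse=True,
    )


if __name__ == "__main__":
    # Placeholder records only, not real data.
    books = [
        {"title": "Placeholder A", "total_sales": 1_000_000, "publication_year": 1900},
        {"title": "Placeholder B", "total_sales": 300_000, "publication_year": 1914},
    ]
    for book in rank_by_normalized_sales(books, as_of_year=1916):
        print(book["title"])
```

With these placeholder numbers the older title has higher total sales but a lower per-year rate, so the ranking flips relative to a raw-sales list.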
Ratings datasets are figuratively just lying around the web these days, begging for someone to take notice and analyze them.
- Movie reviews from the Netflix Prize dataset
- Business reviews from the Yelp Academic Dataset, as summarized here
- Amazon book reviews from the Multi-domain Sentiment Dataset
- News ratings dataset from Reddit
41 Machine Learning Interview Questions (1/30/17)
- 19 Free Public Data Sets For Your First Data Science Project
- "check out Quandl for economic and financial data, and Kaggle’s Datasets collection for another great list"