OSS Datasets - simon-oz/Weekly-AI-news GitHub Wiki
Language Model Evaluation Harness - A framework for few-shot evaluation of autoregressive language models.
JourneyDB - A benchmark for generative image understanding, available on Hugging Face.
Downloadable data - From Rider University.
Awesome public datasets - A topic-centric list of high-quality open datasets.
LAION-5B - Open large-scale multi-modal datasets.
The Pile - An 825 GiB diverse, open-source language modelling dataset.
Kaggle datasets - 86 datasets, over 100 GB.
Data-hub - Collections of high-quality datasets organized by topic.