OSS Datasets - simon-oz/Weekly-AI-news GitHub Wiki

Language Model Evaluation Harness - A framework for few-shot evaluation of autoregressive language models.

JourneyDB - A benchmark for generative image understanding, available on Hugging Face.

Downloadable data - A list of downloadable datasets from Rider University.

Awesome public datasets - A topic-centric list of high-quality open datasets.

LAION-5B - Open large-scale multi-modal datasets.

The Pile - An 825 GiB diverse, open-source language modelling dataset.

Kaggle datasets - A collection of 86 datasets totalling over 100 GB.

Data-hub - Collections of high-quality datasets organized by topic.