GLUE

About

  • General Language Understanding Evaluation (GLUE) benchmark
  • A collection of resources for training, evaluating, and analyzing natural language understanding systems.

Components

  • Benchmark: nine sentence- or sentence-pair language understanding tasks built on established existing datasets, selected to cover a diverse range of dataset sizes, text genres, and degrees of difficulty (see the loading example after this list).
  • Diagnostic dataset: designed to evaluate and analyze model performance with respect to a wide range of linguistic phenomena found in natural language.
  • Public leaderboard: for tracking performance on the benchmark, with a dashboard for visualizing model performance on the diagnostic set.
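
A minimal sketch of how the benchmark tasks and the diagnostic dataset can be accessed, assuming the Hugging Face `datasets` library and its `glue` configs (these config names are part of that library, not of GLUE itself):

```python
from datasets import load_dataset

# The nine GLUE tasks, keyed by their Hugging Face config names
GLUE_TASKS = ["cola", "sst2", "mrpc", "qqp", "stsb",
              "mnli", "qnli", "rte", "wnli"]

# Load a single task, e.g. SST-2 (single-sentence sentiment classification)
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])       # {'sentence': ..., 'label': ..., 'idx': ...}

# The diagnostic dataset is exposed as the "ax" config (test split only;
# labels are withheld, and predictions are scored via the leaderboard)
diagnostic = load_dataset("glue", "ax")
print(diagnostic["test"][0])
```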

GLUE vs SQuAD

| Benchmark | Task                           | Dataset                |
|-----------|--------------------------------|------------------------|
| GLUE      | Natural language understanding | GLUE benchmark dataset |
| SQuAD     | Question answering             | SQuAD dataset          |
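
The difference in task format is visible directly in the data. A short sketch, again assuming the Hugging Face `datasets` library: GLUE tasks are (pairs of) sentences with classification or regression labels, while SQuAD pairs a question with a context passage and an answer span.

```python
from datasets import load_dataset

# GLUE task example: RTE, a sentence-pair entailment task
rte = load_dataset("glue", "rte")
print(rte["train"][0])    # sentence1/sentence2 pair plus an entailment label

# SQuAD example: extractive question answering over a context passage
squad = load_dataset("squad")
print(squad["train"][0])  # question, context, and answer span(s)
```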

See also