GLUE

About

  • General Language Understanding Evaluation (GLUE) benchmark
  • A collection of resources for training, evaluating, and analyzing natural language understanding systems.

Components

  • Benchmark: nine sentence- or sentence-pair language understanding tasks built on established existing datasets, selected to cover a diverse range of dataset sizes, text genres, and degrees of difficulty (see the loading example after this list).
  • Diagnostic dataset: designed to evaluate and analyze model performance with respect to a wide range of linguistic phenomena found in natural language.
  • Public leaderboard: for tracking performance on the benchmark, with a dashboard for visualizing model performance on the diagnostic set.
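
A minimal sketch of how the benchmark tasks and the diagnostic dataset can be accessed, assuming the Hugging Face `datasets` library and its `glue` configs (these config names are part of that library, not of GLUE itself):

```python
from datasets import load_dataset

# The nine GLUE tasks, keyed by their Hugging Face config names
GLUE_TASKS = ["cola", "sst2", "mrpc", "qqp", "stsb",
              "mnli", "qnli", "rte", "wnli"]

# Load a single task, e.g. SST-2 (single-sentence sentiment classification)
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])       # {'sentence': ..., 'label': ..., 'idx': ...}

# The diagnostic dataset is exposed as the "ax" config (test split only;
# labels are withheld, and predictions are scored via the leaderboard)
diagnostic = load_dataset("glue", "ax")
print(diagnostic["test"][0])
```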

GLUE vs SQuAD

| Benchmark | Task                           | Dataset                |
|-----------|--------------------------------|------------------------|
| GLUE      | Natural language understanding | GLUE benchmark dataset |
| SQuAD     | Question answering             | SQuAD dataset          |
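
The difference in task format is visible directly in the data. A short sketch, again assuming the Hugging Face `datasets` library: GLUE tasks are (pairs of) sentences with classification or regression labels, while SQuAD pairs a question with a context passage and an answer span.

```python
from datasets import load_dataset

# GLUE task example: RTE, a sentence-pair entailment task
rte = load_dataset("glue", "rte")
print(rte["train"][0])    # sentence1/sentence2 pair plus an entailment label

# SQuAD example: extractive question answering over a context passage
squad = load_dataset("squad")
print(squad["train"][0])  # question, context, and answer span(s)
```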

See also