GLUE
About
- General Language Understanding Evaluation (GLUE) benchmark
- A collection of resources for training, evaluating, and analyzing natural language understanding systems.
Components
- Benchmark: Nine sentence- or sentence-pair language understanding tasks built on established existing datasets, selected to cover a diverse range of dataset sizes, text genres, and degrees of difficulty (see the sketch after this list).
- Diagnostic dataset: Designed to evaluate and analyze model performance with respect to a wide range of linguistic phenomena found in natural language.
- Public leaderboard: Tracks performance on the benchmark, with a dashboard for visualizing model performance on the diagnostic set.
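One common way to work with the benchmark programmatically is through the Hugging Face `datasets` and `evaluate` libraries (third-party tooling, not part of GLUE itself). A minimal sketch, with the MRPC task chosen purely for illustration:

```python
# Explore one GLUE task (MRPC) and score it with its official metric.
# Assumes the Hugging Face `datasets` and `evaluate` packages are installed.
from datasets import load_dataset
import evaluate

# Load the MRPC sentence-pair paraphrase task (train/validation/test splits).
mrpc = load_dataset("glue", "mrpc")
example = mrpc["train"][0]
print(example["sentence1"])
print(example["sentence2"])
print(example["label"])  # 1 = paraphrase, 0 = not a paraphrase

# Each GLUE task ships with an official metric (accuracy and F1 for MRPC).
metric = evaluate.load("glue", "mrpc")
dummy_predictions = [0] * len(mrpc["validation"])  # placeholder model output
print(metric.compute(predictions=dummy_predictions,
                     references=mrpc["validation"]["label"]))
```

Real submissions to the leaderboard replace the dummy predictions with a model's outputs on the (unlabeled) test split.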
GLUE vs SQuAD
| Benchmark | Task | Dataset |
|---|---|---|
| GLUE | Natural language understanding | GLUE benchmark dataset |
| SQuAD | Question answering | SQuAD dataset |
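To make the contrast concrete, a short sketch (again assuming the Hugging Face `datasets` library) comparing the shape of the data each benchmark presents:

```python
# Compare one example from a GLUE task with one from SQuAD.
from datasets import load_dataset

# GLUE tasks are sentence- or sentence-pair classification:
# two sentences in, one label out (RTE shown here as an illustration).
glue_example = load_dataset("glue", "rte")["train"][0]
print(glue_example.keys())  # sentence1, sentence2, label, idx

# SQuAD pairs a question with a context passage; the expected
# answer is a text span extracted from that passage.
squad_example = load_dataset("squad")["train"][0]
print(squad_example.keys())  # id, title, context, question, answers
```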