NLP Leaderboard
AlpacaEval - An Automatic Evaluator for Instruction-following Language Models
GLUE & SuperGLUE - leaderboards ranking models on general language understanding NLP tasks
HumanEval - Code Generation on HumanEval; the pass@k metric reported on this leaderboard is sketched after this list
LMSYS Leaderboard - Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings; a single Elo update step is sketched after this list
Open LLM Leaderboard - aims to track, rank, and evaluate open LLMs and chatbots as they are released
LLM Safety Leaderboard - provides a unified evaluation of LLM safety, helping researchers and practitioners better understand the capabilities, limitations, and potential risks of LLMs
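
HumanEval leaderboards typically report pass@k: the probability that at least one of k sampled completions per problem passes all unit tests. Below is a minimal sketch of the unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021); the function name `pass_at_k` and its interface are illustrative choices here, not the leaderboard's own code.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total completions sampled for a problem
    c: completions that pass all unit tests
    k: evaluation budget (k <= n)
    """
    if n - c < k:
        # Too few failures to fill a k-sample without a pass.
        return 1.0
    # Numerically stable product form of 1 - C(n-c, k) / C(n, k).
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Example: 5 samples, 1 passes -> pass@1 = 0.2
print(pass_at_k(n=5, c=1, k=1))
```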
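
Chatbot Arena ranks models from pairwise human votes using Elo-style ratings. The sketch below shows one Elo update step under stated assumptions: the K-factor of 32 and the function signature are illustrative defaults, not LMSYS's exact parameters (their published methodology has also used a Bradley-Terry fit, but the Elo update conveys the idea).

```python
def elo_update(rating_a: float, rating_b: float,
               score_a: float, k: float = 32.0) -> tuple[float, float]:
    """One Elo update after a single A-vs-B comparison.

    score_a: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k: step size; 32 is a common default, chosen for illustration.
    """
    # Expected score of A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    expected_b = 1.0 - expected_a
    # Move each rating toward the observed outcome.
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - expected_b)
    return new_a, new_b

# Example: equally rated models, A wins -> A gains 16, B loses 16.
print(elo_update(1000.0, 1000.0, score_a=1.0))  # (1016.0, 984.0)
```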