NLP Leaderboard
AlpacaEval - An Automatic Evaluator for Instruction-following Language Models
GLUE & SuperGLUE - leaderboards ranking models on general language understanding NLP tasks
HumanEval - Code Generation on HumanEval; the pass@k metric reported on this leaderboard is sketched after this list
LMSYS Leaderboard - Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings; a single Elo update step is sketched after this list
Open LLM Leaderboard - aims to track, rank, and evaluate open LLMs and chatbots as they are released
LLM Safety Leaderboard - provides a unified evaluation of LLM safety, helping researchers and practitioners better understand the capabilities, limitations, and potential risks of LLMs
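
HumanEval leaderboards typically report pass@k: the probability that at least one of k sampled completions per problem passes all unit tests. Below is a minimal sketch of the unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021); the function name `pass_at_k` and its interface are illustrative choices here, not the leaderboard's own code.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total completions sampled for a problem
    c: completions that pass all unit tests
    k: evaluation budget (k <= n)
    """
    if n - c < k:
        # Too few failures to fill a k-sample without a pass.
        return 1.0
    # Numerically stable product form of 1 - C(n-c, k) / C(n, k).
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Example: 5 samples, 1 passes -> pass@1 = 0.2
print(pass_at_k(n=5, c=1, k=1))
```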
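
Chatbot Arena ranks models from pairwise human votes using Elo-style ratings. The sketch below shows one Elo update step under stated assumptions: the K-factor of 32 and the function signature are illustrative defaults, not LMSYS's exact parameters (their published methodology has also used a Bradley-Terry fit, but the Elo update conveys the idea).

```python
def elo_update(rating_a: float, rating_b: float,
               score_a: float, k: float = 32.0) -> tuple[float, float]:
    """One Elo update after a single A-vs-B comparison.

    score_a: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k: step size; 32 is a common default, chosen for illustration.
    """
    # Expected score of A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    expected_b = 1.0 - expected_a
    # Move each rating toward the observed outcome.
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - expected_b)
    return new_a, new_b

# Example: equally rated models, A wins -> A gains 16, B loses 16.
print(elo_update(1000.0, 1000.0, score_a=1.0))  # (1016.0, 984.0)
```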