Home - chunhualiao/public-docs GitHub Wiki
Leaderboards in AI
- https://chat.lmsys.org/leaderboard
- https://livebench.ai/
- https://www.vellum.ai/llm-leaderboard
- https://artificialanalysis.ai/leaderboards/models
- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- https://scale.com/leaderboard
- https://leaderboard.allenai.org/
- https://mathvista.github.io/#leaderboard
value
- articles, code, presentations
- Automation
autonomous agents
AI Software Engineer
AI scientist
AI entrepreneurs
AI politicians
digital twins for voters: direct democracy
Generalist agents
WebArena
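- What: A realistic benchmark built on self-hosted, fully functional websites (e-commerce, a social forum, GitLab, and a content management system, plus tools such as a map and an offline Wikipedia) in a sandboxed environment.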
- Use: Tests whether AI agents can complete realistic information-seeking and manipulation tasks.
- Why Important: Moves beyond toy tasks to real-world complexity.
- Link: https://webarena.dev/
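
As a rough illustration of what "completing a task" can mean for a benchmark of this kind, the sketch below scores an agent by task success rate: each task pairs an instruction with a check on the outcome. All names here (`Task`, `success_rate`, `toy_agent`) are hypothetical and are not part of WebArena's actual code or API. The toy agent is a stand-in that returns a dictionary, not a real web agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """Hypothetical task: an instruction plus a check on the resulting state."""
    instruction: str
    check: Callable[[Dict[str, str]], bool]  # True if the outcome meets the goal


def success_rate(tasks: List[Task], agent: Callable[[str], Dict[str, str]]) -> float:
    """Run the agent on every task and return the fraction whose check passes."""
    if not tasks:
        return 0.0
    passed = sum(1 for t in tasks if t.check(agent(t.instruction)))
    return passed / len(tasks)


def toy_agent(instruction: str) -> Dict[str, str]:
    """Stand-in agent (assumption): only knows how to 'open an issue'."""
    if "issue" in instruction:
        return {"issue_title": "Fix login bug"}
    return {}


tasks = [
    # Manipulation task: judged by the state the agent leaves behind.
    Task("Open an issue titled 'Fix login bug'",
         lambda state: state.get("issue_title") == "Fix login bug"),
    # Information-seeking task: judged by whether an answer was produced.
    Task("Find the cheapest laptop and report its price",
         lambda state: "answer" in state),
]

rate = success_rate(tasks, toy_agent)
print(f"success rate: {rate:.2f}")
```

Checking the outcome rather than the exact sequence of clicks lets different action sequences count as correct as long as the goal is reached.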