Home - chunhualiao/public-docs GitHub Wiki

Leaderboards in AI

technology tree

value

articles, code, presentations

Automation

autonomous agents

AI Software Engineer

AI scientist

AI entrepreneurs

AI politicians

digital twins for voters: direct democracy

Generalist agents

WebArena

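What: A realistic benchmark that runs self-hosted, fully functional websites (an e-commerce store, a Reddit-style forum, GitLab, and a content management system), plus tools such as a map and Wikipedia, in a sandboxed environment.
Use: Tests whether autonomous AI agents can carry out natural-language tasks on these sites, such as seeking information, navigating, and creating or editing content, with success judged by whether the outcome is functionally correct.
Why important: Moves beyond simplified toy tasks to the multi-step complexity of real websites, while staying reproducible because every site is self-hosted.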
Link: https://webarena.dev/
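A minimal sketch of how a benchmark in this style can score agents: each task pairs a natural-language instruction with a check on the final outcome, and the score is the fraction of tasks that pass. The `Task` record, the `success_rate` helper, the state dictionaries, and the example tasks are all hypothetical illustrations, not WebArena's actual task format or evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of what the site (or the agent's answer) looks like after a run.
State = Dict[str, object]


@dataclass
class Task:
    """One benchmark task (hypothetical structure, not WebArena's real format)."""
    intent: str                      # natural-language instruction given to the agent
    check: Callable[[State], bool]   # functional-correctness check on the final state


def success_rate(tasks: List[Task], final_states: List[State]) -> float:
    """Return the fraction of tasks whose final state passes the task's check."""
    if len(tasks) != len(final_states):
        raise ValueError("need exactly one final state per task")
    if not tasks:
        return 0.0
    passed = sum(1 for task, state in zip(tasks, final_states) if task.check(state))
    return passed / len(tasks)


# Usage with made-up tasks and made-up agent outcomes.
tasks = [
    Task("Star the project 'demo'", lambda s: "demo" in s.get("starred", [])),
    Task("Report the price of item 42", lambda s: s.get("answer") == "$19.99"),
]
final_states = [
    {"starred": ["demo"]},  # the agent completed the first task
    {"answer": "$20.00"},   # the agent gave a wrong answer for the second task
]
print(success_rate(tasks, final_states))  # 0.5
```

Checking the resulting state rather than the exact sequence of clicks lets different action paths count as success, as long as they reach a correct outcome.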