Home - chunhualiao/public-docs GitHub Wiki

Leaderboards in AI

technology tree

taxonomy

value

  • articles, code, presentations
  • Automation

autonomous agents

AI Software Engineer

AI scientist

AI entrepreneurs

AI politicians

digital twins for voters: direct democracy

Generalist agents

WebArena

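  • What: A realistic benchmark built on self-hosted, fully functional websites (an e-commerce store, a social forum, GitLab, a content management system, plus resources such as Wikipedia) in a sandboxed environment.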
  • Use: Tests whether AI agents can complete realistic information-seeking and manipulation tasks.
  • Why Important: Moves beyond toy tasks to real-world complexity.
  • Link: https://webarena.dev/

Agent S