agent - chunhualiao/public-docs GitHub Wiki

Agent = Perception + Decision-making (Planning) + Action/Tools + Memory + Goal-Oriented + Autonomy + Learning/Adaptability
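As a rough illustration of how the components in this definition fit together, the sketch below wires perception, decision-making, tools, memory, a goal, and an autonomous loop into one toy class. Every name in it (`MinimalAgent`, `perceive`, `decide`, the `increment`/`decrement` tools) is hypothetical, and learning/adaptability is deliberately left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MinimalAgent:
    """Hypothetical toy agent; all names are illustrative assumptions."""
    goal: int                                    # goal-oriented: target value to reach
    tools: dict[str, Callable[[int], int]]       # action/tools: named state transformers
    memory: list[tuple[int, str]] = field(default_factory=list)  # memory of past steps

    def perceive(self, state: int) -> int:
        # Perception: in this toy setting the observation is the raw state.
        return state

    def decide(self, observation: int) -> str:
        # Decision-making: choose a tool name from the gap to the goal.
        return "increment" if observation < self.goal else "decrement"

    def run(self, state: int, max_steps: int = 20) -> int:
        # Autonomy: keep acting without user input until the goal is met
        # or the step budget is exhausted.
        for _ in range(max_steps):
            observation = self.perceive(state)
            if observation == self.goal:
                break
            choice = self.decide(observation)
            self.memory.append((observation, choice))
            state = self.tools[choice](state)
        return state


agent = MinimalAgent(
    goal=3,
    tools={"increment": lambda x: x + 1, "decrement": lambda x: x - 1},
)
final_state = agent.run(0)
```

In a real agent the `decide` step would typically be an LLM call and the tools would be external APIs; the loop structure is the part this sketch is meant to show.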

general purpose agent

https://www.anthropic.com/engineering/built-multi-agent-research-system

related code release: https://github.com/anthropics/anthropic-cookbook/tree/main/patterns/agents

Each agent

needs an objective, an output format, guidance on the tools and sources to use, and clear task boundaries

access to memory
access to tools
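One way to make the per-agent requirements listed above concrete is a small task specification that refuses to render a prompt when any of the four parts is missing. This is only a sketch: `SubagentTask`, its field names, and the prompt layout are assumptions for illustration, not part of the referenced Anthropic system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubagentTask:
    """Hypothetical task description handed to a subagent."""
    objective: str
    output_format: str
    tool_guidance: str
    boundaries: str

    def to_prompt(self) -> str:
        # Reject specs with blank fields so no subagent starts under-specified.
        missing = [name for name, value in vars(self).items() if not value.strip()]
        if missing:
            raise ValueError(f"missing fields: {', '.join(missing)}")
        return "\n".join([
            f"Objective: {self.objective}",
            f"Output format: {self.output_format}",
            f"Tools and sources: {self.tool_guidance}",
            f"Boundaries: {self.boundaries}",
        ])


task = SubagentTask(
    objective="List open-source browser automation agents",
    output_format="Bulleted list with one URL per item",
    tool_guidance="Prefer project repositories over news articles",
    boundaries="Do not evaluate closed-source products",
)
prompt = task.to_prompt()
```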

Self Prompt Improvement

The Claude 4 models can be excellent prompt engineers. When given a prompt and a failure mode, they are able to diagnose why the agent is failing and suggest improvements.

We even created a tool-testing agent: when given a flawed MCP tool, it attempts to use the tool and then rewrites the tool description to avoid failures. By testing the tool dozens of times, this agent found key nuances and bugs. This process for improving tool ergonomics resulted in a 40% decrease in task completion time for future agents using the new description, because they were able to avoid most mistakes.
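The test-then-rewrite cycle described above can be sketched as a simple loop. This is not Anthropic's implementation: `refine_description`, `run_trial`, and `rewrite` are hypothetical names, and in practice both callables would be backed by a model and a real tool rather than the stand-in functions used here.

```python
from typing import Callable, Optional


def refine_description(
    description: str,
    run_trial: Callable[[str, int], Optional[str]],   # returns an error message or None
    rewrite: Callable[[str, list[str]], str],         # produces an improved description
    trials: int = 12,
    max_rounds: int = 3,
) -> str:
    """Repeatedly exercise a tool and rewrite its description from observed failures."""
    for _ in range(max_rounds):
        failures: list[str] = []
        for attempt in range(trials):
            error = run_trial(description, attempt)
            if error is not None:
                failures.append(error)
        if not failures:
            break  # the description no longer leads to mistakes in these trials
        description = rewrite(description, failures)
    return description


# Stand-in callables: the "tool" fails until the description states the date format.
def demo_trial(description: str, attempt: int) -> Optional[str]:
    return None if "ISO 8601" in description else "date rejected"


def demo_rewrite(description: str, failures: list[str]) -> str:
    return description + " Dates must be ISO 8601."


improved = refine_description("Look up events by date.", demo_trial, demo_rewrite)
```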

Tools

We gave our agents explicit heuristics: for example, examine all available tools first, match tool usage to user intent, search the web for broad external exploration, or prefer specialized tools over generic ones.
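In the referenced system these heuristics are given to the model as prompt guidance, not code. Purely to illustrate what they amount to, the sketch below encodes them as a deterministic selection rule; `Tool`, `choose_tool`, the keyword matching, and the `web_search` fallback name are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record used only for this illustration."""
    name: str
    keywords: frozenset[str]   # intents the tool is suited for
    specialized: bool          # True for narrow, purpose-built tools


def choose_tool(intent_words: set[str], tools: list[Tool], fallback: str = "web_search") -> str:
    # Examine all available tools first and keep those matching the user intent.
    matches = [tool for tool in tools if tool.keywords & intent_words]
    if not matches:
        # Nothing fits: treat the request as broad external exploration.
        return fallback
    # Prefer specialized tools over generic ones, then the larger keyword overlap.
    best = max(matches, key=lambda tool: (tool.specialized, len(tool.keywords & intent_words)))
    return best.name


catalog = [
    Tool("generic_search", frozenset({"slack", "email", "docs"}), specialized=False),
    Tool("slack_search", frozenset({"slack"}), specialized=True),
]
picked = choose_tool({"slack", "message"}, catalog)
broad = choose_tool({"market", "trends"}, catalog)
```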

MCP

Browser

OpenAI: Operator

Browser Use stands out for web form and browser interaction tasks, making it very strong for automating online activities.

Skyvern automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows on a large number of websites, replacing brittle or unreliable automation solutions.

https://github.com/Skyvern-AI/skyvern

Integrated

AutoGPT stands as one of the earliest and most influential open-source autonomous agents. Released on March 30, 2023, by Toran Bruce Richards, this tool leverages OpenAI's GPT-4 or GPT-3.5 to perform tasks autonomously. What distinguishes AutoGPT is its ability to break complex goals into manageable sub-tasks without requiring user input at each step of the process. The system operates by having users define the agent's name, role, and objective, along with up to five strategies to achieve that objective. From that point, AutoGPT works independently to accomplish the goal through a series of self-directed actions. It can perform various tasks across the internet and local computing environments, such as researching information, generating content, and saving files, all with minimal supervision.

https://github.com/Significant-Gravitas/AutoGPT
Its capabilities include internet browsing for information retrieval and the ability to read and write files for tasks like summarization and document handling.
On the downside, it is resource-intensive, requiring considerable computational power, and its autonomous behavior can sometimes lead to unpredictable actions.
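The setup step described for AutoGPT (a name, a role, an objective, and at most five strategies) can be modelled as a small validated record. This sketch does not reproduce AutoGPT's actual configuration format or API; `AgentProfile` and `MAX_STRATEGIES` are hypothetical names.

```python
from dataclasses import dataclass, field

MAX_STRATEGIES = 5  # limit taken from the description above


@dataclass
class AgentProfile:
    """Hypothetical record of what the user defines before the agent runs."""
    name: str
    role: str
    objective: str
    strategies: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the "up to five strategies" limit at construction time.
        if len(self.strategies) > MAX_STRATEGIES:
            raise ValueError(
                f"at most {MAX_STRATEGIES} strategies allowed, got {len(self.strategies)}"
            )


profile = AgentProfile(
    name="ResearchBot",
    role="literature scout",
    objective="Summarize recent work on autonomous agents",
    strategies=["search the web", "read abstracts", "save a summary file"],
)
```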

BabyAGI presents a more lightweight implementation of autonomous concepts, designed to dynamically generate, prioritize, and execute tasks based on a single overarching objective. Its strengths lie in its objective-driven approach, dynamic task management, and ease of integration with APIs like Pinecone for enhanced functionality. Nevertheless, BabyAGI may struggle with highly complex tasks and relies on external services, which could incur additional costs. Its focus on simplicity makes it a good starting point for understanding autonomous agents, but its capabilities might be limited for intricate problems.
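The generate, prioritize, and execute cycle described for BabyAGI can be sketched as a queue-driven loop. This is a simplified illustration under stated assumptions, not BabyAGI's code: `run_task_loop` and the three callables are hypothetical, and the demo functions stand in for what would normally be LLM calls.

```python
from collections import deque
from typing import Callable


def run_task_loop(
    objective: str,
    first_task: str,
    execute: Callable[[str, str], str],
    create_tasks: Callable[[str, str, str], list[str]],
    prioritize: Callable[[str, list[str]], list[str]],
    max_iterations: int = 10,
) -> list[tuple[str, str]]:
    """Execute tasks toward one objective, generating and reordering tasks as it goes."""
    queue: deque[str] = deque([first_task])
    completed: list[tuple[str, str]] = []
    for _ in range(max_iterations):
        if not queue:
            break
        task = queue.popleft()
        result = execute(objective, task)                      # execute
        completed.append((task, result))
        queue.extend(create_tasks(objective, task, result))    # generate
        queue = deque(prioritize(objective, list(queue)))      # prioritize
    return completed


# Stand-in callables for demonstration only.
def demo_execute(objective: str, task: str) -> str:
    return f"done: {task}"


def demo_create(objective: str, task: str, result: str) -> list[str]:
    return ["write summary", "collect sources"] if task == "plan" else []


def demo_prioritize(objective: str, tasks: list[str]) -> list[str]:
    return sorted(tasks)


history = run_task_loop("survey agents", "plan", demo_execute, demo_create, demo_prioritize)
```

The `max_iterations` cap is a design choice for the sketch: without some budget, a loop that keeps generating tasks may never terminate.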

AgentGPT offers a unique approach by allowing users to deploy autonomous AI agents directly within a browser environment. These agents are assigned goals and iteratively attempt to achieve them, providing real-time feedback. A significant advantage is that it requires no installation and runs directly in the browser, offering customizable agent objectives and names. However, being browser-based imposes performance and capability constraints. The ease of access makes AgentGPT attractive for quick experimentation and simpler automation tasks, but its reliance on the browser environment might restrict its use for more demanding scenarios.

SuperAGI is a framework for building autonomous agents with a focus on extensibility. It provides a full platform with a GUI, multi-model support, memory (vector DB) integration, and plugins, and it is suitable for complex or long-running workflows.

Coding

Codel is an emerging full-stack automation tool and a newer entrant in the open-source autonomous agent landscape. Inspired by a proprietary tool called Devin, Codel aims to provide similar capabilities in an open-source package. It offers comprehensive automation across terminals, browsers, and code editors, making it particularly well-suited for programming tasks.

It is designed to perform complicated tasks and projects autonomously, utilizing the terminal, browser, and a built-in text editor. Codel operates securely within a sandboxed Docker environment, automatically detecting the next step required to complete a task. It features a built-in browser for fetching web information and a text editor for viewing modified files. All commands and their outputs are saved in a PostgreSQL database, and it can automatically select the appropriate Docker image based on the user's task. Codel is self-hosted and offers a modern user interface. This integrated design suggests a streamlined workflow for tasks requiring both web interaction and system-level commands.

Claude Code, an agentic coding tool from Anthropic, operates directly within the terminal, understanding project context and taking real actions. It assists with coding tasks through natural language commands, enabling users to edit files, fix bugs, answer questions about code, execute tests, and manage version control operations.

Designed for autonomous assistance with coding tasks within the terminal environment, Claude Code represents a new generation of terminal automation tools that use AI to understand code and perform complex development tasks. Its integration with version control systems and testing frameworks makes it a powerful tool for programming automation. However, its focus is primarily on coding-related activities within the terminal, and it relies on the Anthropic API.

References

https://g.co/gemini/share/bf558585ffcc