Open AI Agent - davidmarsoni/llog GitHub Wiki

OpenAI

OpenAI recently released a new set of tools for building agentic applications, collectively called the OpenAI Agent System. It is a framework for building applications that can perform tasks autonomously, using LLMs and other tools.

:sparkles: Features

The OpenAI Agent System offers a variety of features to enhance your agentic applications:

  • Responses API: OpenAI's new API primitive for leveraging built-in tools to build agents
  • Multimodal capabilities: Support for text, images, and other data types
  • Built-in tools: Web Search, File Search, and Computer Use tools for various tasks
  • Agents SDK: Open-source tool for orchestrating multi-agent workflows
  • Observability tools: Tools for tracing and inspecting agent workflow execution

:inbox_tray: Responses API

The Responses API is OpenAI's new API primitive for leveraging built-in tools to build agents. It combines the simplicity of Chat Completions with the tool-use capabilities of the Assistants API, providing a more flexible foundation for developing agentic applications.

:key: Key Features

  • Unified item-based design: Consistent interface for all response types
  • Simpler polymorphism: Easier handling of different response types
  • Intuitive streaming events: Real-time processing of model outputs
  • SDK helpers: Utilities like response.output_text to easily access the model's text output
  • Multi-tool support: Ability to use multiple tools and model turns in a single API call

The Responses API allows developers to integrate OpenAI models and built-in tools into their applications without the complexity of connecting multiple APIs or external applications. It also facilitates data storage on OpenAI for performance evaluation using features like tracing and evaluations.

:computer: Example Implementation

To help you get started with the Responses API, here's a simple IPYNB notebook example:

Basic example of using the Responses API
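For a quick look at the call shape outside the notebook, here is a minimal sketch (the model name and prompt are illustrative):

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# A single call: the model responds and the SDK exposes the text directly
response = client.responses.create(
    model="gpt-4o",
    input="Explain in one sentence what an AI agent is."
)

# output_text is the SDK helper mentioned above
print(response.output_text)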

:toolbox: Built-in Tools

The OpenAI Agent System provides several built-in tools that enable agents to perform various tasks efficiently.

:mag: Web Search

Web search is available as a tool in the Responses API when using GPT-4o and GPT-4o mini. It provides up-to-date answers with clear citations from the web.

:star: Features

  • Real-time information retrieval: Access to current information from across the web
  • Clear and relevant citations: Properly attributed sources for all information
  • Tool integration: Can be paired with other tools or function calls
  • Powerful search capability: Powered by the same model used for ChatGPT search
  • High accuracy: 90% for GPT-4o search and 88% for GPT-4o mini search on the SimpleQA benchmark
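Based on OpenAI's documentation, enabling web search is a matter of adding the built-in tool to a Responses API call. A minimal sketch (the model and prompt are illustrative):

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # enable the built-in web search tool
    input="What are the latest announcements from OpenAI?"
)

# The text answer includes citations to the web sources used
print(response.output_text)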

:file_folder: File Search

The file search tool allows developers to retrieve relevant information from large volumes of documents.

:star: Features

  • Multiple file type support: Work with various document formats
  • Query optimization: Intelligent query processing for better results
  • Metadata filtering: Filter search results based on document metadata
  • Custom reranking: Prioritize results based on custom criteria
  • Fast retrieval: Low-latency, accurate search results
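A minimal sketch of calling the tool through the Responses API, assuming the documents have already been uploaded to a vector store (the vector store ID below is a placeholder):

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_1234567890abcdef"],  # placeholder ID of an existing vector store
        "max_num_results": 5  # optional: limit the number of retrieved chunks
    }],
    input="Summarize the main findings of the uploaded reports."
)

print(response.output_text)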

:computer: Computer Use

The computer use tool enables developers to build agents capable of completing tasks on a computer, powered by the Computer-Using Agent (CUA) model that enables Operator.

:star: Features

  • Action capture: Captures mouse and keyboard actions generated by the model
  • Task automation: Automates computer use tasks by translating actions into executable commands
  • Benchmark performance: Sets new state-of-the-art records in computer use benchmarks
  • Safety mechanisms: Built-in safety features to guard against misuse

:computer: Example Implementation

Here's a simple example of how to use the Computer Use tool:

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1024,   # dimensions of the virtual display the agent controls
        "display_height": 768,
        "environment": "browser" # other possible values: "mac", "windows", "ubuntu"
    }],
    input=[
        {
            "role": "user",
            "content": "Check the latest OpenAI news on bing.com."
        }
        # Optional: include a screenshot of the initial state of the environment
        # {
        #     type: "input_image",
        #     image_url: f"data:image/png;base64,{screenshot_base64}"
        # }
    ],
    reasoning={
        "generate_summary": "concise", # ask the model for a short summary of its reasoning
    },
    truncation="auto" # required when using the computer use tool
)

print(response.output)

This example is copied from the official OpenAI documentation, as we were unable to test it ourselves due to the tier limitations of this feature.

[!IMPORTANT] You need to be a Tier 3 user, which corresponds to having spent at least 100 CHF on OpenAI API usage.

Here is the official link to the example: Computer Use Example

:package: Agents SDK

The Agents SDK is an open-source tool that simplifies orchestrating multi-agent workflows. It offers significant improvements over the experimental Swarm SDK released previously.

:key: Key Features

  • Agents: Easily configurable LLMs with clear instructions and built-in tools
  • Handoffs: Intelligent transfer of control between agents
  • Guardrails: Configurable safety checks for input and output validation
  • Tracing & Observability: Visualization of agent execution traces for debugging and optimization
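As a small sketch of the handoff pattern with the open-source openai-agents package (the agent names and instructions are illustrative):

from agents import Agent, Runner

# Two specialist agents with clear instructions
booking_agent = Agent(
    name="Booking agent",
    instructions="Help the user book or change a reservation."
)
support_agent = Agent(
    name="Support agent",
    instructions="Answer technical support questions."
)

# A triage agent that can hand off control to the relevant specialist
triage_agent = Agent(
    name="Triage agent",
    instructions="Route the request to the most relevant specialist agent.",
    handoffs=[booking_agent, support_agent]
)

result = Runner.run_sync(triage_agent, "I need to move my reservation to Friday.")
print(result.final_output)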

:computer: Example Implementation

Here is an IPYNB notebook example of how to use the Agents SDK:

Multi-agent workflow using Agents SDK

:telescope: Observability Tools

The OpenAI Agent System includes integrated observability tools to help developers trace and inspect agent workflow execution. These tools enable developers to:

  • Debug and Optimize: Identify performance bottlenecks and improve agent behavior
  • Evaluate performance: Assess agent effectiveness on various tasks
  • Monitor for issues: Track potential problems in agent operations
  • Analyze decision-making: Gain insights into how agents reach conclusions and take actions
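In the Agents SDK, tracing is enabled by default and traces can be inspected in the OpenAI dashboard. A small sketch of grouping several runs under a single trace (the workflow name is illustrative):

import asyncio

from agents import Agent, Runner, trace

agent = Agent(name="Assistant", instructions="Reply very concisely.")

async def main():
    # Both runs are grouped under one trace in the dashboard
    with trace("Joke workflow"):
        first = await Runner.run(agent, "Tell me a joke.")
        second = await Runner.run(agent, f"Rate this joke: {first.final_output}")
        print(second.final_output)

asyncio.run(main())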

:left_right_arrow: Comparison with LlamaIndex

OpenAI's Agent System and LlamaIndex are both powerful frameworks for building agentic applications. However, they have different focuses and strengths.

OpenAI's Agent System is more focused on providing a comprehensive set of built-in tools and features for building agents, while LlamaIndex is more focused on providing a framework for contextualizing LLMs with data sources.

As OpenAI's Agent System is still in its early stages, it may not have all the features and capabilities of LlamaIndex. However, it is a promising framework with the potential to become a powerful tool for building agentic applications, offering interesting features such as multimodal capabilities, built-in tools, guardrails, and observability tools.

:checkered_flag: Conclusion

As we have not dedicated much time to testing the OpenAI Agent System, we cannot provide a complete comparison between the two frameworks. However, we believe that either framework could have been used to build our project.

We only managed to get a glimpse of the OpenAI Agent System, but it offers many features and tools that are genuinely interesting and could be used to build complex applications.

We deeply regret not being able to test OpenAI's solution due to tier restrictions but are confident that it is a strong alternative from a market leader.

:books: Resources

For more information and resources, check out the following links: