deep research - chunhualiao/public-docs GitHub Wiki
https://gist.github.com/chunhualiao/c83b883b98ad11380227553cda7ff74c
- https://huggingface.co/spaces/gaia-benchmark/leaderboard
Choices
- OpenAI's deep research: $200/month
- Gemini Advanced: 1.5 Pro with Deep Research
- Perplexity's deep research
- Genspark
Open-source
- https://github.com/assafelovic/gpt-researcher
- https://chatgpt.com/share/67cca3d7-22a4-800d-9f4d-005996796702 (March 8, 2025)
- https://github.com/HKUDS/Auto-Deep-Research
- Hugging Face's Open Deep Research
- https://github.com/dzhng/deep-research
- https://github.com/fdarkaou/open-deep-research
- https://www.analyticsvidhya.com/blog/2025/02/build-your-own-deep-research-agent/
- https://github.com/camel-ai/owl (latest)
ChatGPT Pro: Deep Research
ChatGPT Pro subscribers can now use 120 Deep Research prompts per month[1][4], up from the initial allocation of 100 queries per month when the feature first launched[4]. The higher monthly limit for Pro users is meant to give more room for in-depth analyses and detailed report generation[1].
It's important to note that:
- A Deep Research task is counted as one complete report, not individual message interactions[5].
- The feature can take between 5 and 30 minutes to compile an answer, depending on the complexity of the question[2].
- Deep Research is currently very compute-intensive, which is why there are usage limits in place[2].
OpenAI has indicated that they are exploring ways to allow users to pay for compute more dynamically in the future, which could potentially affect how Deep Research queries are allocated[4].
Hugging Face's Open Deep Research
Hugging Face's Open Deep Research is an open-source AI research agent designed to browse the web and generate research reports. Below is a detailed, step-by-step explanation of how it works, using a concrete example from the GAIA benchmark:
Step 1: Define the Task
The agent is given a complex, multi-step question:
"Which of the fruits shown in the 2008 painting 'Embroidery from Uzbekistan' were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film 'The Last Voyage'? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o'clock position. Use the plural form of each fruit."
Step 2: Break Down the Task into Sub-tasks
The agent identifies the following steps required to answer the question (sketched as a simple plan data structure after the list):
- Identify the fruits in the painting (requires image processing or text-based data extraction).
- Determine which ocean liner was used in the film "The Last Voyage" (requires web search).
- Find the October 1949 breakfast menu for that ocean liner (requires accessing historical records or databases).
- Format the answer (list fruits in clockwise order, starting from the 12 o'clock position, in plural form).
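One way to picture this decomposition is as an ordered plan of sub-tasks, each naming the tool it needs and the earlier results it depends on. The sketch below is purely illustrative; the `SubTask` structure and its field names are hypothetical and not taken from the Open Deep Research codebase.

```python
from dataclasses import dataclass, field

@dataclass
class SubTask:
    """Hypothetical representation of one step in the agent's plan."""
    name: str
    tool: str                                   # which tool this step needs
    depends_on: list[str] = field(default_factory=list)

# A plan for the GAIA question above, mirroring the four sub-tasks
plan = [
    SubTask("identify_fruits_in_painting", tool="image_or_metadata_lookup"),
    SubTask("find_ocean_liner_in_film", tool="web_search"),
    SubTask("find_october_1949_breakfast_menu", tool="web_search",
            depends_on=["find_ocean_liner_in_film"]),
    SubTask("format_final_answer", tool="synthesis",
            depends_on=["identify_fruits_in_painting",
                        "find_october_1949_breakfast_menu"]),
]
```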
Step 3: Use Tools and APIs
The agent leverages the following tools (a minimal code sketch follows the list):
- Web Browser Tool: To search the internet for information about the film, the ocean liner, and its menu.
- Text Inspector: To extract text from web pages or documents (e.g., historical menus).
- Image Processing (if needed): To analyze the painting and identify fruits (though this might be simplified using pre-existing data).
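In practice these tools are just Python functions the agent can call. Below is a minimal, self-contained sketch of what the first two might look like; the function names and the use of the `duckduckgo_search`, `requests`, and `beautifulsoup4` libraries are assumptions for illustration, not Open Deep Research's actual tool stack (which also handles files, PDFs, and other formats).

```python
import requests
from bs4 import BeautifulSoup            # pip install beautifulsoup4
from duckduckgo_search import DDGS       # pip install duckduckgo-search

def web_search(query: str, max_results: int = 5) -> list[dict]:
    """Toy web-browser tool: return {title, href, body} dicts for a query."""
    with DDGS() as ddgs:
        return list(ddgs.text(query, max_results=max_results))

def inspect_text(url: str) -> str:
    """Toy text-inspector tool: fetch a page and reduce it to plain text."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return BeautifulSoup(resp.text, "html.parser").get_text(" ", strip=True)
```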
Step 4: Execute Sub-tasks Sequentially
Sub-task 1: Identify the Painting and Fruits
- The agent searches for the painting "Embroidery from Uzbekistan" (2008) and extracts a list of fruits depicted (e.g., apples, pears, grapes).
- It may use a database of art metadata or web search results to confirm the fruits.
Sub-task 2: Find the Ocean Liner
- The agent searches for the film "The Last Voyage" and identifies the ocean liner used as a floating prop (in this case, the SS Île de France).
Sub-task 3: Locate the 1949 Breakfast Menu
- The agent searches for historical records or archives of the ocean liner's 1949 breakfast menu.
- It may cross-reference multiple sources to ensure accuracy.
Sub-task 4: Cross-Reference and Format the Answer
- The agent matches the fruits from the painting with those listed in the 1949 menu.
- It formats the answer in the required clockwise order (e.g., "apples, pears, grapes").
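Put together, the execution is essentially a chain in which each step's output feeds the next search. The sketch below reuses the `web_search` and `inspect_text` helpers from the earlier tool sketch and stubs out `identify_fruits` and `extract_entity`, which in the real agent would be backed by image analysis and LLM reasoning; all of these names are hypothetical.

```python
def identify_fruits(painting: str, year: int) -> list[str]:
    """Stub: in the real agent this comes from image analysis or art metadata."""
    return ["apple", "pear", "grape"]                     # illustrative only

def extract_entity(search_hits: list[dict], entity_type: str) -> str:
    """Stub: in the real agent an LLM reads the snippets and names the entity."""
    return search_hits[0]["title"]                        # illustrative only

def run_research() -> str:
    """Chain the four sub-tasks in order, feeding each result into the next."""
    # Sub-task 1: fruits depicted in the 2008 painting
    painting_fruits = identify_fruits("Embroidery from Uzbekistan", year=2008)

    # Sub-task 2: the ocean liner used as a floating prop in the film
    liner_hits = web_search('"The Last Voyage" 1960 film ocean liner floating prop')
    liner = extract_entity(liner_hits, entity_type="ship")

    # Sub-task 3: that liner's October 1949 breakfast menu
    menu_hits = web_search(f"{liner} October 1949 breakfast menu")
    menu_text = inspect_text(menu_hits[0]["href"])

    # Sub-task 4: cross-reference the painting's fruits against the menu text
    served = [fruit for fruit in painting_fruits if fruit in menu_text.lower()]
    return ", ".join(served)
```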
Step 5: Synthesize and Output the Answer
The agent combines the results from all sub-tasks and generates the final answer:
"Apples, pears, grapes"
(assuming these fruits were both in the painting and on the menu).
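The final formatting step is mostly mechanical once both lists are in hand: intersect the menu items with the painting's fruits, keep the painting's clockwise order, and pluralize. A small illustrative helper (hypothetical names, naive pluralization):

```python
def format_answer(clockwise_fruits: list[str], menu_items: set[str]) -> str:
    """Keep fruits that also appear on the menu, preserving the painting's
    clockwise order (from 12 o'clock) and returning plural forms."""
    menu_singulars = {item.rstrip("s") for item in menu_items}
    kept = [fruit + ("" if fruit.endswith("s") else "s")   # naive pluralization
            for fruit in clockwise_fruits
            if fruit.rstrip("s") in menu_singulars]
    return ", ".join(kept)

# format_answer(["apple", "pear", "grape"], {"apples", "grapes", "pears"})
# -> "apples, pears, grapes"
```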
Key Features of Open Deep Research
- Code Agent: Actions are written in code rather than JSON, making the process more efficient and concise (roughly 30% fewer steps); a minimal sketch follows this list.
- Agentic Framework: Guides the LLM to perform multi-step tasks (e.g., web search, data extraction, and synthesis).
- Model Backend: Currently uses OpenAI models (e.g., GPT-4o, o1) via API, but is designed to be compatible with open-weight models in the future.
- Operator Feature (in development): Will enable direct interaction with web browsers and applications (e.g., mouse/keyboard control).
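Open Deep Research is built on Hugging Face's smolagents library, whose CodeAgent has the model emit Python snippets as its actions instead of JSON tool calls. The sketch below shows roughly how such an agent is wired up; the class and tool names come from smolagents and may differ across library versions, and the prompt is just the example question's second sub-task.

```python
# Minimal smolagents sketch (class/tool names may vary across library versions)
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],          # the agent writes Python calls to this tool
    model=HfApiModel(),                      # default Hugging Face Inference API model
    additional_authorized_imports=["json"],  # extra modules the generated code may import
)

answer = agent.run(
    "Which ocean liner was used as a floating prop in the film 'The Last Voyage'?"
)
print(answer)
```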
Performance
On the GAIA benchmark, Open Deep Research achieved 55.15% accuracy, compared to OpenAI's Deep Research at 67.36%.
The gap is attributed to differences in browser interaction and model training (OpenAI uses a proprietary model, o3, optimized for web tasks).
Example Workflow Summary
- Input: Complex question with multiple dependencies.
- Sub-task Decomposition: Break the question into searchable components.
- Tool Execution: Use web search, text extraction, and data synthesis.
- Output: A formatted answer combining results from all steps.
This process demonstrates how Open Deep Research automates multi-step research tasks using open-source tools and agentic frameworks.