# IMO25: Iterative Problem Solving and Verification Using LLMs
https://github.com/lyang36/IMO25
This code implements an automated IMO (International Mathematical Olympiad) problem solver using OpenAI's API. Here's a comprehensive explanation:
```mermaid
graph TD
    A[Start: Read Problem File] --> B[Generate Initial Solution]
    B --> C[Self-Improvement Step]
    C --> D[Verify Solution]
    D --> E{Verification Passes?}
    E -->|Yes| F[Increment Correct Count]
    E -->|No| G[Increment Error Count<br/>Reset Correct Count]
    F --> H{Correct Count >= 5?}
    G --> I{Error Count >= 10?}
    H -->|Yes| J[✅ SUCCESS: Return Solution]
    H -->|No| K[Continue Verification Loop]
    I -->|Yes| L[❌ FAILURE: Return None]
    I -->|No| M[Generate Corrected Solution<br/>Using Bug Report]
    K --> D
    M --> D
    N{Max Iterations<br/>Reached?} --> O[❌ TIMEOUT FAILURE]
    style J fill:#90EE90
    style L fill:#FFB6C1
    style O fill:#FFB6C1
    subgraph "Verification Process"
        V1[Extract Detailed Solution]
        V2[Send to Verification Prompt]
        V3[Get Verification Response]
        V4[Check if Solution is Correct]
        V5[Generate Bug Report if Needed]
        V1 --> V2 --> V3 --> V4 --> V5
    end
    subgraph "API Interaction"
        API1[Build Request Payload]
        API2[Send HTTP Request]
        API3[Extract Response Text]
        API1 --> API2 --> API3
    end
```
The script uses an iterative approach with AI to solve complex mathematical problems by generating solutions, verifying them, and refining them until a correct solution is found.
- Uses OpenAI's o3 model (the model name is set to "gpt-5" in the configuration)
- Custom logging system that writes to both stdout and log files
- API key management through environment variables
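A minimal sketch of how these configuration pieces might fit together; the class name `TeeLogger` and the environment-variable name `OPENAI_API_KEY` are assumptions for illustration, not necessarily what the repository uses:

```python
import os
import sys
from datetime import datetime

class TeeLogger:
    """Hypothetical helper that writes each message to stdout and a log file."""
    def __init__(self, log_path):
        self.log_file = open(log_path, "a", encoding="utf-8")

    def log(self, message):
        line = f"[{datetime.now().isoformat()}] {message}"
        print(line)                       # console output
        self.log_file.write(line + "\n")  # persistent log
        self.log_file.flush()

# The API key is read from the environment rather than hard-coded.
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    sys.exit("OPENAI_API_KEY is not set")
```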
The system uses several specialized prompts:
- `step1_prompt`: Instructs the AI to provide rigorous, complete mathematical solutions
- `verification_system_prompt`: Acts as an expert grader to verify solutions
- `self_improvement_prompt`: Asks the AI to review and improve its solution
- `correction_prompt`: Provides bug reports for solution refinement
```python
def build_request_payload(system_prompt, question_prompt, other_prompts=None)
def send_api_request(api_key, payload)
def extract_text_from_response(response_data)
```
These handle communication with OpenAI's o3 API, formatting requests and parsing responses.
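A hedged sketch of what these three helpers could look like if the script calls the OpenAI Chat Completions HTTP endpoint directly via `requests`; the endpoint, payload fields, and model string below follow the public API format and are assumptions rather than a copy of the repository's code:

```python
import requests

API_URL = "https://api.openai.com/v1/chat/completions"  # assumed endpoint

def build_request_payload(system_prompt, question_prompt, other_prompts=None):
    """Assemble the chat-style message list for one request."""
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": question_prompt}]
    for extra in (other_prompts or []):
        messages.append({"role": "user", "content": extra})
    return {"model": "gpt-5", "messages": messages}

def send_api_request(api_key, payload):
    """POST the payload and return the parsed JSON, raising on HTTP errors."""
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    response = requests.post(API_URL, headers=headers, json=payload, timeout=600)
    response.raise_for_status()
    return response.json()

def extract_text_from_response(response_data):
    """Pull the assistant's text out of the first choice of the response."""
    return response_data["choices"][0]["message"]["content"]
```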
```python
def verify_solution(problem_statement, solution, verbose=True)
```
- Extracts the detailed solution portion
- Sends it to a verification prompt that acts as an expert IMO grader
- Returns a bug report if issues are found
- Uses a secondary check to determine if the solution is truly correct
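Putting the steps above together, a simplified sketch of the verification routine, built on the API helpers sketched earlier (`extract_detailed_solution` is a hypothetical helper that isolates the "Detailed Solution" section):

```python
def verify_solution(problem_statement, solution, verbose=True):
    # 1. Keep only the "Detailed Solution" portion of the solver's output.
    detailed = extract_detailed_solution(solution)  # hypothetical helper

    # 2. Ask a separate model instance to grade it as an IMO expert.
    grading_input = (f"### Problem ###\n{problem_statement}\n\n"
                     f"### Solution ###\n{detailed}\n\n"
                     f"{verification_reminder}")
    payload = build_request_payload(verification_system_prompt, grading_input)
    report = extract_text_from_response(send_api_request(api_key, payload))
    if verbose:
        print(report)

    # 3. Secondary check: reduce the grader's report to a yes/no verdict.
    question = ('Response in "yes" or "no". Is the following statement saying the '
                'solution is correct, or does not contain critical error or a major '
                'justification gap?\n\n' + report)
    verdict = extract_text_from_response(
        send_api_request(api_key, build_request_payload("", question)))
    is_correct = verdict.strip().lower().startswith("yes")

    # The grader's report doubles as a bug report when the solution fails.
    return is_correct, report
```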
```python
def agent(problem_statement, other_prompts=[])
```
This orchestrates the entire solving process:
1. **Initial Exploration:** Generates the first solution attempt.
2. **Iterative Refinement Loop** (max 30 iterations):
   - If verification fails: generate a corrected solution using the bug report.
   - If verification passes: increment the success counter.
   - **Success condition:** 5 consecutive successful verifications.
   - **Failure condition:** 10 consecutive errors.
The system requires 5 consecutive successful verifications to ensure the solution is robust and not just lucky. This helps avoid false positives where a flawed solution might accidentally pass verification once.
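In outline, the consecutive-success / consecutive-error bookkeeping might look like this (a sketch assuming `verify_solution` returns a verdict plus a bug report, and that `generate_solution` and `correct_solution` wrap the solver prompts; both wrappers are sketched further below):

```python
def agent(problem_statement, other_prompts=[]):
    # Initial exploration: first attempt plus a self-improvement pass.
    solution = generate_solution(problem_statement, other_prompts)
    correct_count, error_count = 0, 0

    for _ in range(30):                    # max iterations
        is_correct, bug_report = verify_solution(problem_statement, solution)

        if is_correct:
            correct_count += 1
            error_count = 0
            if correct_count >= 5:         # robust success: 5 passes in a row
                return solution
        else:
            error_count += 1
            correct_count = 0
            if error_count >= 10:          # give up after 10 consecutive errors
                return None
            # Feed the grader's bug report back to the solver for a targeted fix.
            solution = correct_solution(problem_statement, solution, bug_report)

    return None                            # timed out without 5 consecutive passes
```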
```bash
python solver.py problem_statement.txt --log results.log --max_runs 10
```
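The command line above suggests an entry point roughly like the following; the defaults and help strings are assumptions, and only the flags `--log` and `--max_runs` come from the example:

```python
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Iterative IMO problem solver")
    parser.add_argument("problem_file", help="path to the problem statement text file")
    parser.add_argument("--log", default="solver.log", help="log file path")
    parser.add_argument("--max_runs", type=int, default=10,
                        help="number of independent solver runs to attempt")
    args = parser.parse_args()

    with open(args.problem_file, encoding="utf-8") as f:
        problem_statement = f.read()

    for run in range(args.max_runs):
        if agent(problem_statement) is not None:
            print(f"Run {run + 1}: verified solution found")
            break
```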
- Rigorous Verification: Uses a separate AI instance as an "expert grader" to catch errors
- Iterative Improvement: Automatically refines solutions based on feedback
- Robust Success Detection: Requires multiple successful verifications
- Comprehensive Logging: Tracks all interactions and timestamps
- Error Handling: Graceful failure modes and retry mechanisms
The system essentially creates a feedback loop where an AI solver and AI verifier work together to produce mathematically rigorous solutions to complex problems.
| Prompt Name | Purpose | When Used | Key Requirements | Why Needed |
|---|---|---|---|---|
| `step1_prompt` | Main solution generation | Initial solution & corrections | - Rigor is paramount<br>- Complete solutions only<br>- TeX formatting<br>- Structured output (Summary + Detailed Solution) | Establishes the foundational requirements for mathematical rigor. Ensures solutions aren't just correct answers but properly justified proofs. |
| `self_improvement_prompt` | Solution refinement | After initial solution generation | - Review and correct errors<br>- Fill justification gaps<br>- Follow system prompt structure | Gives the AI a chance to catch its own mistakes before external verification, improving solution quality. |
| `verification_system_prompt` | Solution verification/grading | Every solution verification cycle | - Act as IMO expert grader<br>- Find all issues<br>- Classify errors vs. gaps<br>- Step-by-step verification | Creates an independent "judge" to rigorously evaluate solutions, preventing the solver from being overly lenient with itself. |
| `verification_reminder` | Verification task context | Appended to verification requests | - Generate summary and verification log<br>- Justify each step | Ensures the verification AI understands its specific task within the broader context. |
| `correction_prompt` | Bug report integration | When verification fails | - Address specific issues from bug report<br>- Add explanations to avoid misunderstandings<br>- Maintain system prompt compliance | Provides targeted feedback for improvement, allowing the solver to address specific weaknesses rather than starting over. |
| `check_verification_prompt` | Verification review | Post-verification assessment (commented out) | - Review findings for validity<br>- Distinguish genuine flaws from concise arguments | Would help catch overly strict verification, but is currently unused in the implementation. |
| Completeness Check | Solution validation | After solution generation | - Determine if solution claims completeness<br>- Simple yes/no response | Ensures the system only proceeds with solutions that claim to be complete, filtering out partial attempts. |
| Correctness Check | Verification interpretation | After each verification | - Interpret verification results<br>- Simple yes/no on solution correctness | Converts complex verification reports into binary decisions for the main logic loop. |
1. `step1_prompt` → AI generates the initial solution
2. `self_improvement_prompt` → AI refines its own work
3. The solution enters the verification cycle
4. `verification_system_prompt` + `verification_reminder` → an independent AI instance evaluates the solution
5. Correctness Check → binary interpretation of the verification report
6. `correction_prompt` (if needed) → targeted improvement using the bug report
7. Repeat until the success criteria are met
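Steps 1, 2, and 6 of this flow could be wrapped in two small helpers along these lines (a sketch using the prompt names above; the exact message layout is an assumption):

```python
def generate_solution(problem_statement, other_prompts=[]):
    """Initial attempt with step1_prompt, followed by one self-improvement pass."""
    payload = build_request_payload(step1_prompt, problem_statement, other_prompts)
    first_attempt = extract_text_from_response(send_api_request(api_key, payload))

    payload = build_request_payload(step1_prompt, problem_statement,
                                    [first_attempt, self_improvement_prompt])
    return extract_text_from_response(send_api_request(api_key, payload))

def correct_solution(problem_statement, solution, bug_report):
    """Resubmit the solution together with the grader's bug report for a targeted fix."""
    followup = [solution, correction_prompt + "\n\n" + bug_report]
    payload = build_request_payload(step1_prompt, problem_statement, followup)
    return extract_text_from_response(send_api_request(api_key, payload))
```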
- **Separation of Concerns:** The system uses different prompts for different AI "roles" (solver vs. grader) to avoid conflicts of interest.
- **Iterative Refinement:** Multiple prompts support a feedback loop where solutions are continuously improved based on rigorous evaluation.
- **Quality Gates:** Various check prompts ensure that only high-quality, complete solutions proceed through the pipeline.
- **Structured Output:** Prompts enforce consistent formatting, making it easier to parse and process results programmatically.
The prompts below are reproduced from the code; they are used throughout the system.

**`step1_prompt`** (the main solution-generation system prompt):
### Core Instructions ###
* **Rigor is Paramount:** Your primary goal is to produce a complete and rigorously justified solution. Every step in your solution must be logically sound and clearly explained. A correct final answer derived from flawed or incomplete reasoning is considered a failure.
* **Honesty About Completeness:** If you cannot find a complete solution, you must **not** guess or create a solution that appears correct but contains hidden flaws or justification gaps. Instead, you should present only significant partial results that you can rigorously prove. A partial result is considered significant if it represents a substantial advancement toward a full solution. Examples include:
* Proving a key lemma.
* Fully resolving one or more cases within a logically sound case-based proof.
* Establishing a critical property of the mathematical objects in the problem.
* For an optimization problem, proving an upper or lower bound without proving that this bound is achievable.
* **Use TeX for All Mathematics:** All mathematical variables, expressions, and relations must be enclosed in TeX delimiters (e.g., `Let $n$ be an integer.`).
### Output Format ###
Your response MUST be structured into the following sections, in this exact order.
**1. Summary**
Provide a concise overview of your findings. This section must contain two parts:
* **a. Verdict:** State clearly whether you have found a complete solution or a partial solution.
* **For a complete solution:** State the final answer, e.g., "I have successfully solved the problem. The final answer is..."
* **For a partial solution:** State the main rigorous conclusion(s) you were able to prove, e.g., "I have not found a complete solution, but I have rigorously proven that..."
* **b. Method Sketch:** Present a high-level, conceptual outline of your solution. This sketch should allow an expert to understand the logical flow of your argument without reading the full detail. It should include:
* A narrative of your overall strategy.
* The full and precise mathematical statements of any key lemmas or major intermediate results.
* If applicable, describe any key constructions or case splits that form the backbone of your argument.
**2. Detailed Solution**
Present the full, step-by-step mathematical proof. Each step must be logically justified and clearly explained. The level of detail should be sufficient for an expert to verify the correctness of your reasoning without needing to fill in any gaps. This section must contain ONLY the complete, rigorous proof, free of any internal commentary, alternative approaches, or failed attempts.
### Self-Correction Instruction ###
Before finalizing your output, carefully review your "Method Sketch" and "Detailed Solution" to ensure they are clean, rigorous, and strictly adhere to all instructions provided above. Verify that every statement contributes directly to the final, coherent mathematical argument.
**`self_improvement_prompt`:**

You have an opportunity to improve your solution. Please review your solution carefully. Correct errors and fill justification gaps if any. Your second round of output should strictly follow the instructions in the system prompt.
**`check_verification_prompt`:**

Can you carefully review each item in your list of findings? Are they valid or overly strict? An expert grader must be able to distinguish between a genuine flaw and a concise argument that is nonetheless sound, and to correct their own assessment when necessary.
If you feel that modifications to any item or its justification is necessary. Please produce a new list. In your final output, please directly start with **Summary** (no need to justify the new list).
**`correction_prompt`:**

Below is the bug report. If you agree with certain item in it, can you improve your solution so that it is complete and rigorous? Note that the evaluator who generates the bug report can misunderstand your solution and thus make mistakes. If you do not agree with certain item in the bug report, please add some detailed explanations to avoid such misunderstanding. Your new solution should strictly follow the instructions in the system prompt.
**`verification_system_prompt`:**

You are an expert mathematician and a meticulous grader for an International Mathematical Olympiad (IMO) level exam. Your primary task is to rigorously verify the provided mathematical solution. A solution is to be judged correct **only if every step is rigorously justified.** A solution that arrives at a correct final answer through flawed reasoning, educated guesses, or with gaps in its arguments must be flagged as incorrect or incomplete.
### Instructions ###
**1. Core Instructions**
* Your sole task is to find and report all issues in the provided solution. You must act as a **verifier**, NOT a solver. **Do NOT attempt to correct the errors or fill the gaps you find.**
* You must perform a **step-by-step** check of the entire solution. This analysis will be presented in a **Detailed Verification Log**, where you justify your assessment of each step: for correct steps, a brief justification suffices; for steps with errors or gaps, you must provide a detailed explanation.
**2. How to Handle Issues in the Solution**
When you identify an issue in a step, you MUST first classify it into one of the following two categories and then follow the specified procedure.
* **a. Critical Error:**
This is any error that breaks the logical chain of the proof. This includes both **logical fallacies** (e.g., claiming that `A>B, C>D` implies `A-C>B-D`) and **factual errors** (e.g., a calculation error like `2+3=6`).
* **Procedure:**
* Explain the specific error and state that it **invalidates the current line of reasoning**.
* Do NOT check any further steps that rely on this error.
* You MUST, however, scan the rest of the solution to identify and verify any fully independent parts. For example, if a proof is split into multiple cases, an error in one case does not prevent you from checking the other cases.
* **b. Justification Gap:**
This is for steps where the conclusion may be correct, but the provided argument is incomplete, hand-wavy, or lacks sufficient rigor.
* **Procedure:**
* Explain the gap in the justification.
* State that you will **assume the step's conclusion is true** for the sake of argument.
* Then, proceed to verify all subsequent steps to check if the remainder of the argument is sound.
**3. Output Format**
Your response MUST be structured into two main sections: a **Summary** followed by the **Detailed Verification Log**.
* **a. Summary**
This section MUST be at the very beginning of your response. It must contain two components:
* **Final Verdict**: A single, clear sentence declaring the overall validity of the solution. For example: "The solution is correct," "The solution contains a Critical Error and is therefore invalid," or "The solution's approach is viable but contains several Justification Gaps."
* **List of Findings**: A bulleted list that summarizes **every** issue you discovered. For each finding, you must provide:
* **Location:** A direct quote of the key phrase or equation where the issue occurs.
* **Issue:** A brief description of the problem and its classification (**Critical Error** or **Justification Gap**).
* **b. Detailed Verification Log**
Following the summary, provide the full, step-by-step verification log as defined in the Core Instructions. When you refer to a specific part of the solution, **quote the relevant text** to make your reference clear before providing your detailed analysis of that part.
**Example of the Required Summary Format**
*This is a generic example to illustrate the required format. Your findings must be based on the actual solution provided below.*
**Final Verdict:** The solution is **invalid** because it contains a Critical Error.
**List of Findings:**
* **Location:** "By interchanging the limit and the integral, we get..."
* **Issue:** Justification Gap - The solution interchanges a limit and an integral without providing justification, such as proving uniform convergence.
* **Location:** "From $A > B$ and $C > D$, it follows that $A-C > B-D$"
* **Issue:** Critical Error - This step is a logical fallacy. Subtracting inequalities in this manner is not a valid mathematical operation.
**`verification_reminder`:**

### Verification Task Reminder ###
Your task is to act as an IMO grader. Now, generate the **summary** and the **step-by-step verification log** for the solution above. In your log, justify each correct step and explain in detail any errors or justification gaps you find, as specified in the instructions above.
The completeness and correctness check prompts appear in the code as:

```python
check_complete_prompt = f"""
Is the following text claiming that the solution is complete?
==========================================================
{solution}
==========================================================
Response in exactly "yes" or "no". No other words.
"""

check_correctness = """Response in "yes" or "no". Is the following statement saying the solution is correct, or does not contain critical error or a major justification gap?""" \
    + "\n\n" + out
```
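Both yes/no prompts are easy to reduce to booleans for the control loop; a small hedged helper might look like this (built on the API helpers sketched earlier):

```python
def is_yes(api_key, question_prompt):
    """Ask the model a yes/no question and parse the one-word reply (sketch)."""
    payload = build_request_payload("", question_prompt)
    reply = extract_text_from_response(send_api_request(api_key, payload))
    return reply.strip().lower().startswith("yes")

# e.g. solution_is_complete = is_yes(api_key, check_complete_prompt)
#      solution_is_correct  = is_yes(api_key, check_correctness)
```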
The input assembled for each verification call combines the problem, the extracted detailed solution, and the verification reminder:

```python
newst = f"""
======================================================================
### Problem ###
{problem_statement}
======================================================================
### Solution ###
{dsol}
{verification_remider}
"""
```
In summary, the code implements an iterative mathematical problem-solving system that uses OpenAI's API to tackle IMO-level problems through multiple rounds of verification and correction.