OpenHands / SWE-Bench

Official Docker-based evaluation harness in the OpenHands repository:

https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation/benchmarks/swe_bench

https://github.com/All-Hands-AI/OpenHands/issues/6045 (installation problems reported by developers)

How to run SWE-Bench using OpenHands CodeAct 2.1?

To run SWE-Bench using OpenHands CodeAct 2.1, follow these steps:

  1. Install OpenHands and its dependencies:

    • Set up Docker on your system
    • Install Python
    • Clone the OpenHands repository: git clone https://github.com/All-Hands-AI/OpenHands [1][5]
  2. Configure the environment:

    • Install the required dependencies
    • Set up your chosen LLM provider (e.g., Claude, Gemini, or xAI)[1]
  3. Use OpenHands' remote runtime feature to parallelize evaluation:

    • This allows for efficient execution of unit tests and agent actions[4]
  4. Implement your agent logic in the predict function:

    def predict(inputs: dict) -> dict:
        # Insert your OpenHands CodeAct 2.1 agent logic here; "model_patch"
        # should hold the diff the agent generates for this instance.
        return {"instance_id": inputs["instance_id"],
                "model_patch": "Your generated patch",  # placeholder
                "model_name_or_path": "OpenHands-CodeAct-2.1"}

  5. Run the evaluation using the SWE-Bench dataset:

    • Load the SWE-Bench dataset into LangSmith or use the provided CSV file
    • Use the evaluate function to generate predictions for the dataset[6] (a minimal prediction sketch is shown after this list)
  6. Execute the SWE-Bench evaluation in Docker:

    • Set up Docker images for parallel execution
    • Use the run_instances function from the SWE-Bench evaluation script to run the predictions[6] (a sketch of invoking the Dockerized harness appears after the closing paragraph below)
  7. Convert the evaluation results to LangSmith feedback format for analysis[6]
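
For steps 4 and 5, here is a minimal sketch of generating a predictions file directly from the Hugging Face copy of SWE-Bench Lite. It assumes the datasets library is installed and that predict is the function from step 4; the dataset name, split, and output path are illustrative and may need adjusting for your setup.

    # Sketch: write SWE-Bench predictions to a JSONL file for later evaluation.
    import json
    from datasets import load_dataset

    # SWE-Bench Lite test split from the Hugging Face Hub; swap in the full or
    # Verified dataset if that is what you are evaluating.
    dataset = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")

    with open("predictions.jsonl", "w") as f:   # illustrative output path
        for instance in dataset:
            prediction = predict(instance)      # predict() from step 4
            f.write(json.dumps(prediction) + "\n")

Each line of predictions.jsonl carries the instance_id, model_patch, and model_name_or_path fields from step 4, which is the prediction format the SWE-Bench evaluation tooling consumes.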

By following these steps, you can run SWE-Bench using OpenHands CodeAct 2.1 and evaluate its performance on real-world software engineering tasks.
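
For step 6, one way to drive the Dockerized evaluation over the predictions file is to invoke the SWE-Bench harness from Python, as sketched below. The module path and flag names reflect the Docker-based SWE-Bench harness and may differ between versions, so verify them against the SWE-Bench documentation for your installed release.

    # Sketch: run the Dockerized SWE-Bench evaluation over predictions.jsonl.
    # Assumes the swebench package is installed and Docker is running.
    import subprocess

    subprocess.run(
        [
            "python", "-m", "swebench.harness.run_evaluation",
            "--dataset_name", "princeton-nlp/SWE-bench_Lite",
            "--predictions_path", "predictions.jsonl",
            "--max_workers", "4",                 # parallel Docker containers
            "--run_id", "openhands-codeact-2.1",  # illustrative run label
        ],
        check=True,
    )

The harness builds or pulls the per-instance Docker images, applies each model_patch, runs the repository's tests, and writes a report of resolved instances, which can then be converted into LangSmith feedback as described in step 7.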

Citations: