OpenHands:SWE Bench - chunhualiao/public-docs GitHub Wiki
Official Docker-based evaluation harness:
https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation/benchmarks/swe_bench
https://github.com/All-Hands-AI/OpenHands/issues/6045 (installation problems reported by developers)
How to run SWE-Bench using OpenHands CodeAct 2.1?
To run SWE-Bench using OpenHands CodeAct 2.1, follow these steps:
1. Install OpenHands and its dependencies:
   - Set up Docker on your system
   - Install Python
   - Clone the OpenHands repository: `git clone https://github.com/All-Hands-AI/OpenHands` [1][5]
2. Configure the environment:
   - Install the required dependencies
   - Set up your chosen LLM provider (e.g., Claude, Gemini, or X-AI) [1]
3. Use OpenHands' remote runtime feature to parallelize evaluation:
   - This allows for efficient execution of unit tests and agent actions [4]
4. Implement your agent logic in the `predict` function:

   ```python
   def predict(inputs: dict):
       # Insert your OpenHands CodeAct 2.1 agent logic here
       return {
           "instance_id": inputs["instance_id"],
           "model_patch": "Your generated patch",
           "model_name_or_path": "OpenHands-CodeAct-2.1",
       }
   ```
5. Run the evaluation using the SWE-Bench dataset:
   - Load the SWE-Bench dataset into LangSmith or use the provided CSV file
   - Use the `evaluate` function to generate predictions for the dataset [6]
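The SWE-Bench harness consumes predictions as records carrying `instance_id`, `model_patch`, and `model_name_or_path`. A minimal sketch that collects `predict` outputs into a JSONL file; the dataset rows here are stubbed rather than loaded from LangSmith or Hugging Face:

```python
import json

def predict(inputs: dict):
    # Stub standing in for the OpenHands CodeAct 2.1 agent
    return {
        "instance_id": inputs["instance_id"],
        "model_patch": "Your generated patch",
        "model_name_or_path": "OpenHands-CodeAct-2.1",
    }

# Stubbed rows; a real run would iterate over actual SWE-Bench instances
dataset = [
    {"instance_id": "astropy__astropy-12907"},
    {"instance_id": "django__django-11099"},
]

# One JSON record per line, the format the evaluation harness reads
with open("all_preds.jsonl", "w") as f:
    for row in dataset:
        f.write(json.dumps(predict(row)) + "\n")
```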
6. Execute the SWE-Bench evaluation in Docker:
   - Set up Docker images for parallel execution
   - Use the `run_instances` function from the SWE-Bench evaluation script to run the predictions [6]
7. Convert the evaluation results to LangSmith feedback format for analysis [6]
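One way to sketch that conversion: map each instance's resolved/unresolved status to a binary feedback score. The report shape below (a dict with `resolved_ids` and `unresolved_ids` lists) is an assumption for illustration, and the actual LangSmith feedback API call is not shown:

```python
# Hypothetical harness report: which instances the agent resolved
report = {
    "resolved_ids": ["django__django-11099"],
    "unresolved_ids": ["astropy__astropy-12907"],
}

# Binary feedback records: score 1 for resolved, 0 for unresolved
feedback = [
    {"key": "swe-bench-resolved", "score": 1, "instance_id": iid}
    for iid in report["resolved_ids"]
] + [
    {"key": "swe-bench-resolved", "score": 0, "instance_id": iid}
    for iid in report["unresolved_ids"]
]
print(len(feedback))  # 2
```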
By following these steps, you can run SWE-Bench using OpenHands CodeAct 2.1 and evaluate its performance on real-world software engineering tasks.
Citations:
- [1] https://www.youtube.com/watch?v=P7jC0L-hjM0
- [2] https://www.youtube.com/watch?v=su067vIv7eQ
- [3] https://app.daily.dev/posts/all-hands-ai-open-sources-openhands-codeact-2-1-a-new-software-development-agent-to-solve-over-50--1elyw5yw5
- [4] http://arxiv.org/pdf/2412.21139.pdf
- [5] https://www.youtube.com/watch?v=PJ8XUfivG20
- [6] https://docs.smith.langchain.com/evaluation/tutorials/swe-benchmark
- [7] https://nlp.elvissaravia.com/p/ai-agents-weekly-codeact-21-copilot
- [8] https://www.researchgate.net/publication/387540461_Training_Software_Engineering_Agents_and_Verifiers_with_SWE-Gym
- [9] https://www.swebench.com