OpenHands - runtimerevolution/labs GitHub Wiki
OpenHands is an open-source AI-powered platform designed to assist in software development.
Usage
Of the multiple ways OpenHands can be used, we chose its GitHub Actions integration to evaluate its usability and functionality.
Models
The models considered for our tests were the following:
- GPT-4o (OpenAI)
- Claude 3.5 Sonnet (Anthropic)
Both models are recommended in the OpenHands documentation, with Claude 3.5 Sonnet being the top recommendation.
The same set of global instructions and test case descriptions was used for each model. Additional requests to OpenHands were based on each model's outputs, leading to variations in the instructions given to OpenHands.
OpenHands global instructions
When using the OpenHands GitHub Action, we can define global instructions for the AI agent by writing them in a file named `.openhands_instructions`, located in the project's root directory.
In both test cases we present, the instructions remain the same, except for the addition of an 'Overview' section, where we provide a brief description of the project.
### Instructions for OpenHands AI Agent
1. **Setup Development Environment**: Run `poetry install --no-root` to install the required packages for the project before making any changes. Ensure all dependencies are properly installed.
2. **Understand the Problem Statement**: Carefully read the problem statement provided in the issue description.
3. **Identify the Files to Modify**: Determine which files need to be created or modified to address the issue.
4. **Implement the Solution**: Make the necessary code changes to solve the problem. Run the command `make migrations` to create the necessary database migration files.
5. **Describe Changes**: Clearly describe the changes made to solve each step of the issue.
6. **Documentation**: Update any relevant documentation to reflect the changes made.
7. **Completion**: Once the issue is resolved, mark the task as complete.
8. **Provide Evidence**: Include the file path and content of the created or modified files in the completion message to provide evidence of the changes made.
### Best Practices
- Maintain clear and concise code.
- Follow Python coding standards and best practices.
- Ensure all code is well-documented.
- Write meaningful commit messages.
### Testing
- Write appropriate tests to ensure the applied changes are covered. Place these tests in the photo/tests directory.
- Testing is not required as part of the workflow. Write the tests but skip executing them.
### Final Steps
- After completing the changes, finalize the task by marking it as complete.
- Provide evidence of the changes by including the file path and content of the created or modified files in the completion message.
Test cases
Test case #1 - Implementing new features on an existing repository
In this test case, we used OpenHands to add new features to Revent, a photo contest API built with Django and the Strawberry GraphQL library.
The goal of this test case was not only to understand how OpenHands works but also to evaluate its performance with the previously referenced LLMs in terms of response quality.
Here, we present two of these functionalities, their corresponding GitHub issues, and the responses from each model.
Issue #1 - Add timestamps to models
Description: Add the date fields `created_at` and `updated_at` to all models. Both fields must be nullable.
Pull requests:
- GPT: https://github.com/runtimerevolution/revent-api/issues/173
- Claude: https://github.com/runtimerevolution/revent-api/issues/184
Issue #2 - Create winners view
We need to create a new GraphQL view that returns the winners of each contest held.
The response must be a set of contests with at least one winner, with each contest containing its list of winners. For each winner, we also want the submission with which the user won the contest, including the picture (name and file) and the number of votes received. Consider the following JSON representation of the expected result:
```
{
  "title": string,
  "description": string,
  "prize": string,
  "voting_draw_end": string,
  "winners": [
    {
      "name_first": string,
      "name_last": string,
      "submission": {
        "picture": {
          "name": string,
          "file": string
        },
        "number_votes": int
      }
    }
  ]
}
```
The response must be ordered by the date the contest ended.
Pull requests:
- GPT: https://github.com/runtimerevolution/revent-api/pull/180 (Note: The last three commits were done using Claude)
- Claude: https://github.com/runtimerevolution/revent-api/pull/187
Conclusions
Overall, OpenHands produced better results when using Claude compared to GPT. The solutions generated with Claude required fewer iterations and were more accurate from the start. For example, while solving issue #2, GPT needed 51 commits to reach a working solution, whereas Claude achieved the same in just one commit.
Here are some of the errors GPT introduced that Claude never did, all illustrated with examples from the pull request for the solution to issue #2:
- Missing imports. Example: In this commit, OpenHands created the new types `WinnerType`, `WinnerSubmissionType`, and `WinnerPictureType` in the `types.py` file and used them correctly in the new `winners` query in the `queries.py` file, but did not include them in that file's imports.
- Invalid indentation. Example: In this commit, the decorator for the `winners` query in the `queries.py` file has invalid indentation.
- Invalid class hierarchy inside the same file. Example: In this commit, OpenHands updated the type of the `winners` field to `List[WinnerType]` instead of `List[UserType]`. The `WinnerType` class is in the same file as `ContestType` but is placed below it, which does not allow `ContestType` to reference it.
- Invalid decorators. Example: GPT used the wrong decorator for marking a class as a GraphQL type; it should be `@strawberry.type` instead of `@strawberry.django.type`, as it was in the first commit.
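The class hierarchy error comes down to Python evaluating class-body annotations at definition time: a class cannot reference another class defined later in the same file unless a string forward reference is used. A minimal reproduction in plain Python (Strawberry omitted; `UserType`, `ContestType`, and `WinnerType` stand in for the real types):

```python
from typing import List

class UserType:
    name_first: str

class ContestType:
    winners: List[UserType]  # fine: UserType is already defined above

error = ""
try:
    class BrokenContestType:
        # NameError at class-definition time: WinnerType is bound
        # further down the file, exactly the mistake GPT made.
        winners: List[WinnerType]
except NameError as exc:
    error = str(exc)

class WinnerType:  # defined too late for BrokenContestType above
    name_first: str
```

Using the string annotation `List["WinnerType"]`, or moving `WinnerType` above `ContestType`, avoids the error.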
GPT also showed several issues interpreting requested changes, leading to incorrect modifications in the code. Examples:
- When asked to change the type of the `winners` field in the `ContestType` class, GPT modified the wrong class (instructions/changes). In comparison, Claude got it right on the first attempt (instructions/changes).
- When trying to solve the invalid class hierarchy issue referenced previously, GPT had considerable difficulty. Examples:
  - Example #1: It tried to move the `ContestType` class but instead copied the classes that `ContestType` was referencing, causing a class redefinition error. It also duplicated some fields of `ContestType`. Instructions/Changes.
  - Example #2: When trying to fix the issue caused in example #1, OpenHands was asked to remove the duplicates while keeping the first definition of those classes. Instead, it removed the first definition, going against what was explicitly asked. The query was also renamed for unknown reasons. Instructions/Changes.
  - Example #3: An alternative approach to example #1, with the same result. It also broke the `ContestType` class. Instructions/Changes.
- When asked to roll back the last commit, it added content to the `README.md` file for unknown reasons. Instructions/Changes. For reference, the commit we asked to roll back.
- When asked to remove a query that was duplicated and badly implemented, it instead removed the query and created it again. Instructions/Changes.
Test case #2 - Start a project from scratch
In this test case, we used OpenHands to build a user authentication and management system API from scratch using the Django framework. We exclusively utilised the Claude LLM for this project. The goal was to assess OpenHands' capability to initiate a project from the ground up and its effectiveness in addressing larger-scale problems. The results and challenges encountered during this test are detailed in the repository, which can be found here.
Conclusions
At the end of the test case, we successfully started a project from scratch and achieved the following:
- Project setup
- Database model design and implementation
- Views creation
- Documentation (`README.md`, Swagger)
- GitHub workflow for automated testing
However, we encountered a couple of limitations:
- OpenHands was unable to create `__init__.py` files when creating new directories. When instructed to do so, OpenHands recognized the issue and attempted to add the files but encountered an error when committing the changes.
- Although OpenHands successfully created a `Makefile`, it did so with incorrect indentation. Despite being given specific instructions on how to fix this, OpenHands could not correct the indentation. Additionally, when asked to add new commands to the `Makefile`, OpenHands generated the correct response but failed to update the file itself.
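The `Makefile` indentation failure is likely the classic tab rule: make requires each recipe line to begin with a literal tab character, and space-indented recipes fail with a "missing separator" error. A minimal sketch (the `migrations` target mirrors the `make migrations` command from the global instructions; the recipe body is an assumption, not the project's actual command):

```make
# Recipe lines below a target MUST start with a tab, not spaces;
# space indentation is what an LLM emitting plain text tends to produce.
migrations:
	poetry run python manage.py makemigrations
```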