OpenHands - runtimerevolution/labs GitHub Wiki
OpenHands is an open-source AI-powered platform designed to assist in software development.
Usage
Of the multiple ways OpenHands can be used, we chose its GitHub Actions integration to evaluate its usability and functionality.
Models
The models considered for our tests were the following:
- GPT-4o (OpenAI)
- Claude 3.5 Sonnet (Anthropic)
Both models are recommended in the OpenHands documentation, with Claude 3.5 Sonnet being the top recommendation.
The same set of global instructions and test case descriptions was used for each model. Additional requests to OpenHands were based on each model's outputs, leading to variations in the instructions given to OpenHands.
OpenHands global instructions
When using the OpenHands GitHub Action, we can define global instructions for the AI agent by writing them in a file named `.openhands_instructions`, located in the project's root directory.
In both test cases we present, the instructions remain the same, except for the addition of an 'Overview' section, where we provide a brief description of the project.
### Instructions for OpenHands AI Agent
1. **Setup Development Environment**: Run `poetry install --no-root` to install the required packages for the project before making any changes. Ensure all dependencies are properly installed.
2. **Understand the Problem Statement**: Carefully read the problem statement provided in the issue description.
3. **Identify the Files to Modify**: Determine which files need to be created or modified to address the issue.
4. **Implement the Solution**: Make the necessary code changes to solve the problem. Run the command `make migrations` to create the necessary database migration files.
5. **Describe Changes**: Clearly describe the changes made to solve each step of the issue.
6. **Documentation**: Update any relevant documentation to reflect the changes made.
7. **Completion**: Once the issue is resolved, mark the task as complete.
8. **Provide Evidence**: Include the file path and content of the created or modified files in the completion message to provide evidence of the changes made.
### Best Practices
- Maintain clear and concise code.
- Follow Python coding standards and best practices.
- Ensure all code is well-documented.
- Write meaningful commit messages.
### Testing
- Write appropriate tests to ensure the applied changes are covered. Place these tests in the photo/tests directory.
- Testing is not required as part of the workflow. Write the tests but skip executing them.
### Final Steps
- After completing the changes, finalize the task by marking it as complete.
- Provide evidence of the changes by including the file path and content of the created or modified files in the completion message.
Test cases
Test case #1 - Implementing new features on an existing repository
In this test case, we used OpenHands to add new features to Revent, a photo contest API built with Django and the Strawberry GraphQL library.
The goal of this test case was not only to understand how OpenHands works but also to evaluate its performance with the previously referenced LLMs in terms of response quality.
Here, we present two of these functionalities, their corresponding GitHub issues, and the responses from each model.
Issue #1 - Add timestamps to models
Description: Add the date fields `created_at` and `updated_at` to all models. Both fields must be nullable.
Pull requests:
- GPT: https://github.com/runtimerevolution/revent-api/issues/173
- Claude: https://github.com/runtimerevolution/revent-api/issues/184
Issue #2 - Create winners view
We need to create a new GraphQL view that returns the winners of each contest held.
The response must be a set of contests with at least one winner, with each contest containing its list of winners. For each winner, we also want the submission with which the user won the contest, including the picture (name and file) and the number of votes received. Consider the following JSON representation of the expected result:
```
{
  "title": string,
  "description": string,
  "prize": string,
  "voting_draw_end": string,
  "winners": [
    {
      "name_first": string,
      "name_last": string,
      "submission": {
        "picture": {
          "name": string,
          "file": string
        },
        "number_votes": int
      }
    }
  ]
}
```
The response must be ordered by the date the contest ended.
Pull requests:
- GPT: https://github.com/runtimerevolution/revent-api/pull/180 (Note: The last three commits were done using Claude)
- Claude: https://github.com/runtimerevolution/revent-api/pull/187
Conclusions
Overall, OpenHands produced better results when using Claude compared to GPT. The solutions generated with Claude required fewer iterations and were more accurate from the start. For example, while solving issue #2, GPT needed 51 commits to reach a working solution, whereas Claude achieved the same in just one commit.
Here are some of the errors GPT introduced that Claude never did, all illustrated with examples from the pull request for the solution to issue #2:
- Missing imports. Example: In this commit, OpenHands created the new types `WinnerType`, `WinnerSubmissionType`, and `WinnerPictureType` in the `types.py` file and used them correctly in the new `winners` query in the `queries.py` file, but did not include them in that file's imports.
- Invalid indentation. Example: In this commit, the decorator for the `winners` query in the `queries.py` file has invalid indentation.
- Invalid class hierarchy inside the same file. Example: In this commit, OpenHands updated the type of the `winners` field to `List[WinnerType]` instead of `List[UserType]`. The `WinnerType` class is in the same file as `ContestType` but is placed below it, which does not allow `ContestType` to reference it.
- Invalid decorators. Example: GPT used the wrong decorator for marking a class as a GraphQL type; it should be `@strawberry.type` instead of `@strawberry.django.type`, as it was in the first commit.
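The class hierarchy error comes down to Python evaluating class-body annotations at definition time: a class cannot reference another class defined later in the same file unless a string forward reference is used. A minimal reproduction in plain Python (Strawberry omitted; `UserType`, `ContestType`, and `WinnerType` stand in for the real types):

```python
from typing import List

class UserType:
    name_first: str

class ContestType:
    winners: List[UserType]  # fine: UserType is already defined above

error = ""
try:
    class BrokenContestType:
        # NameError at class-definition time: WinnerType is bound
        # further down the file, exactly the mistake GPT made.
        winners: List[WinnerType]
except NameError as exc:
    error = str(exc)

class WinnerType:  # defined too late for BrokenContestType above
    name_first: str
```

Using the string annotation `List["WinnerType"]`, or moving `WinnerType` above `ContestType`, avoids the error.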
GPT also showed several issues interpreting requested changes, leading to incorrect modifications in the code. Examples:
- When asked to change the type of the `winners` field in the `ContestType` class, GPT modified the wrong class (instructions/changes). In comparison, Claude got it right on the first attempt (instructions/changes).
- When trying to solve the invalid class hierarchy issue referenced previously, GPT had considerable difficulty. Examples:
  - Example #1: It tried to move the `ContestType` class but instead copied the classes that `ContestType` was referencing, causing a class redefinition error. It also duplicated some fields of `ContestType`. Instructions/Changes.
  - Example #2: When trying to fix the issue caused in example #1, OpenHands was asked to remove the duplicates while keeping the first definition of those classes. Instead, it removed the first definition, going against what was explicitly asked. The query was also renamed for unknown reasons. Instructions/Changes.
  - Example #3: An alternative approach to example #1, with the same result. It also broke the `ContestType` class. Instructions/Changes.
- When asked to roll back the last commit, it added content to the `README.md` file for unknown reasons. Instructions/Changes. For reference, the commit we asked to roll back.
- When asked to remove a query that was duplicated and badly implemented, it instead removed the query and created it again. Instructions/Changes.
Test case #2 - Start a project from scratch
In this test case, we used OpenHands to build a user authentication and management system API from scratch using the Django framework. We exclusively utilised the Claude LLM for this project. The goal was to assess OpenHands' capability to initiate a project from the ground up and its effectiveness in addressing larger-scale problems. The results and challenges encountered during this test are detailed in the repository, which can be found here.
Conclusions
At the end of the test case, we successfully started a project from scratch and achieved the following:
- Project setup
- Database model design and implementation
- Views creation
- Documentation (`README.md`, Swagger)
- GitHub workflow for automated testing
However, we encountered a couple of limitations:
- OpenHands was unable to create `__init__.py` files when creating new directories. When instructed to do so, OpenHands recognized the issue and attempted to add the files but encountered an error when committing the changes.
- Although OpenHands successfully created a `Makefile`, it did so with incorrect indentation. Despite being given specific instructions on how to fix this, OpenHands could not correct the indentation. Additionally, when asked to add new commands to the `Makefile`, OpenHands generated the correct response but failed to update the file itself.
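The `Makefile` indentation failure is likely the classic tab rule: make requires each recipe line to begin with a literal tab character, and space-indented recipes fail with a "missing separator" error. A minimal sketch (the `migrations` target mirrors the `make migrations` command from the global instructions; the recipe body is an assumption, not the project's actual command):

```make
# Recipe lines below a target MUST start with a tab, not spaces;
# space indentation is what an LLM emitting plain text tends to produce.
migrations:
	poetry run python manage.py makemigrations
```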