1MEIC07T2: UTA ‐ Unit Testing Assistant - FEUP-MEIC-DS-2024-25/ai4sd GitHub Wiki

UTA is a software tool that aims to facilitate the process of unit testing with the help of an AI assistant.

Vision

Our software provides various ways to facilitate working on unit testing.

  • Firstly, the system provides an option to evaluate the current unit tests, and try to improve them or even add extra ones. These suggestions can be implemented in the files by the assistant itself.
  • Secondly, when the user doesn't have ideas on how to test something, there's the option of asking the assistant what types of unit tests to apply to a file. From the selected aspects, the user will get unit tests covering them and can choose which ones to implement.
  • Finally, the user will be able to freely chat with the assistant and ask any specific questions that might not be possible through the other options.

Any additional features should support these 3 main features and further improve the user experience. For instance, being able to send files in the chatbox would streamline the process for the user. After making the 3 main features functional, our main goal will be to find and develop quality-of-life improvements such as that one.

Research

The following were the project's primary research areas:

  • Web Frameworks: Because of its ease of use and interoperability with Python applications, Flask was selected as the backend framework.
  • Environment management: The dotenv package is used to handle API keys securely.
  • Front-end technologies: Using JavaScript, HTML, and CSS to create an intuitive user interface.
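As an illustration of the dotenv-based key handling, the sketch below shows one way the backend might load its key. The variable name GEMINI_API_KEY and the helper function are assumptions for illustration, not the project's actual code:

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key, preferring a .env file over hard-coded values.

    The variable name is an assumption for illustration; the point is
    that the key never appears in source code or version control.
    """
    try:
        # python-dotenv reads key=value pairs from a .env file into
        # os.environ without overwriting variables already set.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # fall back to variables already exported in the shell

    key = os.getenv(var_name)
    if key is None:
        raise RuntimeError(f"{var_name} is not set; add it to a .env file")
    return key
```

A `.env` file containing `GEMINI_API_KEY=...` would then sit next to the app and be listed in `.gitignore`.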

The team also discovered that, when using similar products, the user often needs to have a good idea of what types of unit tests they want, or must give a detailed explanation of the context. Our tool aims to hold the hand of users who are a bit more lost in this process by providing constructive evaluations and brainstorming suggestions on where to start testing. Moreover, with most AI chatbots the user has to copy and paste the suggested changes into their files; our product covers that step as well, cutting down even more time.

Domain Analysis

Class Diagram

Class Diagram

Activity Diagram

Activity Diagram

Architecture and design

The client-server model serves as the foundation for the architecture:

  • Client: An online interface that lets users put their code and initiate the creation of tests.
  • Server: A Flask application that responds to HTTP requests, interacts with the Gemini model, and provides the tests that have been generated.
  • AI integration: The backend and the Gemini model are linked by the google.generativeai library.

Architecture Diagram
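The client-server round trip can be sketched from the client side. The /generate_tests endpoint name comes from this wiki; the JSON field name ("code") and the port are assumptions for illustration, and the real client is written in JavaScript rather than Python:

```python
import json
import urllib.request


def build_generate_request(base_url: str, source_code: str) -> urllib.request.Request:
    """Build the POST request the web client sends to the Flask backend.

    The payload carries the user's code as JSON; the server answers
    with a JSON body containing the generated tests.
    """
    payload = json.dumps({"code": source_code}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate_tests",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it and reading the answer would then look like:
# with urllib.request.urlopen(build_generate_request("http://localhost:5000", code)) as resp:
#     tests = json.loads(resp.read())["tests"]
```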

Technologies

The statements below are subject to change in the future.

  • Routes and API interactions are managed by the Flask application.
  • Gemini API integration: Uses a secure API key to configure generative AI.
  • Front-end interface: A simple interface with a text field and a button to submit code.
  • Prompt building: Provides the AI with customised prompts to direct the creation of pertinent tests.
  • HTTP POST request: JavaScript uses the /generate_tests endpoint to submit the code to the Flask server.
  • API setup: The server initialises the Gemini model and loads the API key using dotenv.
  • AI response: The server builds a prompt, transmits it to the Gemini model, and receives the generated tests.
  • Client response: A JSON response containing the tests is sent back by the server.
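Putting these pieces together, a minimal sketch of the backend could look as follows. The Flask setup, the /generate_tests endpoint, and the google.generativeai calls come from the description above; the JSON field names ("code", "tests"), the model name, and the prompt wording are assumptions for illustration:

```python
import os

from flask import Flask, jsonify, request

app = Flask(__name__)


def build_prompt(source_code: str) -> str:
    """Wrap the user's code in a customised prompt for the Gemini model."""
    return (
        "You are a unit-testing assistant. Write unit tests for the "
        "following code, with comments explaining each test:\n\n"
        + source_code
    )


@app.route("/generate_tests", methods=["POST"])
def generate_tests():
    data = request.get_json(silent=True) or {}
    source_code = data.get("code")
    if not source_code:
        return jsonify({"error": "no code provided"}), 400

    # Imported lazily so the rest of the app loads without the package.
    import google.generativeai as genai

    # In the real app, dotenv loads this key from a .env file.
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(build_prompt(source_code))
    return jsonify({"tests": response.text})


if __name__ == "__main__":
    app.run(debug=True)
```

The front-end's POST to /generate_tests lands in `generate_tests()`, and the JSON body it receives back is what the JavaScript code renders in the page.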

Development guide

A new developer should have some basic programming skills in the Python language, as well as know how to work with Next.js for web development, since that will be the basis for the product's interaction with the Gemini API. It's also crucial to understand how testing code works, more specifically how unit testing can improve a product. This guide should be expanded in future Sprints.

Security concerns

The team didn't focus too much on security, since they trust the new Avengers infrastructure to ensure a certain level of security. However, some measures may be added if needed.

Quality assurance

Quality assurance wasn't the biggest priority during Sprints 0 and 1. During Sprints 2 and 3, since the focus was on integrating the code into Avengers and the Cloud Build, a lot more care was put into assuring quality, since the assistant stopped existing in a vacuum. It was crucial to do some testing in order to make sure UTA did not break any work from the other teams that shared the web app.

How to use

Below is the Figma prototype done during Sprint 0, which aims to illustrate how the user would take advantage of the 3 main functions of the product.

Evaluations (Prototype)

Evaluations

Suggestions (Prototype)

Suggestions

Chatting (Prototype)

Chatting

Running the code

1st Option

Follow these steps on your terminal:

  • Firstly, to run the backend, it's necessary to access superheroes/superhero-07-02 and execute
docker-compose up -d 
  • Secondly, to run the frontend, it's necessary to access avengers/frontend/webapp and execute
docker-compose up -d 

2nd Option

Just click this link: http://104.155.4.93/assistants/uta

Retrospective for Sprint 0

During Sprint 0, most of the product's concept was conceived and a lot of brainstorming was done. Most of the work consisted of laying the groundwork for things to come, so the initial prototype came out in a very basic state, where users copy-pasted some code, and the AI would then respond with the bare text of the unit tests.

Retrospective for Sprint 1

During Sprint 1, some much needed quality-of-life features were added. For instance, the code that the AI assistant provides is now indented and coloured for better readability, and it's also properly commented. In addition, instead of copy-pasting code, the user can select their files directly from the file explorer and can also save the result directly into a file on their computer. Besides that, work was done in order to integrate the product into the frontend, but this task is not fully completed yet. A video demonstrating the code is available here.

Retrospective for Sprint 2

This sprint was marked by some milestones:

Merging with other teams

Since the team discovered other projects that shared a lot of features with UTA, it was decided to start working on "Project Madame Web". This consists of the merger of teams 2 (UTA), 3 (TeXes, which focuses on test explanations) and 5 (Twister AI, which focuses on generating mutation tests) from class 7. This integration hasn't been executed yet, but it's planned for Sprint 3. The merger would allow for an intuitive page that handles both unit and mutation testing. This interface would offer a variety of features, from code/test evaluations, suggestions on areas that need more testing and explanations of what each test does, to conveniently applying these changes on the user's computer.

What went right?

Knowing the other 2 teams could take care of some features, the team was able to focus more on its main features and be more productive.

What went wrong?

The team faced several challenges in order to successfully integrate the product into Avengers AI4SD, since there was plenty of confusion around some of its tools. In addition, communication wasn't ideal between the 3 teams at first, which led to some misunderstandings on the plan for the product.

What plans does the team have for improving?

With most of the Avengers integration completed, the work for Sprint 3 should be a smoother experience. As for the cooperation with other teams, there are plans for more frequent communication in order to ensure that every time one team gets a feature done, everyone is aware of it, and no work is repeated or wasted.

Where can the work be seen?

A video demonstrating the current product can be found here.

Retrospective for Sprint 3

This sprint was marked by the deployment of the GCP Cloud Build and a lot of bug fixing for the web app.

Merging with other teams

Due to a lot of technical difficulties with the overall infrastructure, the merger of teams 7.2, 7.3 and 7.5 could not be completed before the end of the final sprint.

What went right?

The app is fully working as expected in the Cloud Build of AI4SD.

What went wrong?

In order to deliver UTA on time, some planned quality of life features had to be scrapped.

What plans does the team have for improving?

If the team were to keep working on this project, the scrapped features would definitely be added. This would include the ability to specifically choose which tests are saved to the computer, multiple chat tabs and of course, the merger with the other "Madame Web" assistants.

Where can the work be seen?

The final state of UTA can be observed in the GIF below:

Final Demo

To see this demo with better image quality, there's a video that can be found here.

As stated before in the Running the Code section of this wiki, the code can also be executed through the main branch of AI4SD.

How to contribute

At this point of the semester the project should be done. The team was working on merging their work with teams 3 and 5 of class 7 for "Project Madame Web", so if someone thinks their product would fit well with this collaboration, they are welcome to try merging it with their assistant. There aren't many rules to follow or requirements imposed, as long as the contribution serves the overall goal of improving the experience of unit/mutation testing.

Contributions

Team 2