1MEIC06T1: ArchiDetect

Overview

The primary goal of this product is to develop an analysis tool that assesses the probability of specific architectural patterns existing within a software repository by examining various data points from GitHub. The end product will support decision-making by analyzing GitHub data, flagging possible patterns, and generating reports that highlight findings for developers, architects, and project managers.

Vision

ArchiDetect empowers developers to swiftly uncover and analyze architectural patterns within their GitHub repositories. By integrating seamlessly with Visual Studio, we’re transforming architecture analysis from a complex, time-consuming process into a natural part of the developer’s day-to-day.

With this tool, developers and stakeholders gain instant visibility into architectural trends and potential design challenges, allowing teams to align on a clear, data-backed understanding of a project’s needs. This means smarter, faster decisions that prevent issues before they escalate and help ensure every project is built on a solid architectural foundation.

For our team, this product represents a game-changing solution in a space currently underserved by existing tools. By automating architecture pattern detection and providing actionable insights, we’re helping development teams everywhere build stronger, more sustainable software—driving quality and innovation forward with every release.

Research

We found some projects that share similar purposes with ArchiDetect, namely:

CodeMaat is an open-source command-line tool that uses version control logs to produce data on coupling, complexity, and module ownership.

  • Pros: Analyses author contributions, modules that change together (logical coupling), change rate (churn), and other aspects of development, offering helpful insight into code quality issues.
  • Cons: Doesn’t detect architectural patterns specifically.

CodeCharta, another open-source tool, converts software metrics into interactive maps, offering a visualization of the codebase as a 3D cityscape.

  • Pros: Helps with code maintainability and highlights large or complex areas that may be candidates for refactoring.
  • Cons: Doesn’t detect architectural patterns specifically.

SonarQube flags code issues such as bugs, vulnerabilities, and code smells, and is widely used for continuous code quality and security analysis.

  • Pros: Supports architectural rule definitions and dependencies, helping to maintain structural quality.
  • Cons: Focuses on static analysis of code and rule-based architectural checks, rather than identifying architectural patterns through repository activity and story points, as ArchiDetect proposes to do.

Domain Analysis

Physical Diagram

Physical Diagram

After Sprint 0, Nexus was introduced as a component of the AI4SD architecture that serves as a source of repository data. The application will therefore retrieve repository data from Nexus instead of relying directly on the GitHub API:

Physical Diagram 2
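
As a rough illustration of this interaction, the sketch below shows how the backend might request repository data from Nexus. The base URL, endpoint path, and response fields are assumptions made for the example, not the actual Nexus API.

```python
import requests

# Hypothetical base URL for the Nexus service (placeholder, not the real address).
NEXUS_BASE_URL = "https://nexus.example/api"


def fetch_repository_data(repo_full_name: str) -> dict:
    """Retrieve repository data (e.g. commits, file tree, issues) from Nexus.

    The endpoint path and response structure are assumed for illustration only.
    """
    response = requests.get(
        f"{NEXUS_BASE_URL}/repositories/{repo_full_name}",
        timeout=30,
    )
    response.raise_for_status()  # fail fast if Nexus reports an error
    return response.json()


if __name__ == "__main__":
    data = fetch_repository_data("FEUP-MEIC-DS-2024-25/ai4sd")
    print(sorted(data.keys()))
```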

Sequence Diagram

Sequence Diagram

Architecture and design

ArchiDetect, as described above, is a simple tool that follows a modular design, with four main components:

  • The frontend, where the input data is entered and the output data displayed.
  • The backend, responsible for gathering data on the user's repository, communicating with Nexus, and building the prompts to send to the LLM.
  • Nexus, which scrapes data from the user's repository.
  • The LLMs, which recognize architectural patterns from the backend's information on the repository.

In the future, the app is expected to use several LLMs when recognizing architectural patterns, so it can cross-reference their results and make the conclusions more solid.

Moreover, the user will also be able to select which data is fed to the LLM, allowing them to filter out "noisy" data (for example, bad commit messages) if they recognize it as a problem in their repository.
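
The sketch below illustrates this intended flow under assumed names: a request object carrying the Nexus data and the user's selected data sources, a prompt builder, and a simple majority vote across several LLM clients. Every identifier here is illustrative and does not reflect the actual ArchiDetect implementation.

```python
from dataclasses import dataclass


@dataclass
class AnalysisRequest:
    repo_data: dict        # repository data gathered via Nexus
    selected_sources: set  # e.g. {"file_tree", "dependencies"}; lets the user drop "noisy" data


def build_prompt(request: AnalysisRequest) -> str:
    """Keep only the data sources the user selected and turn them into an LLM prompt."""
    filtered = {k: v for k, v in request.repo_data.items() if k in request.selected_sources}
    return (
        "Identify the architectural patterns most likely present in this repository "
        f"and justify each one:\n{filtered}"
    )


def detect_patterns(request: AnalysisRequest, llm_clients: list) -> dict:
    """Query several LLMs with the same prompt and keep patterns reported by a majority."""
    prompt = build_prompt(request)
    votes: dict = {}
    for client in llm_clients:
        # Assumed interface: each client returns a list of pattern names for the prompt.
        for pattern in client.complete(prompt):
            votes[pattern] = votes.get(pattern, 0) + 1
    majority = len(llm_clients) / 2
    return {pattern: count for pattern, count in votes.items() if count > majority}
```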

Technologies

Our tool was developed using Python and Django for the backend.

The client gave us total control over which language and technologies to use, as long as the tool could be integrated into the AI4SD context, either as a web app or a Visual Studio extension. We decided on the latter because we thought the user experience would be more seamless this way, having the tool's analysis right beside the codebase.

During the development of the prototype in Sprint 0, we decided to build a web-based tool. After some changes to the project's objectives, we modified our tool to be an extension.
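
As a minimal sketch of what the Django side of this could look like, the endpoint below accepts a repository name and returns the analysis as JSON. The view name, URL wiring, and the run_analysis helper are hypothetical stand-ins for the real pipeline.

```python
# views.py — illustrative only; names and structure are assumptions, not ArchiDetect's actual code.
import json

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST


def run_analysis(repo_name: str) -> list[str]:
    """Placeholder for the real pipeline (Nexus data gathering + LLM prompting)."""
    return ["Model-View-Controller"]  # illustrative output only


@csrf_exempt
@require_POST
def analyze_repository(request):
    """Accept a repository name, run the pattern analysis, and return the report as JSON."""
    payload = json.loads(request.body)
    repo_name = payload.get("repository")
    if not repo_name:
        return JsonResponse({"error": "missing 'repository' field"}, status=400)

    patterns = run_analysis(repo_name)
    return JsonResponse({"repository": repo_name, "patterns": patterns})
```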

Security concerns

ArchiDetect is designed to be safe and to mitigate prompt injection vulnerabilities. To ensure this, we have implemented strict input validation and specified JSON schemas for communication between components. This structured approach ensures that only well-formed and expected data is processed, minimizing the risk of malicious input influencing system behavior.
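
A minimal sketch of this kind of schema-based validation is shown below, using the jsonschema library. The field names and allowed values are assumptions for illustration; the actual schemas used between components may differ.

```python
from jsonschema import ValidationError, validate

# Illustrative schema: only a repository identifier and an optional list of data sources are accepted.
ANALYSIS_REQUEST_SCHEMA = {
    "type": "object",
    "properties": {
        "repository": {"type": "string", "pattern": r"^[\w.-]+/[\w.-]+$"},
        "sources": {
            "type": "array",
            "items": {"enum": ["file_tree", "commits", "issues", "dependencies"]},
        },
    },
    "required": ["repository"],
    "additionalProperties": False,  # anything unexpected is rejected outright
}


def validate_request(payload: dict) -> None:
    """Reject any payload that does not match the expected structure before it is used in a prompt."""
    try:
        validate(instance=payload, schema=ANALYSIS_REQUEST_SCHEMA)
    except ValidationError as exc:
        raise ValueError(f"Rejected malformed request: {exc.message}") from exc
```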

Quality assurance

Continuous Integration/Continuous Deployment (CI/CD): Automated pipelines are configured using tools like GitHub Actions to enforce testing and quality gates before merging code.

Peer Code Reviews: All code changes undergo peer review to maintain coding standards and improve design.

User Feedback: Collecting user feedback during early releases (e.g., alpha/beta versions) to identify usability issues and enhance features.

How to use

ArchiDetect integrates with Visual Studio to provide a seamless experience. Here is a step-by-step guide for users:

Installation:

  • Download the ArchiDetect plugin from the Visual Studio Marketplace.
  • Install the plugin following the prompts.

Running an Analysis:

  • Open a project in Visual Studio.
  • Click "Run Analysis."

Viewing Results:

  • When the analysis finishes, the detected architectural patterns and the generated report are displayed in a panel beside the codebase.

Process of Development

Sprint 0

Sprint 1 Retrospective

At the end of Sprint 1, the team's overall opinion is that we worked well. However, some aspects should be improved to make our development process more agile, such as clearly defining tasks that are not easily captured by user stories and assigning them to team members, and starting work on the sprint backlog sooner so that the work done can be reviewed more thoroughly.

At this point, there are still some doubts regarding the integration of the work developed by the teams with the AI4SD tool.

We decided to implement changes regarding the first aspect, so that Sprint 2 can begin with increased productivity among the team.


Boards at the end of Sprint 1

Sprint 2 Retrospective

In this sprint, the team feels the planning was too optimistic and, with work piling up from other courses, time management was a little chaotic.

For Sprint 3, we are holding team meetings more often, so there is always a closer objective to work towards.


Boards at the end of Sprint 2

Sprint 3

This is what our team's workload looked like at the beginning of the last sprint of this project. We noticed that the Product Backlog had an item that was already in progress (Issue #11), so we moved it to the "In Progress" board.


Boards at the beginning of Sprint 3

Sprint 3 Retrospective

During the sprint, we decided to label each remaining issue with its priority (Must, Should, Could, and Would not), so we could easily assess how much work we still had to accomplish. The only issues not completed are tagged with "Could" or "Would not".

As we end this sprint, the team agrees that it was the best one yet, with communication flowing between members like never before. The work was accomplished and brings value to AI4SD. Another improvement was how well time was managed from the beginning of the sprint, even with other projects at hand.

The team feels there is still room to improve in following internal milestones, dividing the workload, and managing time.

We are excited to see where this project will venture next, and we expect to see great things come from this big team!


Boards at the end of Sprint 3

Happiness Metrics

Here we have a table representing the level of happiness among team members across all three sprints. The vertical axis shows the member doing the evaluation, and the horizontal axis the member being evaluated.

We use 🤠 as the best possible evaluation!

Happiness metrics table

Contributions

Links to the factsheets of each team and of each team member: