1MEIC06T1: ArchiDetect - FEUP-MEIC-DS-2024-25/ai4sd GitHub Wiki

Overview

The primary goal of this product is to provide an analysis tool that estimates how likely specific architectural patterns are to be present in a software repository by examining various data points from GitHub. The end product will support decision-making by analyzing GitHub data, flagging possible patterns, and generating reports that highlight findings for developers, architects, and project managers. Below, you can find an index to quickly browse this page.

Vision

ArchiDetect empowers developers to swiftly uncover and analyze architectural patterns within their GitHub repositories. By integrating with Visual Studio, we’re transforming architecture analysis from a complex, time-consuming process into a seamless part of the developer’s day-to-day workflow.

With this tool, developers and stakeholders gain instant visibility into architectural trends and potential design challenges, allowing teams to align on a clear, data-backed understanding of a project’s needs. This means smarter, faster decisions that prevent issues before they escalate and help ensure every project is built on a solid architectural foundation.

For our team, this product represents a game-changing solution in a space currently underserved by existing tools. By automating architecture pattern detection and providing actionable insights, we’re helping development teams everywhere build stronger, more sustainable software—driving quality and innovation forward with every release.

Research

We found some projects that share similar purposes with ArchiDetect, namely:

CodeMaat is an open-source command-line tool that uses version control logs to produce data on coupling, complexity, and module ownership.

  • Pros: Analyzes author contributions, modules that change together (logical coupling), change rate (churn), and other aspects of development to offer helpful insight into code quality issues.
  • Cons: Doesn’t detect architectural patterns specifically.

CodeCharta is another open-source tool, focused on converting software metrics into interactive maps. It visualizes the codebase in a 3D cityscape format.

  • Pros: Helps with code maintainability and highlights large or complex areas that may be candidates for refactoring.
  • Cons: Doesn’t detect architectural patterns specifically.

SonarQube flags code issues such as bugs, vulnerabilities, and code smells, and it is widely used for continuous code quality and security analysis.

  • Pros: Supports architectural rule definitions and dependencies, helping to maintain structural quality.
  • Cons: Focuses only on static analysis of code and rule-based architectural checks, rather than identifying architectural patterns through repository activity and story points, as ArchiDetect proposes to do.

Domain Analysis

Physical Diagram

Physical Diagram

After Sprint 0, Nexus was introduced as a component of the AI4SD architecture to act as a source of repository data. The application therefore retrieves repository data from Nexus instead of relying on the GitHub API:

Physical Diagram 2

Sequence Diagram

Sequence Diagram

Architecture and design

ArchiDetect, as described above, is a simple tool that follows a modular design, made up of four components:

  • The frontend, where the input data is entered and the output data is displayed.
  • The backend, responsible for gathering data on the user's repository, communicating with Nexus, and building the prompts to send to the LLM.
  • Nexus, which scrapes data from the user's repository.
  • The LLMs, which recognize architectural patterns from the information the backend provides about the user's repository.
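
As an illustration of the backend's prompt-building step, here is a minimal Python sketch. It is not the project's actual implementation: the `RepoData` container, its fields, the `build_prompt` function, and the prompt wording are all assumptions made for this example, and the shape of the data returned by Nexus may differ.

```python
from dataclasses import dataclass, field


@dataclass
class RepoData:
    """Hypothetical container for the repository data obtained via Nexus."""
    name: str
    file_paths: list[str] = field(default_factory=list)
    commit_messages: list[str] = field(default_factory=list)


def build_prompt(repo: RepoData, max_items: int = 20) -> str:
    """Assemble a plain-text prompt asking an LLM which patterns are likely."""
    # Cap each section so the prompt stays within a reasonable size.
    files = "\n".join(f"- {p}" for p in repo.file_paths[:max_items])
    commits = "\n".join(f"- {m}" for m in repo.commit_messages[:max_items])
    return (
        f"Repository: {repo.name}\n\n"
        f"File paths:\n{files}\n\n"
        f"Recent commit messages:\n{commits}\n\n"
        "Which architectural patterns does this repository most likely follow? "
        "Give each pattern a probability between 0 and 1."
    )


# Illustrative data only; not taken from a real repository.
example = RepoData(
    name="demo/shop",
    file_paths=["app/models/order.py", "app/views/order.py", "app/controllers/order.py"],
    commit_messages=["Add order controller", "Split view logic from model"],
)
prompt = build_prompt(example)
```

Keeping prompt construction in a single pure function like this would make it easy to test without calling Nexus or an LLM.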

In the future, the app is expected to query several LLMs when recognizing architectural patterns, so that it can cross-reference their results and reach more robust conclusions.
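
One simple way to cross-reference results is to count how many models report each pattern and keep only those with enough agreement. The sketch below assumes each LLM's answer has already been parsed into a list of pattern names; the `cross_reference` function, the model names, and the 0.5 threshold are hypothetical choices for this example, not a description of how ArchiDetect does or will do it.

```python
from collections import Counter


def cross_reference(results: dict[str, list[str]], min_agreement: float = 0.5) -> dict[str, float]:
    """Map each pattern to the fraction of models that reported it.

    Patterns reported by fewer than `min_agreement` of the models are dropped.
    """
    if not results:
        return {}
    votes = Counter()
    for patterns in results.values():
        # Normalize names and use a set so each model votes once per pattern.
        votes.update({p.strip().lower() for p in patterns})
    total = len(results)
    return {p: n / total for p, n in votes.items() if n / total >= min_agreement}


# Illustrative answers from three hypothetical models.
answers = {
    "model_a": ["MVC", "Layered"],
    "model_b": ["mvc", "Microservices"],
    "model_c": ["MVC", "layered"],
}
consensus = cross_reference(answers)
```

With this input, `mvc` is kept with full agreement, `layered` is kept with two of three votes, and `microservices` is dropped because only one model reported it.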

Moreover, the user will also be able to select which data is fed to the LLM, allowing them to filter out "noisy" data (for example, uninformative commit messages) when they recognize it as a problem in their repository.
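
A minimal sketch of such a selection step is shown below, assuming the repository data is held as a dictionary keyed by data source. The `select_inputs` function and the source names (`commit_messages`, `file_paths`, `issues`) are hypothetical and only serve to illustrate the idea.

```python
def select_inputs(repo_data: dict[str, list[str]], selected: set[str]) -> dict[str, list[str]]:
    """Keep only the data sources the user chose to send to the LLM."""
    unknown = selected - set(repo_data)
    if unknown:
        # Fail loudly rather than silently ignoring a mistyped source name.
        raise ValueError(f"Unknown data sources: {sorted(unknown)}")
    return {name: items for name, items in repo_data.items() if name in selected}


# Illustrative data only; the user excludes commit messages they consider noisy.
repo_data = {
    "commit_messages": ["wip", "fix", "asdf"],
    "file_paths": ["app/models/order.py", "app/views/order.py"],
    "issues": ["Separate persistence from business logic"],
}
chosen = select_inputs(repo_data, {"file_paths", "issues"})
```

Here the commit messages are left out of `chosen`, so they would not reach the prompt.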

Technologies

Identify the main technologies, languages and frameworks used. Clearly identify which ones were restrictions imposed by the client and which were your own choices. Justify your choices and explain in your own words the motivation for the restrictions of your client.

Explain the prototype or base implementation that you have implemented in Sprint 0, and how that has informed the rest of the development.

Development guide

Explain what a new developer to the project should know in order to develop the system, including how to build, run and test it in a development environment.

Document any APIs, formats and protocols needed for development (but don't forget that public APIs should also be accessible from the "How to use" above).

Describe coding conventions and other guidelines adopted by the team(s).

Security concerns

Identify potential security vulnerabilities classes and explain what the team has done to mitigate them.

Quality assurance

Describe which tools are used for quality assurance and link to relevant resources. Namely, provide access to reports for coverage and mutation analysis, static analysis, and other tools that may be used for QA.

How to use

Explain how to use your tool from a user standpoint. This can include short videos, screenshots, or API documentation, depending on what makes sense for your particular software and target users. If needed, link to external resources or additional markdown files with further details (please add them to this wiki).

Process of Development

Sprint 0

Sprint 1 Retrospective

At the end of Sprint 1, the team's overall opinion is that we worked well. However, some aspects should be improved to make our development process more agile: clearly defining tasks that are not easily captured as user stories and assigning them to team members, and starting work on the sprint backlog sooner so that the work done can be reviewed more thoroughly.

At this point, there are still some doubts regarding the integration of the work developed by the teams with the AI4SD tool.

We decided to implement changes regarding the first aspect, so that Sprint 2 can begin with increased productivity among the team.

board_sprint1 board_sprint1_02

Boards at the end of Sprint 1

Sprint 2 Retrospective

In this Sprint, the team feels the planning was too optimistic and, with more work piling up from other courses, time management was a little chaotic.

For Sprint 3, we are holding team meetings more often, so that there is always a near-term objective to work towards.

board_sprint2 board_sprint2_02

Boards at the end of Sprint 2

Sprint 3

This is what our team's workload looks like at the beginning of the last Sprint of this project. We noticed that the Product Backlog had an item that was already in progress (Issue #11), so we moved it into the "In Progress" board.

board_sprint3_init board_sprint3_init_02

Boards at the beginning of Sprint 3

Happiness Metrics

Here we have a table representing the level of happiness among the team members. The vertical axis lists the member doing the evaluating, and the horizontal axis lists the member being evaluated.

We use 🤠 as the best possible evaluation!

metrics

How to contribute

Explain what a new developer should know in order to develop the tool, including how to build, run and test it in a development environment.

Defer technical details to the technical documentation below, which should include information and decisions on architectural, design and technical aspects of the tool.

Contributions

Link to the factsheets of each team and of each team-member: