Welcome to the amos2025ss04-ai-driven-testing wiki!
# Our project

## How to use
User and Developer Guide: for a quick start and the correct setup. You may also want to check out specific sections for certain applications:
- How to start the Web-interface Frontend: tutorial on how to start the frontend.
- Ollama Setup Information: possible ways to use Ollama to run LLMs.
- Possible Ways to run Ollama and Settings (a minimal usage sketch follows right after this list).
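
As a quick orientation before diving into the linked Ollama pages, here is a minimal sketch of how a locally running Ollama instance can be queried over its REST API. It assumes Ollama is already installed and serving on its default port 11434, and that the model named in the script (`mistral` is only an example) has been pulled beforehand; it is not the project's actual backend code.

```python
# Minimal sketch: send a prompt to a locally running Ollama instance.
# Assumes Ollama is serving on its default port (11434) and that the
# model used below ("mistral" is only an example) has been pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def ask_ollama(prompt: str, model: str = "mistral") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return the full answer as one JSON object
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    print(ask_ollama("Write a pytest unit test for a function add(a, b)."))
```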
## How to work on this project
- Contributing
- How to be a Release Manager: how to do the weekly release. This is relevant for any future team that takes part in the AMOS course.
## Structure
- Architecture: overview of the system architecture and component interactions.
- Backend API Design: the structure that receives prompts, interacts with LLMs, and returns the results to the client.
- CI Pipeline: overview of the CI/CD (continuous integration) pipeline for the project.
- Execution Flow Design: the sequential process in our project.
- Model Configuration Design: how we handle models.
- Module System Design: how we handle extensions.
- Prompt Design: how we send prompts in compact and effective formats.
- Robot Framework: information about potential integration with the Robot Framework.
## Models
A list of all language models evaluated for use in the project. We only chose model versions that are available under an open-source license.
- DeepCoder
- DeepSeek‐Coder V1
- Google Gemma 3
- LLMs incompatibility with our project: overview of models that were evaluated but deemed unsuitable.
- Mistral AI
- OpenHermes 2.5
- Phi4‐Mini
- Phi‑4 Reasoning
- Qwen 2.5 Coder
- Qwen3
- Smollm2
- StarCoder
- StarCoder2
- TinyLlama
## Our research and experiments
- Docker Performance: performance of different Docker configurations when running LLMs.
- Evaluating Large Language Model Responses to Spelling Errors
- LLM Code Understanding Evaluation
- Comparison of 1b, 3b, 7b and 14b models
- Running AI LLM Projects in CI
## LLM components and assessment tools
- AI‐Model-Benchmark: standard benchmarks used to evaluate LLMs.
- Benefits of chaining LLMs
- Code Complexity: description of the code complexity metrics used for evaluation, including MCC and CCC.
- Code Coverage
- Include a project as context: explains how to include repositories in the LLM prompt as context using RAG, and the alternatives.
- Iterative Refinement (Multi‐Pass Generation): passing the model's response back in as the next prompt input (see the sketch below).
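
To illustrate the idea behind the iterative-refinement page, the sketch below feeds each model response back in as part of the next prompt. The `ask_ollama` helper mirrors the example given in the "How to use" section; the refinement instruction and the number of passes are arbitrary placeholder choices, not the project's actual configuration.

```python
# Illustrative sketch of multi-pass generation: the previous response
# becomes part of the next prompt. The helper, prompts, and pass count
# are placeholders, not the project's real settings.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def ask_ollama(prompt: str, model: str = "mistral") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def iterative_refinement(task: str, passes: int = 3) -> str:
    answer = ask_ollama(task)
    for _ in range(passes - 1):
        # Each pass asks the model to improve its own previous output.
        answer = ask_ollama(
            f"{task}\n\nHere is a previous attempt:\n{answer}\n\n"
            "Improve this answer and return only the revised version."
        )
    return answer


if __name__ == "__main__":
    print(iterative_refinement("Write a pytest unit test for a function add(a, b)."))
```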
## Training an LLM by yourself
- How to train an LLM
- How to run on the FAU HPC
- Choosing the Right Dataset for LLM Training on the University HPC
- How to finetune an LLM
Maintenance of this wiki officially stops on 16.07.2025, the demo day of our project, as no further contributions are expected from the team after that date.