getting_started - Sidies/MasterThesis-HubLink GitHub Wiki
To get started with the project, you first need to clone the repository and install the required dependencies. This guide walks you through the steps required to install and set up the project on your local machine using its pyproject.toml configuration.
This project requires Python 3.12 or 3.13. Note: Because the current version of Microsoft GraphRAG requires Python 3.12 or below, we recommend using Python 3.12 if you intend to work with Microsoft GraphRAG. Microsoft GraphRAG is an optional dependency, so you can still run the code with Python 3.13 if you do not intend to use it.
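The version rule above can be checked before installing anything. The following is a minimal sketch, not part of the project: the helper names `python_version_supported` and `graphrag_compatible` are hypothetical, and the usage part assumes the interpreter is available as `python` (on some systems it is `python3`).

```shell
# Return 0 if the given "major.minor[.patch]" version is 3.12 or 3.13.
python_version_supported() {
    case "$1" in
        3.12|3.12.*|3.13|3.13.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Return 0 if the version also works with the optional Microsoft GraphRAG
# dependency (3.12 only, within the range the project supports).
graphrag_compatible() {
    case "$1" in
        3.12|3.12.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: ask the interpreter on PATH for its version and report the result.
if command -v python >/dev/null 2>&1; then
    version="$(python -c 'import platform; print(platform.python_version())')"
    if python_version_supported "$version"; then
        echo "Python $version is supported"
    else
        echo "Python $version is outside the 3.12 - 3.13 range"
    fi
fi
```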
Large files are stored in Git LFS. If you don't have Git LFS installed, follow the installation instructions here to install it.
After installing Git LFS, initialize it by running the following command in the terminal:
git lfs install
With an open terminal, clone the project’s repository by executing the following command:
git clone https://gitlab.kit.edu/kit/kastel/sdq/stud/abschlussarbeiten/masterarbeiten/marco-schneider/ma-schneider-implementation.git
Note: If you are using a Windows system, you may encounter an error because some file paths are too long. In that case, you have to configure Windows to allow long paths. To do so, open PowerShell with administrator rights and run the commands:
reg add "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem" /v LongPathsEnabled /t REG_DWORD /d 1
and
git config --system core.longpaths true
Then restart your system and try the clone process again.
Once the cloning process is complete, navigate into the project directory:
cd ma-schneider-implementation
Then, pull the large files managed by Git LFS:
git lfs pull
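To confirm that the pull actually fetched the large files, you can inspect the output of `git lfs ls-files`. The sketch below assumes the usual output format of that command, where the second column is `*` for a downloaded file and `-` for a file that is still only a pointer. The helper name `count_lfs_pointers` is hypothetical.

```shell
# Count files that are still LFS pointers.
# Reads `git lfs ls-files` output on stdin; assumes the format
# "<oid> <marker> <path>" with marker "*" (downloaded) or "-" (pointer only).
count_lfs_pointers() {
    awk '$2 == "-" { n++ } END { print n + 0 }'
}

# Usage (run inside the cloned repository):
#   git lfs ls-files | count_lfs_pointers
# A result of 0 suggests that all LFS files have been downloaded.
```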
You can optionally run the project in a virtual environment. This is strongly recommended if you have multiple Python projects on your device. This short guide shows you how to set up a virtual Python environment with Python's venv module.
- Navigate your terminal into the sqa-system folder (where pyproject.toml is located).
cd sqa-system
- Create the virtual environment.
python -m venv venv
A new folder named venv has now been created in the current directory. It contains the files needed for the virtual environment. Next, the environment needs to be activated.
- Activate the virtual environment.
Windows
venv\Scripts\activate
Linux / macOS
source venv/bin/activate
Your terminal should now show (venv) next to the path. If that is the case, the virtual environment has been activated. If you encounter a terminal permission error, take a look at this post or use CMD instead of PowerShell.
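If the prompt does not show (venv), for example because of a custom prompt theme, the activation can also be checked through the `VIRTUAL_ENV` variable that the venv activate script exports. A minimal sketch for Linux shells, with the hypothetical helper name `venv_status`:

```shell
# Report whether a virtual environment appears to be active, based on the
# VIRTUAL_ENV variable exported by venv's activate script.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "inactive"
    fi
}

venv_status
```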
This project uses a pyproject.toml file to manage its dependencies, so you can install everything with a single pip command. Make sure your terminal is located in the sqa-system directory (where pyproject.toml is located), then execute one of the following commands. pip then downloads and installs all dependencies required to run the project.
Note: The Microsoft GraphRAG retriever currently does not work with the codecarbon package, so you have to choose between installing the codecarbon package and installing the Microsoft GraphRAG retriever.
Run the following command if you want to run the project without the Microsoft GraphRAG retriever:
pip install .[codecarbon]
If you plan to use Microsoft GraphRAG (and you are on Python 3.12), run the following command instead. Alternatively, you can first install with the codecarbon package using the command above, run the experiments that do not use Microsoft GraphRAG, uninstall the codecarbon package, and then run the following command to install Microsoft GraphRAG. This way, emission tracking works for the experiments that do not use Microsoft GraphRAG.
pip install .[graphrag]
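The switch from the codecarbon setup to the Microsoft GraphRAG setup could be scripted roughly as follows. This is a sketch under assumptions: it assumes that uninstalling the `codecarbon` package is all that is needed before installing the `graphrag` extra, and the helpers `run` and `switch_to_graphrag` are hypothetical. By default it only prints the commands. Set `RUN=1` to execute them. The quotes around `.[graphrag]` keep shells such as zsh from interpreting the brackets as a glob pattern.

```shell
# Print each command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Remove codecarbon, then install the project with the graphrag extra.
# Must be called from the sqa-system directory (where pyproject.toml is located).
switch_to_graphrag() {
    run pip uninstall -y codecarbon
    run pip install ".[graphrag]"
}

switch_to_graphrag
```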
Note: You may have to install gfortran in order to install the gensim package. On Linux, you can do this by running:
sudo apt install build-essential gfortran python3-dev python3-pip
The SQA system uses Weights & Biases (W&B) to track experiments on its dashboard. You can create a free account on Weights & Biases.
After the account has been created, run the following command in the terminal to log in:
wandb login
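As a final sanity check, it can help to confirm that the command-line tools used in this guide are available on your PATH. The following sketch uses a hypothetical helper `missing_tools`. The tool names are assumptions about how the commands are installed on your system (for example, Git LFS usually ships as the `git-lfs` executable, and Python may be `python3` instead of `python`).

```shell
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Usage: check the tools this guide relies on.
missing="$(missing_tools git git-lfs python pip wandb)"
if [ -z "$missing" ]; then
    echo "All tools found"
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```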
🥳 That's it! You can now use the SQA system. To replicate the experiments, read the tutorial here.