# Replicating the Experiments
This page provides a step-by-step guide to replicate the experiments that were conducted as part of the master thesis.
## Stochasticity of Results

The results exhibit some level of stochasticity, which means that for the same question, the contexts retrieved by the retriever can vary. This can be attributed to three factors. First, since we are working with LLMs, their outputs exhibit a certain degree of randomness. To keep this low, we set the temperature parameter to zero, which reduces the variability of the outputs. Second, some retrievers rely on embedding models, which also introduce a certain degree of randomness when transforming text into vectors. Third, the vectors are stored in the Chroma database, which uses the HNSW indexing algorithm, an algorithm that is inherently non-deterministic to a certain degree. We observed that each time the database starts, the index is reinitialized, which can cause minor changes in the nearest-neighbor query results for vectors that are very close to each other compared to previous runs. Despite this, our analysis remains valid, as the induced variations are negligible and we do not consider them significant enough to impact the overall findings.
Running the experiments, especially the parameter selection process, requires access to a local Ollama instance. To set up your local Ollama instance, please follow the instructions provided in the Ollama GitHub repository. Once Ollama is ready, run the following commands to download the models that we used for our experiments:
```bash
ollama pull qwen2.5:14b
ollama pull llama3.1
ollama pull mxbai-embed-large
ollama pull granite-embedding
ollama pull text-embedding-3-large
```
Now you will have all the models that are required to run the experiments.
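To confirm that the downloads succeeded, you can list the models available in your local Ollama instance with the standard `ollama list` command (the exact output format depends on your Ollama version):

```bash
# List the models available in the local Ollama instance.
# The models pulled above should appear in this list.
ollama list
```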
Our experiments were conducted on the ORKG and have been added here. Following this link redirects you to the author's profile on the ORKG sandbox. Under "Papers", there should be 153 papers listed, which are the ones we used for our experiments. If the papers are not listed or the list is incomplete, this is not an issue, as the SQA system can easily add them back.
However, if the papers do need to be added back, there is one caveat: the ORKG assigns new IDs to each triple of the newly added papers. This means that the IDs of the golden triples in our KGQA datasets (../blob/experiments/sqa-system/experiments/qa_datasets/qa_datasets/full) need to be updated to reflect the new IDs. For this scenario, we have prepared scripts that update the IDs of the golden triples in the KGQA datasets. These scripts are directly incorporated into the SQA system, meaning that executing a single script both ensures that all papers are present on the ORKG and updates the IDs of the golden triples.
This script is located at `sqa-system/experiments/prepare_data_for_replication.py`. To run it, you need to have the SQA system installed, which you can do by following the instructions on the Getting Started page. Once the SQA system is installed, navigate your terminal to the `sqa-system` directory and run the following command:
```bash
python ./experiments/prepare_data_for_replication.py
```
This process takes some time. The SQA system checks whether each of the papers used during our experimentation is present on the ORKG and updates the IDs of the golden triples in the KGQA datasets to reflect the new IDs.
Note: If this is your first time interacting with the ORKG, you will be prompted in the terminal to enter your ORKG credentials. These are needed to be able to upload data to the ORKG.
Once finished, you can run the experiments. Each experiment is started with a corresponding `run.py` script. To do so, navigate into either the `./1_experiment` or the `./2_experiment` directory and locate the run script for the experiment that you want to run. All directories follow a structure similar to the following:
```
├── [Name of the Retriever]
│   ├── run_[name of the retriever].py
│   ├── ...
```
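If you want a quick overview of the available run scripts, you can list them from the repository root. This is only a convenience sketch; it assumes the `./1_experiment` and `./2_experiment` directories mentioned above are present in your current working directory:

```bash
# List all experiment run scripts in both experiment directories.
find ./1_experiment ./2_experiment -name "run_*.py"
```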
Therefore, navigate into the directory of the retriever that you want to run and execute the run script from the terminal:
```bash
python run_[name of the retriever].py
```
This will start the experiment, and the SQA system will take care of the rest. The results will be stored in the `results` folder of the retriever that you are running.
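For example, starting the HubLink experiment in the first experiment folder could look like the following. The directory and script names here are illustrative and assumed to follow the `run_[name of the retriever].py` convention described above; check the actual folder names in the repository:

```bash
# Assumed example: run the HubLink retriever of the first experiment.
# Adjust the directory and script name to the retriever you want to run.
cd ./1_experiment/hublink
python run_hublink.py
```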
Note: If this is your first time running an experiment, you will likely be prompted to enter your OpenAI API key and your Weave API key.
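If you prefer not to enter the keys interactively, a common convention is to provide them as environment variables before starting the run. Whether the SQA system actually reads these particular variables is an assumption on our part; if it does not, the interactive prompt described above still works:

```bash
# Assumption: the keys can be supplied via environment variables
# instead of the interactive prompt. Replace the placeholders with
# your actual keys before starting an experiment.
export OPENAI_API_KEY="sk-..."
export WANDB_API_KEY="your-weave-api-key"
```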
To update the visualizations (tables and plots), each experiment folder contains a `visualize.py` script. This script automatically updates the visualizations based on the results of the experiment. After an experiment has finished, run this script to refresh the data.

Note: You can also copy the `visualize.py` script directly into the folder of the experiment you just ran and execute it there. In this case, the visualizations are based only on the results of that experiment and not on all previous experiments located in the folder.
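As an illustration, refreshing the visualizations might look like the following. The retriever folder name is a placeholder; use the actual folders from your run:

```bash
# Refresh the visualizations for the first experiment folder.
cd ./1_experiment
python visualize.py

# Alternatively (placeholder path): copy the script into the folder of
# the run you just executed and run it there, so that only this run's
# results are visualized.
cp visualize.py ./hublink/
cd ./hublink && python visualize.py
```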
🥳 That's it!