Home - Sidies/MasterThesis-HubLink GitHub Wiki

Welcome to the Wiki of the SQA System. This Wiki is part of the Master Thesis HubLink: Leveraging Language Models for Enhanced Scholarly Information Retrieval on Research Knowledge Graphs and is intended to complement the information provided in the thesis. The Wiki provides a detailed overview of the Scholarly Question Answering (SQA) system, which is a Python framework developed to test and evaluate various retrieval approaches. The SQA system is designed to easily run different Retrieval Augmented Generation (RAG) pipeline configurations and evaluate the results. The Wiki is structured into different sections, which are described below.

Wiki Content

SQA System Overview

  1. Overall Architecture
  2. Capabilities
  3. Folder Structure and Descriptions

User Guide

  1. Getting Started
  2. Using the CLI
  3. Replicating Experiments

Advanced User Guide

  1. Creating a new Experiment
  2. Creating a QA Dataset

Developer Guide

  1. Libraries Used
  2. Adding a new Retriever
  3. Adding a new LLM or Embedding
  4. Adding a new Knowledge Graph
  5. Adding a new Dataset
  6. Adding a new Evaluator
  7. Adding a new Contribution to the ORKG

Miscellaneous

  1. Static Code Analysis & Supporting Tools

Where do I find? - A Navigation Guide to the Replication Package

Where do I find the HubLink implementation?

The implementation of HubLink is located in the retrieval/implementations/HubLink folder of the SQA system. ../blob/experiments/sqa-system/sqa_system/retrieval/implementations/HubLink/

Where do I find the experiments?

The experiments are located in the experiments folder of the SQA system. ../blob/experiments/sqa-system/experiments/ There is a README.md file located at the top level of the folder that provides an overview of the experiments folder and where to find each of the experiments that we have conducted and analyzed in the master thesis.

How can I replicate the experiments?

We provide a detailed description of how to replicate the experiments in the README.md file located in the experiments folder. ../blob/experiments/sqa-system/experiments/README.md#replicating-the-experiments

Where do I find the artifacts of the KGQA Retrieval Taxonomy?

We created the KGQA Retrieval Taxonomy by applying our proposed taxonomy construction process, which synthesizes taxonomy information from the literature and incrementally refines the taxonomy. The artifacts that document the construction process are located in the kgqa_retrieval_taxonomy folder of the assets/taxonomy_construction/ folder. ../blob/experiments/assets/taxonomy_construction/kgqa_retrieval_taxonomy/ You will also find README.md files in those folders that provide an overview of the artifacts and where to find them.

Where do I find the artifacts of the Taxonomy Construction Process?

We provide template files that can be used to conduct the taxonomy construction process for the creation of a new taxonomy. The artifacts are located in the template_taxonomy_construction folder of the assets/taxonomy_construction/ folder. ../blob/experiments/assets/taxonomy_construction/template_taxonomy_construction/

Where do I find the generation of the KGQA Datasets?

The generation of the KGQA datasets is located in the experiments/qa_datasets folder. ../blob/experiments/sqa-system/experiments/qa_datasets/README.md There is a README.md file located at the top level of the folder that provides an overview of the generation process.

Where do I find the Question Templates used for the KGQA Dataset Generation?

The templates are provided in the templates.md file located in the experiments/qa_datasets folder. ../blob/experiments/sqa-system/experiments/qa_datasets/templates.md The file contains the templates that were used to generate the question-answer pairs. It also includes updated templates, which we corrected both manually and with the help of an LLM to improve the quality of the questions.

Where do I find the selection of KGQA baseline approaches?

The selection of KGQA baseline approaches is documented in the assets/baseline_retriever_search folder. ../blob/experiments/assets/baseline_retriever_search/

I just want to try out the SQA system, where do I start?

If your intention is to just try out the functionality of the SQA system, we recommend running the HubLink approach in interactive mode on the ORKG with our prepared pipeline configuration. This can be done by following the instructions in the Using the CLI section of the wiki.
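
If you work with a local clone, the locations named in this navigation guide can be collected in a small lookup table. The following is only a minimal sketch and not part of the SQA system: the names LOCATIONS, locate and missing_locations are hypothetical helpers, and the sketch assumes that "experiments" in the links above is the branch name, so that everything after ../blob/experiments/ is a path inside the repository.

```python
from pathlib import Path

# Repository-relative locations copied from the answers in this guide.
# Assumption: the link prefix "../blob/experiments/" names the branch,
# and the remainder is a path inside the repository.
LOCATIONS = {
    "hublink": "sqa-system/sqa_system/retrieval/implementations/HubLink",
    "experiments": "sqa-system/experiments",
    "kgqa_taxonomy": "assets/taxonomy_construction/kgqa_retrieval_taxonomy",
    "taxonomy_template": "assets/taxonomy_construction/template_taxonomy_construction",
    "qa_datasets": "sqa-system/experiments/qa_datasets",
    "question_templates": "sqa-system/experiments/qa_datasets/templates.md",
    "baseline_search": "assets/baseline_retriever_search",
}


def locate(topic: str, repo_root: str = ".") -> Path:
    """Return the path for a topic inside a local clone rooted at repo_root."""
    return Path(repo_root) / LOCATIONS[topic]


def missing_locations(repo_root: str = ".") -> list[str]:
    """List the topics whose path does not exist under repo_root."""
    return [topic for topic in LOCATIONS if not locate(topic, repo_root).exists()]
```

Calling missing_locations() from the root of a clone that has the experiments branch checked out gives a quick check that all of the folders and files mentioned above are present.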

Contact

For any questions or feedback, feel free to contact me at [email protected]