libraries_used - Sidies/MasterThesis-HubLink GitHub Wiki

This wiki page gives an overview of the libraries used in the development of the Scholarly Question Answering (SQA) system.

| Library | Installation | Usage | Reference |
| --- | --- | --- | --- |
| langchain | `pip install langchain` | Provides a framework for accessing language models and embeddings. It is also used to build the RAG pipeline. | https://pypi.org/project/langchain/ |
| langchain-chroma | `pip install langchain-chroma` | Integrates the Chroma vector store. | https://pypi.org/project/langchain-chroma/ |
| langchain-community | `pip install langchain-community` | Third-party LangChain integrations, which we use to access cost and token information for OpenAI API models. | https://pypi.org/project/langchain-community/ |
| langchain-core | `pip install langchain-core` | Provides access to the underlying objects of the LangChain framework, which we use to adapt the code to our framework. | https://pypi.org/project/langchain-core/ |
| langchain-huggingface | `pip install langchain-huggingface` | Used to access large language models and embedding models from Hugging Face. | https://pypi.org/project/langchain-huggingface/ |
| langchain-google-genai | `pip install langchain-google-genai` | Provides access to the Google GenAI API. | https://pypi.org/project/langchain-google-genai/ |
| langchain-openai | `pip install langchain-openai` | Provides access to the OpenAI API. | https://pypi.org/project/langchain-openai/ |
| langchain-text-splitters | `pip install langchain-text-splitters` | Provides methods for splitting text into chunks. | https://pypi.org/project/langchain-text-splitters/ |
| langchain-ollama | `pip install langchain-ollama` | Provides access to the Ollama API via LangChain integrations. | https://pypi.org/project/langchain-ollama/ |
| ollama | `pip install ollama` | Provides direct access to the Ollama API for running local large language models. | https://pypi.org/project/ollama/ |
| pandas | `pip install pandas` | Provides tools for data manipulation and analysis. Most importantly, it provides DataFrame objects, which we use for efficient data handling. | https://pypi.org/project/pandas/ |
| nltk | `pip install nltk` | A natural language toolkit that offers a suite of tools for text processing. We use it to calculate the BLEU score. | https://pypi.org/project/nltk/ |
| ragas | `pip install ragas` | Provides a set of metrics to evaluate the performance of the RAG pipeline. | https://pypi.org/project/ragas/ |
| numpy | `pip install numpy` | A fundamental Python package for numerical computation, which we use for various numerical tasks. | https://pypi.org/project/numpy/ |
| sentence-transformers | `pip install sentence-transformers` | Computes dense vector representations with transformer models. It is a requirement for Hugging Face models. | https://pypi.org/project/sentence-transformers/ |
| inquirerpy | `pip install inquirerpy` | A Python package that provides tools for interactive command-line interfaces. We use it to build the CLI application. | https://pypi.org/project/inquirerpy/ |
| pytest | `pip install pytest` | A testing framework that we use to test the codebase. | https://pypi.org/project/pytest/ |
| typer | `pip install typer` | A library for building CLI applications, which we use to build the CLI application. | https://pypi.org/project/typer/ |
| cryptography | `pip install cryptography` | Provides a toolkit for encryption and decryption. We use it to encrypt and decrypt the API keys and passwords. | https://pypi.org/project/cryptography/ |
| orkg | `pip install orkg` | The official Python package for the Open Research Knowledge Graph (ORKG) API. | https://pypi.org/project/orkg/ |
| seaborn | `pip install seaborn` | A data visualization library that we use to create plots for the evaluation of the experiments. | https://pypi.org/project/seaborn/ |
| matplotlib | `pip install matplotlib` | A plotting library that we use to create plots for the evaluation of the experiments. | https://pypi.org/project/matplotlib/ |
| weave | `pip install weave` | A tracking toolkit for QA systems based on large language models. We use it to track the metrics and results of the experiments in addition to our local tracking. | https://pypi.org/project/weave/ |
| tqdm | `pip install tqdm` | Provides a progress bar for loops, which we use to give feedback on progress in the SQA system. | https://pypi.org/project/tqdm/ |
| sparqlwrapper | `pip install sparqlwrapper` | A wrapper for SPARQL endpoints that provides multiple fields and functions for working with SPARQL queries. Because we work with RDF graphs that are queried with SPARQL, we use this library for that access. | https://pypi.org/project/sparqlwrapper/ |
| pybtex | `pip install pybtex` | A library for managing BibTeX files. We use it to load data from BibTeX files. | https://pypi.org/project/pybtex/ |
| language_tool_python | `pip install language_tool_python` | An easy-to-use library for grammar and spell checking. It is applied in the QA generation process. | https://pypi.org/project/language_tool_python/ |
| ipywidgets | `pip install ipywidgets` | A library for creating interactive widgets in Jupyter. It is a requirement for displaying certain output of other libraries. | https://pypi.org/project/ipywidgets/ |
| pylatexenc | `pip install pylatexenc` | A library for parsing and converting LaTeX content. We use it for formatting because some of our imported data is stored in LaTeX format. | https://pypi.org/project/pylatexenc/ |
| sacrebleu | `pip install sacrebleu` | A library for calculating the BLEU score. | https://pypi.org/project/sacrebleu/ |
| rouge_score | `pip install rouge_score` | A library for calculating the ROUGE score. | https://pypi.org/project/rouge_score/ |
| lightrag-hku | `pip install lightrag-hku` | An implementation of the LightRAG retriever, which we use as a baseline. | https://pypi.org/project/lightrag-hku/ |
| bitsandbytes | `pip install bitsandbytes` | Provides optimizers and quantization tools for training large neural networks. We use it for loading local Hugging Face models. | https://pypi.org/project/bitsandbytes/ |
| accelerate | `pip install accelerate` | Simplifies distributed training and model acceleration. We use it for local Hugging Face models. | https://pypi.org/project/accelerate/ |
| evaluate | `pip install evaluate` | A library from Hugging Face for easily loading standardized metric functions. We use it to calculate BERTScore. | https://pypi.org/project/evaluate/ |
| bert_score | `pip install bert_score` | Calculates BERTScore, a metric for evaluating text generation. It is a requirement for the Bert-Score-Evaluator. | https://pypi.org/project/bert_score/ |
| sentencepiece | `pip install sentencepiece` | An unsupervised text tokenizer and detokenizer used for subword segmentation. It is a requirement for loading Hugging Face models. | https://pypi.org/project/sentencepiece/ |
| aioboto3 | `pip install aioboto3` | Provides an asynchronous interface to AWS services. It is an installation requirement for the lightrag-hku library, although we do not use AWS services. | https://pypi.org/project/aioboto3/ |
| nano_vectordb | `pip install nano_vectordb` | A lightweight vector database that is used in the LightRAG retriever. As such, it is a requirement for the lightrag-hku library. | https://pypi.org/project/nano_vectordb/ |
| pipmaster | `pip install pipmaster` | A utility that simplifies Python package management and dependency updates. It is a requirement for the lightrag-hku library. | https://pypi.org/project/pipmaster/ |
| rank-bm25 | `pip install rank-bm25` | Provides an implementation of the BM25 ranking algorithm. It is a requirement for Think-on-Graph. | https://pypi.org/project/rank-bm25/ |
| chromadb-ops | `pip install chromadb-ops` | A tool for HNSW (Hierarchical Navigable Small World) index maintenance in ChromaDB. | https://pypi.org/project/chromadb-ops/ |
| graphrag | `pip install graphrag` | A library for building graph-based retrieval-augmented generation systems. This is an optional dependency. | https://pypi.org/project/graphrag/ |
| codecarbon | `pip install codecarbon` | Tracks carbon emissions, energy usage, and related measures. We use it to track the sustainability of the retrievers. This is an optional dependency. | https://pypi.org/project/codecarbon/ |
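
To check which of the libraries listed above are present in the current environment, something like the following sketch may help. It is not part of the project's codebase: the helper `installed_versions` and the `PACKAGES` selection are hypothetical names chosen for illustration, and the sketch relies only on the standard-library module `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative subset of the distribution names listed above; extend as needed.
PACKAGES = ["langchain", "pandas", "nltk", "ragas", "orkg", "lightrag-hku"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        status = ver if ver is not None else f"missing (pip install {name})"
        print(f"{name}: {status}")
```

Running the script prints one line per package, either with the installed version or with the `pip install` command for a missing one.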