libraries_used - Sidies/MasterThesis-HubLink GitHub Wiki
On this wiki page, we give an overview of the libraries used in the development of the Scholarly Question Answering (SQA) system.
Library | Installation | Usage | Reference |
---|---|---|---|
langchain | pip install langchain | Provides a framework for accessing language models and embeddings. It is also used for the creation of the RAG pipeline. | https://pypi.org/project/langchain/ |
langchain-chroma | pip install langchain-chroma | For the integration of the Chroma vector store. | https://pypi.org/project/langchain-chroma/ |
langchain-community | pip install langchain-community | Third-party LangChain integrations, which we use to access cost and token information for OpenAI API models. | https://pypi.org/project/langchain-community/ |
langchain-core | pip install langchain-core | Provides access to the underlying objects of the LangChain framework, which we use to adapt LangChain components to our framework. | https://pypi.org/project/langchain-core/ |
langchain-huggingface | pip install langchain-huggingface | We use this library to access large language models and embedding models from Hugging Face. | https://pypi.org/project/langchain-huggingface/ |
langchain-google-genai | pip install langchain-google-genai | Provides access to the Google GenAI API. | https://pypi.org/project/langchain-google-genai/ |
langchain-openai | pip install langchain-openai | Provides access to the OpenAI API. | https://pypi.org/project/langchain-openai/ |
langchain-text-splitters | pip install langchain-text-splitters | Provides methods for splitting text into chunks. | https://pypi.org/project/langchain-text-splitters/ |
langchain-ollama | pip install langchain-ollama | Provides access to the Ollama API through LangChain integrations. | https://pypi.org/project/langchain-ollama/ |
ollama | pip install ollama | Provides direct access to the Ollama API for running large language models locally. | https://pypi.org/project/ollama/ |
pandas | pip install pandas | Provides multiple tools for data manipulation and analysis. Most importantly, it provides DataFrame objects, which we use for efficient data handling. | https://pypi.org/project/pandas/ |
nltk | pip install nltk | A natural language toolkit that offers a suite of tools for text processing. We use it for the calculation of the BLEU score. | https://pypi.org/project/nltk/ |
ragas | pip install ragas | Provides a set of metrics to evaluate the performance of the RAG pipeline. | https://pypi.org/project/ragas/ |
numpy | pip install numpy | A fundamental Python package for numerical computation, which we use for various numerical tasks. | https://pypi.org/project/numpy/ |
sentence-transformers | pip install sentence-transformers | Computes dense vector representations (embeddings) with transformer models. The library is a requirement for Hugging Face models. | https://pypi.org/project/sentence-transformers/ |
inquirerpy | pip install inquirerpy | A Python package that provides tools for interactive command-line interfaces. We use it for the creation of the CLI application. | https://pypi.org/project/inquirerpy/ |
pytest | pip install pytest | A testing framework that we use to test the codebase. | https://pypi.org/project/pytest/ |
typer | pip install typer | A library for building CLI applications. We use it to build the CLI application. | https://pypi.org/project/typer/ |
cryptography | pip install cryptography | Provides a toolkit for encryption and decryption. We use it for encrypting and decrypting the API keys and passwords. | https://pypi.org/project/cryptography/ |
orkg | pip install orkg | The official Python package for the Open Research Knowledge Graph (ORKG) API. | https://pypi.org/project/orkg/ |
seaborn | pip install seaborn | A data visualization library which we use to create plots for the evaluation of the experiments. | https://pypi.org/project/seaborn/ |
matplotlib | pip install matplotlib | A plotting library which we use to create plots for the evaluation of the experiments. | https://pypi.org/project/matplotlib/ |
weave | pip install weave | A tracking toolkit for QA systems based on large language models. We use it to track the metrics and results of the experiments in addition to our local tracking. | https://pypi.org/project/weave/ |
tqdm | pip install tqdm | A library that provides progress bars for loops, which we use to report progress in the SQA system. | https://pypi.org/project/tqdm/ |
sparqlwrapper | pip install sparqlwrapper | A wrapper around SPARQL endpoints that provides helper functions for executing SPARQL queries. As we work with RDF graphs that are queried with SPARQL, we use this library to access them. | https://pypi.org/project/sparqlwrapper/ |
pybtex | pip install pybtex | A library for managing BibTeX files. We use it to load data from BibTeX files. | https://pypi.org/project/pybtex/ |
language_tool_python | pip install language_tool_python | An easy-to-use library for grammar and spell checking. We apply it in the QA generation process. | https://pypi.org/project/language_tool_python/ |
ipywidgets | pip install ipywidgets | A library for creating interactive widgets in Jupyter. It is required for displaying certain outputs of other libraries. | https://pypi.org/project/ipywidgets/ |
pylatexenc | pip install pylatexenc | A library for parsing and converting LaTeX content, which we use for formatting purposes, as some of our imported data is stored in LaTeX format. | https://pypi.org/project/pylatexenc/ |
sacrebleu | pip install sacrebleu | A library for the calculation of the BLEU score. | https://pypi.org/project/sacrebleu/ |
rouge_score | pip install rouge_score | A library for the calculation of the ROUGE score. | https://pypi.org/project/rouge_score/ |
lightrag-hku | pip install lightrag-hku | An implementation of the LightRAG retriever, which we use as a baseline. | https://pypi.org/project/lightrag-hku/ |
bitsandbytes | pip install bitsandbytes | A library that provides optimizers and quantization tools for training large neural networks. We use it for loading local Hugging Face models. | https://pypi.org/project/bitsandbytes/ |
accelerate | pip install accelerate | A library that simplifies distributed training and model acceleration, which we use for local Hugging Face models. | https://pypi.org/project/accelerate/ |
evaluate | pip install evaluate | A library from Hugging Face for loading standardized metric functions. We use it to calculate BERTScore. | https://pypi.org/project/evaluate/ |
bert_score | pip install bert_score | Used for calculating BERTScore, a metric for evaluating text generation. This is a requirement for the Bert-Score-Evaluator. | https://pypi.org/project/bert_score/ |
sentencepiece | pip install sentencepiece | An unsupervised text tokenizer and detokenizer used for subword segmentation. This library is a requirement for loading Hugging Face models. | https://pypi.org/project/sentencepiece/ |
aioboto3 | pip install aioboto3 | A library that provides an asynchronous interface to AWS services. It is an installation requirement of the lightrag-hku library; we do not use AWS services ourselves. | https://pypi.org/project/aioboto3/ |
nano_vectordb | pip install nano_vectordb | An implementation of a lightweight vector database used by the LightRAG retriever. As such, it is a requirement for the lightrag-hku library. | https://pypi.org/project/nano_vectordb/ |
pipmaster | pip install pipmaster | A utility that simplifies Python package management and dependency updates. It is a requirement for the lightrag-hku library. | https://pypi.org/project/pipmaster/ |
rank-bm25 | pip install rank-bm25 | Provides an implementation of the BM25 ranking algorithm. This is a requirement for Think-on-Graph. | https://pypi.org/project/rank-bm25/ |
chromadb-ops | pip install chromadb-ops | A tool for HNSW (Hierarchical Navigable Small World) index maintenance in ChromaDB. | https://pypi.org/project/chromadb-ops/ |
graphrag | pip install graphrag | A library for building graph-based retrieval augmented generation systems. This is an optional dependency. | https://pypi.org/project/graphrag/ |
codecarbon | pip install codecarbon | A library for tracking carbon emissions, energy usage, and more. We use it to track the sustainability of the retrievers. This is an optional dependency. | https://pypi.org/project/codecarbon/ |
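To check which of the libraries in the table are present in the current environment, a small script based on the standard-library `importlib.metadata` module can help. This is a minimal sketch, not part of the SQA system: the names `REQUIRED`, `OPTIONAL`, `installed_version` and `check_packages` are hypothetical, the split into required and optional packages follows the notes in the table (graphrag and codecarbon are marked optional), and it assumes Python 3.10 or later, where distribution-name lookups treat `-` and `_` as equivalent.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Hypothetical grouping of the PyPI distribution names from the table above.
REQUIRED = [
    "langchain", "langchain-chroma", "langchain-community", "langchain-core",
    "langchain-huggingface", "langchain-google-genai", "langchain-openai",
    "langchain-text-splitters", "langchain-ollama", "ollama", "pandas", "nltk",
    "ragas", "numpy", "sentence-transformers", "inquirerpy", "pytest", "typer",
    "cryptography", "orkg", "seaborn", "matplotlib", "weave", "tqdm",
    "sparqlwrapper", "pybtex", "language_tool_python", "ipywidgets",
    "pylatexenc", "sacrebleu", "rouge_score", "lightrag-hku", "bitsandbytes",
    "accelerate", "evaluate", "bert_score", "sentencepiece", "aioboto3",
    "nano_vectordb", "pipmaster", "rank-bm25", "chromadb-ops",
]
# The table marks these two as optional dependencies.
OPTIONAL = ["graphrag", "codecarbon"]


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_packages(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if absent)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for group, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = [name for name, version in check_packages(names).items() if version is None]
        if missing:
            print(f"Missing {group} packages: {', '.join(missing)}")
        else:
            print(f"All {group} packages are installed.")
```

Running the script prints the missing packages per group; any missing entry can then be installed with the `pip install` command given in the table.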
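To illustrate what the ROUGE score computed with the rouge_score library measures, the following sketch implements ROUGE-1 (unigram overlap) by hand. It is a simplified illustration, not the rouge_score implementation and not the evaluator used in the SQA system: the function name `rouge1` is hypothetical, and it assumes plain whitespace tokenization and lowercasing, without the stemming or tokenization rules the library applies.

```python
from collections import Counter
from typing import Dict


def rouge1(reference: str, candidate: str) -> Dict[str, float]:
    """Simplified ROUGE-1: unigram precision, recall and F1 (whitespace tokens)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((ref_counts & cand_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# "the cat sat on the mat" vs. "the cat is on the mat": 5 of 6 unigrams overlap,
# so precision, recall and F1 are all 5/6.
print(rouge1("the cat sat on the mat", "the cat is on the mat"))
```

In practice, the rouge_score library's `rouge_scorer.RougeScorer` class handles tokenization and optional stemming, so its scores can differ from this sketch.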