Developer Docs
Welcome to the Developer Docs!
Required Software
The following software is required:
- Git
- Docker
- Python
- an environment manager for Python (this documentation follows Anaconda)
- Ollama, for managing the different models
- PostGIS, for interfacing with Datahub's provided data
Setup and Installation
The following is a walkthrough for installing and running the dh-chatbot-llm project.
Cloning the Repository
- Clone the repository:
$ git clone https://github.com/yunussozeri/dh-chatbot-llm
- Copy the .env.example to .env using the following command:
$ cp .env.example .env
- Open the .env file and make sure the following variables are set: SECRET_KEY, DATAHUB_NAME (instructions are inside the .env file; a sketch follows this list)
- Run:
$ docker compose up -d
- Wait/check until http://localhost:8000/ shows the chat interface
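For orientation, a minimal sketch of the two required variables; the variable names come from the step above, while the values here are placeholders, not values from the project:
SECRET_KEY=<a long random string>
DATAHUB_NAME=<name of your Datahub instance>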
After this you can start/stop the system with:
$ docker compose start
$ docker compose stop
If you change the .env file, run the following command to apply the changes:
$ docker compose up -d
Now either import an existing data dump or create a new instance. Steps for importing an existing dump are below.
If you get the error "Cannot connect to the Docker daemon at...", your Docker daemon is not running. Start Docker and run the command again.
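On a systemd-based Linux distribution, for example, you can start Docker like this; on macOS or Windows, start Docker Desktop instead:
$ sudo systemctl start docker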
Preparing the Data-Dump
Datahub provides a ready-to-use database export for Ghana, so you can see and use the system directly without downloading and processing the raw data on your local machine.
Go to the releases page, download the latest *.dump file, and place it in the ./data/ folder.
Run the following command from the root of the repository:
$ docker compose exec datahub python manage.py restore ./data/<downloaded *.dump file>
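For example, assuming the downloaded file is named datahub-ghana.dump (a hypothetical name; use the actual filename of your download):
$ docker compose exec datahub python manage.py restore ./data/datahub-ghana.dump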
Installing LLM Models
After the Docker containers are running, you can install the required models with the following commands:
$ docker compose exec ollama ollama pull mxbai-embed-large:latest
$ docker compose exec ollama ollama pull deepseek-r1:8b
$ docker compose exec ollama ollama pull gemma2:9b
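To verify that the models were installed, list them with Ollama's standard list command:
$ docker compose exec ollama ollama list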
Recommended: To run the LLMs on the GPU, take the steps mentioned in the official Ollama Docker Documentation and modify the docker-compose.yaml in the root directory of the project.
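As a sketch of the kind of change involved, based on the official Ollama Docker documentation for NVIDIA GPUs (verify the exact keys against the current docs and your docker-compose.yaml, and note that the host needs the NVIDIA Container Toolkit):
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia          # requires the NVIDIA Container Toolkit on the host
              count: all              # expose all GPUs to the container
              capabilities: [gpu]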
Installing Ollama for non-Docker runs (optional)
Go to Ollama Website and download for your operating system.
After the installation is complete, you can download and install the required models with the following commands:
$ ollama pull mxbai-embed-large:latest
$ ollama pull deepseek-r1:8b
$ ollama pull gemma2:9b
Edit the .env file so that it points to the new Ollama host.
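A sketch of what this could look like; the variable name OLLAMA_HOST is an assumption here, so check .env.example for the name the project actually expects (11434 is Ollama's default port):
OLLAMA_HOST=http://host.docker.internal:11434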
Creating a Virtual Environment
This documentation uses the Anaconda Distribution as its Python environment manager. You can use any environment manager you like.
To download it, go to the Anaconda Download Page and pick the installer for your operating system.
After the installation is complete, go to the repository location and open a terminal in the root directory.
Create a new virtual environment with the following command:
$ conda create -n <name of your virtual environment> python=3.9.13
After that you can activate the virtual environment with:
$ conda activate <name of your virtual environment>
And deactivate the currently active environment with:
$ conda deactivate
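A worked example, assuming you call the environment datahub-ai (the name is arbitrary):
$ conda create -n datahub-ai python=3.9.13
$ conda activate datahub-ai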
Creating a Django-Superuser (optional)
Run the following command to create a new user with which you can log in into the backend (http://localhost:8000/admin):
$ docker compose exec datahub python manage.py createsuperuser
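The command prompts interactively for the credentials, roughly as follows (standard Django behavior; the exact prompts can vary with the project's user model):
Username: admin
Email address: [email protected]
Password:
Password (again):
Superuser created successfully.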
Composing the Rest of the Required Containers
After importing the data dump you are ready to build the container that will communicate with the language model.
To do this, run the following command in the root directory:
$ docker compose build
You need an active internet connection for the build.
After the process is complete, you can use the AI Chat in the Datahub interface.
Reaching the Interface
The interface is available under AI Chat.

Enter your prompt into the text box and send it to the AI to get information about the data.
Architecture and Design
We decided to separate our components by responsibility; each component runs in its own Docker container.
The RAG pipeline is designed as follows:

The roles and responsibilities of each component are:
- Datahub Module: Acts as the central service point for end users by providing a UI and offering an interface to inspect the data manually.
- mongo-db: Contains the data about active tables and persists it locally.
- PostGIS: Database also used by Datahub to store the demographic and geographic data. Offers an interface to Datahub and Datahub-AI.
- Ollama: Hosts the LLMs (Large Language Models) and their embeddings.
- Datahub-AI: The main component, developed by us. It manages the AI and the active tables, and offers a REST-like API (sketched below) to communicate with the AI and handle the table descriptions.
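For illustration only, a request against that REST-like API might look like the following; the endpoint path, host, and payload shape are hypothetical sketches, not the project's documented API:
$ curl -X POST http://<datahub-ai-host>/api/chat -H "Content-Type: application/json" -d '{"prompt": "Which tables are currently active?"}'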
Choosing a Large Language Model
We wanted a model that is free, smart enough to meet our requirements, but small enough to run on many devices.
However, the performance of the model and interface cohesion varied drastically when we experimented with different models.
In the end we decided to go with dolphin-llama3.
Limitations
Missing know-how about the Django architecture sometimes made it hard to adapt solutions to the problems that surfaced, although only in a very few cases were we unable to find a workaround.
One option was to deploy the project on the Kubernetes cluster of the University of Applied Sciences Hamburg, which we are still working on.
Contact and Feedback
The Chatbot Extension to datahub-ghana is made by students of the University of Applied Sciences Hamburg:
- Jan Biedasiek, [email protected]
- Michael German, [email protected]
- Yunus Sözeri, [email protected]
- Benedikt Weyer, [email protected]
You can create pull requests on our repository or send an email to one of us for feedback!