README - 'Xiangyu' Branch (southern-cross-ai/Deepthink GitHub Wiki)
Ollama + Gradio All-in-One Container--28/03/2025
A Docker-based project that runs both the Ollama LLM backend and a Gradio frontend UI in a single container. Designed for easy local development and testing of LLMs with web access.
🔍 Project Overview
Goal
- Build a custom Docker image that launches:
- Ollama model server (LLM inference backend)
- Gradio-based web interface (chat frontend)
- Enable real-time local interaction with LLMs without relying on external APIs
- Streamline development workflow for prototyping AI apps
📁 Project Structure
ollama_gradio_container/
├── Dockerfile # Image build script
├── start.sh # Shell script to start Ollama & Gradio
├── gradio_app.py # Gradio frontend with Ollama API integration
├── requirements.txt # Python dependencies
🧠 Design Principles
Principle | Description |
---|---|
Single container | Everything (LLM + frontend) runs in one image |
REST API integration | Gradio connects to Ollama via HTTP requests |
Robust startup | Uses curl to wait for Ollama to become ready |
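Before the file breakdown, a quick way to see the REST integration in isolation: the sketch below (not one of the project files) posts a single message to Ollama's /api/chat endpoint the same way the Gradio app does, assuming the container is running and llama2 has already been pulled.
import requests

# One-off sanity check of the Ollama chat endpoint (assumes the container is
# running and the llama2 model has already been pulled).
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama2",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "stream": False,  # ask for a single JSON object rather than a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])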
📃 File Breakdown
Dockerfile
1. Builds an Ubuntu-based image with Python and Ollama installed.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
curl \
sudo \
python3 \
python3-pip \
git \
wget
RUN curl -fsSL https://ollama.com/install.sh | sh
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r /app/requirements.txt
WORKDIR /app
COPY gradio_app.py /app/
COPY start.sh /app/
RUN chmod +x /app/start.sh
# Gradio UI
EXPOSE 7860
# Ollama API
EXPOSE 11434
CMD ["/app/start.sh"]
start.sh
2. Launches Ollama in the background, waits until the API is ready, then launches Gradio.
#!/bin/bash
ollama serve &
# Wait until Ollama API is reachable
until curl -s http://localhost:11434/api/tags > /dev/null; do
echo "Waiting for Ollama API..."
sleep 2
done
# Pull the model, then warm it up in the background
ollama pull llama2
ollama run llama2 &
# Give the model a moment to load before starting the UI
sleep 10
python3 /app/gradio_app.py
gradio_app.py
3. Defines the Gradio UI and integrates with the Ollama REST API.
import gradio as gr
import requests
def chat_with_ollama(prompt, history):
    if not isinstance(history, list):
        history = []
    # Rebuild the message list expected by Ollama's /api/chat from the chat history
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": prompt})
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "llama2", "messages": messages, "stream": False}
    )
    try:
        reply = response.json()["message"]["content"]
    except Exception as e:
        reply = f"[Error] {e}\nRaw response:\n{response.text}"
    history.append((prompt, reply))
    return history, history

iface = gr.Interface(
    fn=chat_with_ollama,
    inputs=[gr.Textbox(label="Your message"), gr.State([])],
    outputs=[gr.Chatbot(), gr.State([])],
    title="Ollama + Gradio in One Container"
)
iface.launch(server_name="0.0.0.0", server_port=7860)
requirements.txt
4. Python packages used by the Gradio frontend and API client.
gradio
requests
🌐 Usage Instructions
Build the image
docker build -t ollama-gradio-app .
Run the container (with GPU & volume)
docker run --gpus all \
-p 7860:7860 \
-p 11434:11434 \
-v ~/.ollama:/root/.ollama \
ollama-gradio-app
Remove `--gpus all` if you're not using a GPU.
🔗 Architecture Flow
[User] --> Gradio (gradio_app.py) --> [POST /api/chat] --> Ollama --> LLM Response
🚀 Troubleshooting
Step | Problem | Solution |
---|---|---|
Build | Slow / failed downloads | Retry or change mirror |
Run | Port already in use | Free port or change mapping |
Chat | 404 or KeyError: 'message' | Ollama not ready yet; add wait/check |
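For the last row, a hedged sketch of a host-side readiness check that mirrors the curl loop in start.sh (the 60-second budget is an arbitrary choice, not a project setting):
import time
import requests

def wait_for_ollama(base_url="http://localhost:11434", timeout_s=60):
    # Poll the tags endpoint until Ollama answers, or give up after timeout_s.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(f"{base_url}/api/tags", timeout=2).ok:
                return True
        except requests.exceptions.ConnectionError:
            pass  # server not accepting connections yet
        time.sleep(2)
    return False

if __name__ == "__main__":
    print("Ollama ready:", wait_for_ollama())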
🛠️ Next Steps
- Add streaming support for real-time token output (see the sketch after this list)
- Allow file upload (PDF summarization, etc.)
- Integrate LangChain, FastAPI, or llama-index
- Use docker-compose to manage future services (e.g. DB + Ollama + UI)
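On the streaming item above: Ollama's /api/chat returns newline-delimited JSON chunks when stream is true, so a first cut could look like the sketch below. This is illustrative only; the current gradio_app.py still uses stream: False.
import json
import requests

def stream_chat(prompt, model="llama2", base_url="http://localhost:11434"):
    # Each line of the streamed response is a JSON object carrying a partial message.
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}], "stream": True}
    with requests.post(f"{base_url}/api/chat", json=payload, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk["message"]["content"]

if __name__ == "__main__":
    for token in stream_chat("Tell me a short joke."):
        print(token, end="", flush=True)
    print()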
Ollama + Gradio + LangChain (Updated Integration Guide)--11/04/2025
This guide documents the updated setup for running an all-in-one Docker container with Ollama (LLM backend), Gradio (frontend), and enabling interaction from the host machine via LangChain.
🔍 Project Overview
This project sets up a single Docker container that runs:
- Ollama model server (LLM backend)
- Gradio web interface (chat frontend)
And now additionally supports:
- LangChain on the host machine calling the containerized Ollama via HTTP API.
📅 Updated Goals
- Build a self-contained Docker image running both frontend and backend.
- Enable host-based Python scripts (e.g., using LangChain) to interact with Ollama.
- Ensure proper port binding and API accessibility.
📁 Project Structure
ollama_gradio_container/
├── Dockerfile # Builds the base image
├── start.sh # Starts Ollama and Gradio
├── gradio_app.py # Gradio interface with chat UI
├── requirements.txt # Python dependencies
├── docker-compose.yml # Orchestration and networking config
🧪 Dockerfile Highlights
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
curl sudo python3 python3-pip git wget
RUN curl -fsSL https://ollama.com/install.sh | sh
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r /app/requirements.txt
WORKDIR /app
COPY gradio_app.py /app/
COPY start.sh /app/
RUN chmod +x /app/start.sh
# Gradio
EXPOSE 7860
# Ollama
EXPOSE 11434
CMD ["/app/start.sh"]
🔧 start.sh (Updated)
#!/bin/bash
# Enable Ollama to listen on all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve &
# Wait for the Ollama API to become ready
until curl -s http://localhost:11434/api/tags > /dev/null; do
echo "Waiting for Ollama API..."
sleep 2
done
ollama pull llama2
ollama run llama2 &
sleep 10
python3 /app/gradio_app.py
🤝 docker-compose.yml (Updated)
version: '3.8'
services:
  ollama-gradio-app:
    build: .
    container_name: ollama-gradio
    ports:
      - "7860:7860"
      - "11434:11434"
    volumes:
      - ~/.ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    stdin_open: true
    tty: true
🌎 Access
- Gradio UI: http://localhost:7860
- Ollama API (from host): http://localhost:11434/api/tags
You can verify API accessibility with:
curl http://localhost:11434/api/tags
📄 LangChain Integration from Host
Make sure you install the latest LangChain Ollama integration on your host:
pip install -U langchain langchain-ollama
Example Python Code:
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama2", base_url="http://localhost:11434")
response = llm.invoke("Hello, please introduce yourself")
print(response.content)
If everything is set up correctly, this script will communicate with the LLM model running in the Docker container.
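If you also want token-by-token output on the host, the same ChatOllama client supports .stream(); a minimal sketch with the same model and base URL as above:
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama2", base_url="http://localhost:11434")
# .stream() yields AIMessageChunk objects; .content holds the partial text.
for chunk in llm.stream("Introduce yourself in one sentence."):
    print(chunk.content, end="", flush=True)
print()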
🎓 Technical Insight: Listening vs Connection
- Port mapping (`11434:11434`) bridges the host to the container.
- Listening on `0.0.0.0` lets the container accept requests from the host; `127.0.0.1` only listens inside the container, so the host can't reach it even with the port mapped.
Always make sure Ollama is started with `OLLAMA_HOST=0.0.0.0` to allow outside access.
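A quick way to tell the two failure modes apart from the host (a diagnostic sketch, not part of the project files): a connection error usually means Ollama is bound to 127.0.0.1 inside the container or the port isn't mapped, while any HTTP status means the server is reachable.
import requests

try:
    r = requests.get("http://localhost:11434/api/tags", timeout=3)
    print("Reachable, HTTP", r.status_code)  # 200 means port mapping and OLLAMA_HOST are fine
except requests.exceptions.ConnectionError:
    # Typical when Ollama listens on 127.0.0.1 inside the container,
    # or when the 11434:11434 mapping is missing from the run/compose config.
    print("Connection failed: check OLLAMA_HOST=0.0.0.0 and the port mapping")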
🚀 What’s Next
- Add streaming token support
- Integrate LangChain chains/tools in the container
- Add file upload (PDFs, etc.)
- Use `docker-compose` with more services (e.g. DB + LLM + frontend)
📄 Reference Commands
# Build and run container with updated config
docker compose up --build
# From host: test API is working
curl http://localhost:11434/api/tags
# From host: run LangChain client script
python3 langchain_client.py
📦 Ollama + LangChain + Gradio: Unified Containerized LLM App (Post-Sprint 2)--02/05/2025
This README reflects the complete state of the project as of the end of Sprint 2 (post-April 11), with full LangChain + Gradio + Ollama integration inside a single container. It summarizes all updates, resolved issues, and next-stage planning.
✅ Project Milestone Summary
Component | Status | Notes |
---|---|---|
Ollama (LLM) | ✅ Running in container on port 11434 | |
Gradio Frontend | ✅ Running in container on port 7860 | |
LangChain Logic | ✅ Integrated into container as FastAPI (port 8000) | |
Full Routing | ✅ Gradio → LangChain → Ollama completed | |
🧩 File Overview (as of Sprint 2)
ollama_gradio_container/
├── Dockerfile
├── start.sh # Sequential startup of Ollama, LangChain API, Gradio
├── gradio_app.py # Refactored to call LangChain API
├── langchain_api.py # FastAPI LangChain server exposing /chat
├── requirements.txt # Updated Python dependencies
├── docker-compose.yml # Ports 7860, 8000, 11434 exposed
🚀 Key Updates Since April 11
✅ 1. LangChain API Containerized
- New file `langchain_api.py` launches FastAPI on port 8000.
- Implements a `/chat` endpoint using LangChain → Ollama.
- Replaces the host-based LangChain scripts.
✅ 2. Gradio Refactored
- `gradio_app.py` now sends user input to `http://localhost:8000/chat` instead of the Ollama API.
- Simplified prompt structure; stateful history optional.
- Errors handled gracefully with a response fallback.
✅ 3. Dockerfile Updated
COPY langchain_api.py /app/
Also includes `fastapi`, `uvicorn`, and the LangChain libraries.
✅ 4. start.sh Sequencing
OLLAMA_HOST=0.0.0.0 ollama serve &
# Wait for Ollama to be ready
ollama run llama2 &
python3 /app/langchain_api.py &
python3 /app/gradio_app.py
Ensures the LangChain API is started before the Gradio UI.
✅ 5. docker-compose Enhancements
- New exposed port: `8000:8000`
- Added `image: ollama-gradio-5.1`
- GPU compatibility reserved
- Volume mounted: `~/.ollama:/root/.ollama`
💡 Resolved Issues
Issue | Summary |
---|---|
#36 | LangChain API copied into Dockerfile, served inside container |
#38 | Gradio switched to LangChain API endpoint instead of Ollama |
curl 52 error (empty reply from server) | Resolved by ensuring the API is running and the port is mapped |
Missing file | langchain_api.py missing → added COPY in Dockerfile |
Service race | start.sh sequencing corrected |
🌐 Access Endpoints
Service | URL |
---|---|
Gradio UI | http://localhost:7860 |
LangChain | http://localhost:8000/chat |
Ollama API | http://localhost:11434/api/tags |
🧪 Test Example (Post-integration)
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"topic": "hi"}'
Expected response:
{ "response": "你好!很高兴认识你..." }
📌 Current Flow Diagram
[ Gradio UI ] (7860)
↓
[ LangChain API ] (8000)
↓
[ Ollama Model ] (11434)
📌 Sprint 3 Planning (Next Steps)
- Add support for chat history (multi-turn)
- Add logging and usage stats for API
- Move LangChain logic to Chains/Tools
- Add PDF/document upload + embeddings (Chroma or FAISS)
- CI/CD auto-deploy (optional)
- Better error handling (timeouts, retry, etc.)
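On the error-handling item, one possible direction (a sketch only, not committed code) is to wrap the Gradio → LangChain request with a timeout and a couple of retries; the retry count and timeout values below are placeholders, not project settings.
import time
import requests

def call_langchain_api(prompt, retries=2, timeout_s=60):
    # Retry the /chat call with a simple linear backoff between attempts.
    last_error = None
    for attempt in range(retries + 1):
        try:
            resp = requests.post("http://localhost:8000/chat", json={"topic": prompt}, timeout=timeout_s)
            resp.raise_for_status()
            return resp.json().get("response", "[Error] empty response")
        except requests.exceptions.RequestException as exc:
            last_error = exc
            time.sleep(2 * (attempt + 1))
    return f"[Error] request failed after {retries + 1} attempts: {last_error}"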
📎 Commands
docker compose up --build # Rebuild container with all components
curl -X POST http://localhost:8000/chat -H "Content-Type: application/json" -d '{"topic": "hi"}'   # Call LangChain API
This completes the Sprint 2 deliverable: a unified, modular, fully functional LLM container with Gradio frontend, LangChain orchestration, and Ollama inference backend. ✅
📂 Key Files with Code and Explanations
📄 1. langchain_api.py (FastAPI server for LangChain)
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_ollama import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
app = FastAPI()
class ChatRequest(BaseModel):
    topic: str

prompt = PromptTemplate.from_template("{topic}")
model = ChatOllama(model="llama2", base_url="http://localhost:11434", streaming=False)
parser = StrOutputParser()

@app.post("/chat")
async def chat(req: ChatRequest):
    formatted = prompt.format(topic=req.topic)
    result = model.invoke(formatted)
    return {"response": parser.invoke(result)}

if __name__ == "__main__":
    # start.sh runs this file directly, so launch uvicorn on port 8000 here
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
✅ Explanation:
- Exposes a POST `/chat` endpoint.
- Uses LangChain to format the input, call Ollama, and parse the output.
- Acts as an internal API between Gradio and the model backend.
📄 2. gradio_app.py (Frontend UI calling LangChain API)
import gradio as gr
import requests
def chat_with_langchain(prompt, history):
    # Forward the user prompt to the LangChain FastAPI service instead of Ollama directly
    response = requests.post(
        "http://localhost:8000/chat",
        json={"topic": prompt}
    )
    reply = response.json().get("response", "[Error]")
    history.append((prompt, reply))
    return history, history

iface = gr.Interface(
    fn=chat_with_langchain,
    inputs=[gr.Textbox(label="Your message"), gr.State([])],
    outputs=[gr.Chatbot(), gr.State([])],
    title="Gradio → LangChain → Ollama"
)
iface.launch(server_name="0.0.0.0", server_port=7860)
✅ Explanation:
- UI accepts user input and calls LangChain API.
- Displays the chatbot response via parsed JSON.
📄 3. Dockerfile (Container build instructions)
FROM python:3.10-slim
WORKDIR /app
# curl is needed by start.sh; the install script provides the ollama binary used at runtime
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
RUN curl -fsSL https://ollama.com/install.sh | sh
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
COPY gradio_app.py /app/
COPY langchain_api.py /app/
COPY start.sh /app/
RUN chmod +x /app/start.sh
CMD ["/app/start.sh"]
✅ Explanation:
- Installs Ollama plus the Python stack (LangChain, Gradio, FastAPI).
- Copies all Python scripts.
- Runs the unified startup script.
📄 4. start.sh (Service orchestration)
#!/bin/bash
OLLAMA_HOST=0.0.0.0 ollama serve &
until curl -s http://localhost:11434/api/tags > /dev/null; do
echo "Waiting for Ollama..."
sleep 2
done
ollama run llama2 &
python3 /app/langchain_api.py &
sleep 5
python3 /app/gradio_app.py
✅ Explanation:
- Starts Ollama server, waits for readiness.
- Launches the LangChain API, then Gradio.
- Ensures services start in correct order.
📄 5. docker-compose.yml (Multi-port orchestration)
services:
  ollama-gradio-app:
    build: .
    image: ollama-gradio:latest
    container_name: deepthink-instance
    ports:
      - "7860:7860"
      - "11434:11434"
      - "8000:8000"
    volumes:
      - ~/.ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    tty: true
    stdin_open: true
✅ Explanation:
- Exposes all required services: Gradio UI, LangChain API, Ollama backend.
- Mounts local model cache for reuse.