README - 'Xiangyu' Branch - southern-cross-ai/Deepthink GitHub Wiki

Ollama + Gradio All-in-One Container (28/03/2025)

A Docker-based project that runs both the Ollama LLM backend and a Gradio frontend UI in a single container. Designed for easy local development and testing of LLMs with web access.


🔍 Project Overview

Goal

  • Build a custom Docker image that launches:
    • Ollama model server (LLM inference backend)
    • Gradio-based web interface (chat frontend)
  • Enable real-time local interaction with LLMs without relying on external APIs
  • Streamline development workflow for prototyping AI apps

📁 Project Structure

ollama_gradio_container/
├── Dockerfile         # Image build script
├── start.sh           # Shell script to start Ollama & Gradio
├── gradio_app.py      # Gradio frontend with Ollama API integration
├── requirements.txt   # Python dependencies

🧠 Design Principles

| Principle | Description |
|---|---|
| Single container | Everything (LLM + frontend) runs in one image |
| REST API integration | Gradio connects to Ollama via HTTP requests (see the sketch below) |
| Robust startup | Uses curl to wait for Ollama to become ready |
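
To make the REST API integration concrete, here is a minimal sketch of a single non-streaming request against Ollama's /api/chat endpoint, assuming the container is running and the llama2 model pulled by start.sh is available:

import requests

# One non-streaming round trip against Ollama's /api/chat endpoint
payload = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=120)
resp.raise_for_status()
# With "stream": False the reply arrives in one JSON object under message.content
print(resp.json()["message"]["content"])

This is the same call gradio_app.py makes; the Gradio frontend simply wraps it in a chat UI.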

📃 File Breakdown

1. Dockerfile

Builds an Ubuntu-based image with Python and Ollama installed.

FROM ubuntu:22.04

RUN apt-get update && apt-get install -y \
    curl \
    sudo \
    python3 \
    python3-pip \
    git \
    wget

RUN curl -fsSL https://ollama.com/install.sh | sh

COPY requirements.txt /app/requirements.txt
RUN pip3 install -r /app/requirements.txt

WORKDIR /app
COPY gradio_app.py /app/
COPY start.sh /app/
RUN chmod +x /app/start.sh

# Gradio UI
EXPOSE 7860
# Ollama API
EXPOSE 11434

CMD ["/app/start.sh"]

2. start.sh

Launches Ollama in the background, waits until the API is ready, then launches Gradio.

#!/bin/bash

ollama serve &

# Wait until Ollama API is reachable
until curl -s http://localhost:11434/api/tags > /dev/null; do
  echo "Waiting for Ollama API..."
  sleep 2
done

# Pull the model, then warm it up in the background
ollama pull llama2
ollama run llama2 &
sleep 10

# Start the Gradio frontend (foreground process keeps the container alive)
python3 /app/gradio_app.py

3. gradio_app.py

Defines the Gradio UI and integrates with the Ollama REST API.

import gradio as gr
import requests

def chat_with_ollama(prompt, history):
    if not isinstance(history, list):
        history = []

    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": prompt})

    response = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "llama2", "messages": messages, "stream": False}
    )

    try:
        reply = response.json()["message"]["content"]
    except Exception as e:
        reply = f"[Error] {e}\nRaw response:\n{response.text}"

    history.append((prompt, reply))
    return history, history

iface = gr.Interface(
    fn=chat_with_ollama,
    inputs=[gr.Textbox(label="Your message"), gr.State([])],
    outputs=[gr.Chatbot(), gr.State([])],
    title="Ollama + Gradio in One Container"
)

iface.launch(server_name="0.0.0.0", server_port=7860)

4. requirements.txt

gradio
requests

Python packages used by the Gradio frontend and API client.


🌐 Usage Instructions

Build the image

docker build -t ollama-gradio-app .

Run the container (with GPU & volume)

docker run --gpus all \
  -p 7860:7860 \
  -p 11434:11434 \
  -v ~/.ollama:/root/.ollama \
  ollama-gradio-app

Remove --gpus all if you're not using a GPU.
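
Once the container is up, a small host-side smoke test can confirm both mapped ports respond. This is a hypothetical helper, assuming the default port mappings shown above:

import requests

# Hypothetical smoke test for the default port mappings (7860 for Gradio, 11434 for Ollama)
def check(name, url):
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name}: HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")

check("Ollama API", "http://localhost:11434/api/tags")
check("Gradio UI", "http://localhost:7860")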


🔗 Architecture Flow

[User] --> Gradio (gradio_app.py) --> [POST /api/chat] --> Ollama --> LLM Response

🚀 Troubleshooting

| Step | Problem | Solution |
|---|---|---|
| Build | Slow / failed downloads | Retry or switch to a faster mirror |
| Run | Port already in use | Free the port or change the mapping |
| Chat | 404 or KeyError: 'message' | Ollama not ready yet; add a wait/readiness check (see the sketch below) |
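
For the last row, the same readiness check that start.sh does with curl can also be done from Python. A hedged sketch of such a wait helper:

import time
import requests

def wait_for_ollama(base_url="http://localhost:11434", timeout_s=60):
    # Poll /api/tags until Ollama answers or the timeout expires
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(f"{base_url}/api/tags", timeout=2).ok:
                return True
        except requests.RequestException:
            pass
        time.sleep(2)
    return False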

🛠️ Next Steps

  • Add streaming support for real-time token output (see the sketch after this list)
  • Allow file upload (PDF summarization, etc.)
  • Integrate LangChain, FastAPI, or llama-index
  • Use docker-compose to manage future services (e.g. DB + Ollama + UI)
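
For the streaming item, Ollama's /api/chat returns newline-delimited JSON chunks when "stream" is true. A hedged sketch of consuming them from Python:

import json
import requests

# Stream tokens from Ollama's /api/chat; each response line is a JSON chunk
payload = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "Tell me a short joke."}],
    "stream": True,
}
with requests.post("http://localhost:11434/api/chat", json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()

Wiring this into Gradio would mean yielding partial replies from the callback instead of returning once.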

Ollama + Gradio + LangChain: Updated Integration Guide (11/04/2025)

This guide documents the updated setup for running an all-in-one Docker container with Ollama (LLM backend) and Gradio (frontend), and for enabling interaction with Ollama from the host machine via LangChain.


🔍 Project Overview

This project sets up a single Docker container that runs:

  • Ollama model server (LLM backend)
  • Gradio web interface (chat frontend)

And now additionally supports:

  • LangChain on the host machine calling the containerized Ollama via HTTP API.

📅 Updated Goals

  • Build a self-contained Docker image running both frontend and backend.
  • Enable host-based Python scripts (e.g., using LangChain) to interact with Ollama.
  • Ensure proper port binding and API accessibility.

📁 Project Structure

ollama_gradio_container/
├── Dockerfile         # Builds the base image
├── start.sh           # Starts Ollama and Gradio
├── gradio_app.py      # Gradio interface with chat UI
├── requirements.txt   # Python dependencies
├── docker-compose.yml # Orchestration and networking config

🧪 Dockerfile Highlights

FROM ubuntu:22.04

RUN apt-get update && apt-get install -y \
    curl sudo python3 python3-pip git wget

RUN curl -fsSL https://ollama.com/install.sh | sh

COPY requirements.txt /app/requirements.txt
RUN pip3 install -r /app/requirements.txt

WORKDIR /app
COPY gradio_app.py /app/
COPY start.sh /app/
RUN chmod +x /app/start.sh

# Gradio
EXPOSE 7860
# Ollama
EXPOSE 11434

CMD ["/app/start.sh"]

🔧 start.sh (Updated)

#!/bin/bash

# Enable Ollama to listen on all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve &

# Wait for the Ollama API to become ready
until curl -s http://localhost:11434/api/tags > /dev/null; do
  echo "Waiting for Ollama API..."
  sleep 2
done

ollama pull llama2
ollama run llama2 &

sleep 10
python3 /app/gradio_app.py

🤝 docker-compose.yml (Updated)

version: '3.8'
services:
  ollama-gradio-app:
    build: .
    container_name: ollama-gradio
    ports:
      - "7860:7860"
      - "11434:11434"
    volumes:
      - ~/.ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    stdin_open: true
    tty: true

🌎 Access

You can verify that the Ollama API is reachable from the host with:

curl http://localhost:11434/api/tags

📄 LangChain Integration from Host

Make sure you install the latest LangChain Ollama integration on your host:

pip install -U langchain langchain-ollama

Example Python Code:

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama2", base_url="http://localhost:11434")
response = llm.invoke("Hello, please introduce yourself")
print(response.content)

If everything is set up correctly, this script will communicate with the LLM model running in the Docker container.
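
The same host-side setup also supports streaming through LangChain's standard .stream() interface. A minimal sketch, assuming the same container and model:

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama2", base_url="http://localhost:11434")

# Print the reply token by token instead of waiting for the full message
for chunk in llm.stream("Give me three facts about Canberra."):
    print(chunk.content, end="", flush=True)
print()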


🎓 Technical Insight: Listening vs Connection

  • Port mapping (11434:11434) bridges the host to the container.
  • Listening on 0.0.0.0 allows the container to accept requests from the host.
  • Listening on 127.0.0.1 only accepts connections from inside the container; the host cannot reach it even with port mapping.

Always make sure Ollama is started with:

OLLAMA_HOST=0.0.0.0

so that it is reachable from outside the container.


🚀 What’s Next

  • Add streaming token support
  • Integrate LangChain chains/tools in the container
  • Add file upload (PDFs, etc.)
  • Use docker-compose with more services (e.g. DB + LLM + frontend)

📄 Reference Commands

# Build and run container with updated config
docker compose up --build

# From host: test API is working
curl http://localhost:11434/api/tags

# From host: run LangChain client script
python3 langchain_client.py
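
langchain_client.py itself is not shown in this guide; a minimal version (hypothetical, reusing the ChatOllama example above) might look like:

# langchain_client.py -- hypothetical host-side client based on the example above
import sys

from langchain_ollama import ChatOllama

def main():
    prompt = " ".join(sys.argv[1:]) or "Hello, please introduce yourself."
    llm = ChatOllama(model="llama2", base_url="http://localhost:11434")
    print(llm.invoke(prompt).content)

if __name__ == "__main__":
    main()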

📦 Ollama + LangChain + Gradio: Unified Containerized LLM App (Post-Sprint 2, 02/05/2025)

This README reflects the complete state of the project as of the end of Sprint 2 (post-April 11), with full LangChain + Gradio + Ollama integration inside a single container. It summarizes all updates, resolved issues, and next-stage planning.


✅ Project Milestone Summary

| Component | Status | Notes |
|---|---|---|
| Ollama (LLM) | ✅ | Running in the container on port 11434 |
| Gradio Frontend | ✅ | Running in the container on port 7860 |
| LangChain Logic | ✅ | Integrated into the container as a FastAPI service (port 8000) |
| Full Routing | ✅ | Gradio → LangChain → Ollama completed |

🧩 File Overview (as of Sprint 2)

ollama_gradio_container/
├── Dockerfile
├── start.sh                 # Sequential startup of Ollama, LangChain API, Gradio
├── gradio_app.py            # Refactored to call LangChain API
├── langchain_api.py         # FastAPI LangChain server exposing /chat
├── requirements.txt         # Updated Python dependencies
├── docker-compose.yml       # Ports 7860, 8000, 11434 exposed

🚀 Key Updates Since April 11

✅ 1. LangChain API Containerized

  • New file langchain_api.py launches FastAPI at port 8000.
  • Implements /chat endpoint using LangChain → Ollama.
  • Replaces host-based LangChain scripts.

✅ 2. Gradio Refactored

  • gradio_app.py now sends user input to http://localhost:8000/chat instead of the Ollama API.
  • Simplified prompt structure; stateful history optional.
  • Errors handled gracefully with response fallback.

✅ 3. Dockerfile Updated

COPY langchain_api.py /app/

requirements.txt now also includes fastapi, uvicorn, and the LangChain libraries.
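
The exact dependency list is not shown here; a plausible updated requirements.txt (an assumption, extending the original gradio/requests list with the packages named above and the imports used in langchain_api.py) would be:

gradio
requests
fastapi
uvicorn
langchain
langchain-ollama
langchain-core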

✅ 4. start.sh Sequencing

OLLAMA_HOST=0.0.0.0 ollama serve &
# Wait for Ollama to be ready
ollama run llama2 &
python3 /app/langchain_api.py &
python3 /app/gradio_app.py

Ensures the LangChain API is started before the Gradio UI.

✅ 5. docker-compose Enhancements

  • New exposed port: 8000:8000
  • Added image: ollama-gradio-5.1
  • GPU compatibility reserved
  • Volume mounted: ~/.ollama:/root/.ollama

💡 Resolved Issues

| Issue | Summary |
|---|---|
| #36 | LangChain API copied in via the Dockerfile and served inside the container |
| #38 | Gradio switched to the LangChain API endpoint instead of Ollama |
| curl error 52 | Resolved by ensuring the API is running and the port is mapped |
| Missing file | langchain_api.py was missing → added COPY in the Dockerfile |
| Service race | start.sh sequencing corrected |

🌐 Access Endpoints

| Service | URL |
|---|---|
| Gradio UI | http://localhost:7860 |
| LangChain API | http://localhost:8000/chat |
| Ollama API | http://localhost:11434/api/tags |

🧪 Test Example (Post-integration)

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"topic": "hi"}'

Expected response:

{ "response": "你好!很高兴认识你..." }

📌 Current Flow Diagram

[ Gradio UI ] (7860)
      ↓
[ LangChain API ] (8000)
      ↓
[ Ollama Model ] (11434)

📌 Sprint 3 Planning (Next Steps)

  • Add support for chat history (multi-turn); see the sketch after this list
  • Add logging and usage stats for API
  • Move LangChain logic to Chains/Tools
  • Add PDF/document upload + embeddings (Chroma or FAISS)
  • CI/CD auto-deploy (optional)
  • Better error handling (timeouts, retry, etc.)
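
For the multi-turn item, one possible direction is to accept the chat history in the request body and forward it as a message list. This is a hedged sketch, not the implemented API; MultiTurnRequest and its history field are hypothetical names:

# Hypothetical multi-turn variant of the /chat endpoint (not yet implemented)
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_core.messages import AIMessage, HumanMessage
from langchain_ollama import ChatOllama

app = FastAPI()
model = ChatOllama(model="llama2", base_url="http://localhost:11434")

class MultiTurnRequest(BaseModel):
    topic: str
    history: list[list[str]] = []  # [[user_msg, assistant_msg], ...]

@app.post("/chat")
async def chat(req: MultiTurnRequest):
    messages = []
    for user_msg, assistant_msg in req.history:
        messages.append(HumanMessage(content=user_msg))
        messages.append(AIMessage(content=assistant_msg))
    messages.append(HumanMessage(content=req.topic))
    result = model.invoke(messages)
    return {"response": result.content}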

📎 Commands

docker compose up --build        # Rebuild the container with all components

# Call the LangChain API (the endpoint only accepts POST)
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"topic": "hi"}'

This completes the Sprint 2 deliverable: a unified, modular, fully functional LLM container with Gradio frontend, LangChain orchestration, and Ollama inference backend. ✅


📂 Key Files with Code and Explanations

📄 1. langchain_api.py (FastAPI server for LangChain)

from fastapi import FastAPI
from pydantic import BaseModel
from langchain_ollama import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

app = FastAPI()

class ChatRequest(BaseModel):
    topic: str

prompt = PromptTemplate.from_template("{topic}")
model = ChatOllama(model="llama2", base_url="http://localhost:11434", streaming=False)
parser = StrOutputParser()

@app.post("/chat")
async def chat(req: ChatRequest):
    formatted = prompt.format(topic=req.topic)
    result = model.invoke(formatted)
    return {"response": parser.invoke(result)}

if __name__ == "__main__":
    # Allow start.sh to launch the API directly with "python3 /app/langchain_api.py"
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Explanation:

  • Exposes POST /chat endpoint.
  • Uses LangChain to format input, call Ollama, and parse output.
  • Acts as an internal API between Gradio and model backend.

📄 2. gradio_app.py (Frontend UI calling LangChain API)

import gradio as gr
import requests

def chat_with_langchain(prompt, history):
    # Forward the user message to the LangChain API; fall back to an error string on failure
    try:
        response = requests.post(
            "http://localhost:8000/chat",
            json={"topic": prompt},
            timeout=120,
        )
        reply = response.json().get("response", "[Error] empty response")
    except (requests.RequestException, ValueError) as e:
        reply = f"[Error] LangChain API unreachable: {e}"
    history.append((prompt, reply))
    return history, history

iface = gr.Interface(
    fn=chat_with_langchain,
    inputs=[gr.Textbox(label="Your message"), gr.State([])],
    outputs=[gr.Chatbot(), gr.State([])],
    title="Gradio → LangChain → Ollama"
)

iface.launch(server_name="0.0.0.0", server_port=7860)

Explanation:

  • UI accepts user input and calls LangChain API.
  • Displays the chatbot response via parsed JSON.

📄 3. Dockerfile (Container build instructions)

FROM python:3.10-slim

# curl is needed by start.sh's readiness check; the install script provides the ollama binary
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
RUN curl -fsSL https://ollama.com/install.sh | sh

WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt

COPY gradio_app.py /app/
COPY langchain_api.py /app/
COPY start.sh /app/
RUN chmod +x /app/start.sh

CMD ["/app/start.sh"]

Explanation:

  • Installs curl and Ollama (needed by start.sh), plus the Python dependencies (LangChain, Gradio, FastAPI).
  • Copies all Python scripts.
  • Runs the unified startup script.

📄 4. start.sh (Service orchestration)

#!/bin/bash

# Bind Ollama to all interfaces so the host can reach it through the mapped port
OLLAMA_HOST=0.0.0.0 ollama serve &

until curl -s http://localhost:11434/api/tags > /dev/null; do
  echo "Waiting for Ollama..."
  sleep 2
done

ollama run llama2 &
python3 /app/langchain_api.py &
sleep 5
python3 /app/gradio_app.py

Explanation:

  • Starts the Ollama server and waits for readiness.
  • Launches the LangChain API, then Gradio.
  • Ensures services start in the correct order.

📄 5. docker-compose.yml (Multi-port orchestration)

services:
  ollama-gradio-app:
    build: .
    image: ollama-gradio:latest
    container_name: deepthink-instance
    ports:
      - "7860:7860"
      - "11434:11434"
      - "8000:8000"
    volumes:
      - ~/.ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    tty: true
    stdin_open: true

Explanation:

  • Exposes all required services: Gradio UI, LangChain API, Ollama backend.
  • Mounts local model cache for reuse.
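
To verify the unified container end to end, a hedged smoke-test script (hypothetical, using the three endpoints listed in the Access Endpoints table) could be:

import requests

BASE = "http://localhost"

# Check the three exposed services: Ollama backend, Gradio UI, LangChain API
print("Ollama tags:", requests.get(f"{BASE}:11434/api/tags", timeout=5).status_code)
print("Gradio UI:", requests.get(f"{BASE}:7860", timeout=5).status_code)

reply = requests.post(f"{BASE}:8000/chat", json={"topic": "hi"}, timeout=120).json()
print("LangChain API:", reply.get("response", "[no response]"))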