README - 'Xiangyu' Branch - southern-cross-ai/Deepthink GitHub Wiki

Ollama + Gradio All-in-One Container (28/03/2025)

A Docker-based project that runs both the Ollama LLM backend and a Gradio frontend UI in a single container. Designed for easy local development and testing of LLMs with web access.


🔍 Project Overview

Goal

  • Build a custom Docker image that launches:
    • Ollama model server (LLM inference backend)
    • Gradio-based web interface (chat frontend)
  • Enable real-time local interaction with LLMs without relying on external APIs
  • Streamline development workflow for prototyping AI apps

📁 Project Structure

ollama_gradio_container/
├── Dockerfile         # Image build script
├── start.sh           # Shell script to start Ollama & Gradio
├── gradio_app.py      # Gradio frontend with Ollama API integration
├── requirements.txt   # Python dependencies

🧠 Design Principles

| Principle | Description |
|---|---|
| Single container | Everything (LLM + frontend) runs in one image |
| REST API integration | Gradio connects to Ollama via HTTP requests (see the sketch below) |
| Robust startup | Uses curl to wait for Ollama to become ready |
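
To make the REST API integration concrete, here is a minimal sketch of a single non-streaming request against Ollama's /api/chat endpoint, assuming the container is running and the llama2 model pulled by start.sh is available:

import requests

# One non-streaming round trip against Ollama's /api/chat endpoint
payload = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=120)
resp.raise_for_status()
# With "stream": False the reply arrives in one JSON object under message.content
print(resp.json()["message"]["content"])

This is the same call gradio_app.py makes; the Gradio frontend simply wraps it in a chat UI.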

📃 File Breakdown

1. Dockerfile

Builds an Ubuntu-based image with Python and Ollama installed.

FROM ubuntu:22.04

RUN apt-get update && apt-get install -y \
    curl \
    sudo \
    python3 \
    python3-pip \
    git \
    wget

RUN curl -fsSL https://ollama.com/install.sh | sh

COPY requirements.txt /app/requirements.txt
RUN pip3 install -r /app/requirements.txt

WORKDIR /app
COPY gradio_app.py /app/
COPY start.sh /app/
RUN chmod +x /app/start.sh

# Gradio UI
EXPOSE 7860
# Ollama API
EXPOSE 11434

CMD ["/app/start.sh"]

2. start.sh

Launches Ollama in the background, waits until the API is ready, then launches Gradio.

#!/bin/bash

ollama serve &

# Wait until Ollama API is reachable
until curl -s http://localhost:11434/api/tags > /dev/null; do
  echo "Waiting for Ollama API..."
  sleep 2
done

# Pull the model, then warm it up in the background
ollama pull llama2
ollama run llama2 &
sleep 10

# Start the Gradio frontend (foreground process keeps the container alive)
python3 /app/gradio_app.py

3. gradio_app.py

Defines the Gradio UI and integrates with the Ollama REST API.

import gradio as gr
import requests

def chat_with_ollama(prompt, history):
    if not isinstance(history, list):
        history = []

    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": prompt})

    response = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "llama2", "messages": messages, "stream": False}
    )

    try:
        reply = response.json()["message"]["content"]
    except Exception as e:
        reply = f"[Error] {e}\nRaw response:\n{response.text}"

    history.append((prompt, reply))
    return history, history

iface = gr.Interface(
    fn=chat_with_ollama,
    inputs=[gr.Textbox(label="Your message"), gr.State([])],
    outputs=[gr.Chatbot(), gr.State([])],
    title="Ollama + Gradio in One Container"
)

iface.launch(server_name="0.0.0.0", server_port=7860)

4. requirements.txt

gradio
requests

Python packages used by the Gradio frontend and API client.


🌐 Usage Instructions

Build the image

docker build -t ollama-gradio-app .

Run the container (with GPU & volume)

docker run --gpus all \
  -p 7860:7860 \
  -p 11434:11434 \
  -v ~/.ollama:/root/.ollama \
  ollama-gradio-app

Remove --gpus all if you're not using a GPU.
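
Once the container is up, a small host-side smoke test can confirm both mapped ports respond. This is a hypothetical helper, assuming the default port mappings shown above:

import requests

# Hypothetical smoke test for the default port mappings (7860 for Gradio, 11434 for Ollama)
def check(name, url):
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name}: HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")

check("Ollama API", "http://localhost:11434/api/tags")
check("Gradio UI", "http://localhost:7860")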


🔗 Architecture Flow

[User] --> Gradio (gradio_app.py) --> [POST /api/chat] --> Ollama --> LLM Response

🚀 Troubleshooting

| Step | Problem | Solution |
|---|---|---|
| Build | Slow / failed downloads | Retry or switch to a faster mirror |
| Run | Port already in use | Free the port or change the mapping |
| Chat | 404 or KeyError: 'message' | Ollama not ready yet; add a wait/readiness check (see the sketch below) |
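
For the last row, the same readiness check that start.sh does with curl can also be done from Python. A hedged sketch of such a wait helper:

import time
import requests

def wait_for_ollama(base_url="http://localhost:11434", timeout_s=60):
    # Poll /api/tags until Ollama answers or the timeout expires
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(f"{base_url}/api/tags", timeout=2).ok:
                return True
        except requests.RequestException:
            pass
        time.sleep(2)
    return False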

🛠️ Next Steps

  • Add streaming support for real-time token output (see the sketch after this list)
  • Allow file upload (PDF summarization, etc.)
  • Integrate LangChain, FastAPI, or llama-index
  • Use docker-compose to manage future services (e.g. DB + Ollama + UI)
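
For the streaming item, Ollama's /api/chat returns newline-delimited JSON chunks when "stream" is true. A hedged sketch of consuming them from Python:

import json
import requests

# Stream tokens from Ollama's /api/chat; each response line is a JSON chunk
payload = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "Tell me a short joke."}],
    "stream": True,
}
with requests.post("http://localhost:11434/api/chat", json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()

Wiring this into Gradio would mean yielding partial replies from the callback instead of returning once.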

Ollama + Gradio + LangChain: Updated Integration Guide (11/04/2025)

This guide documents the updated setup for running an all-in-one Docker container with Ollama (LLM backend) and Gradio (frontend), and for enabling interaction with Ollama from the host machine via LangChain.


🔍 Project Overview

This project sets up a single Docker container that runs:

  • Ollama model server (LLM backend)
  • Gradio web interface (chat frontend)

And now additionally supports:

  • LangChain on the host machine calling the containerized Ollama via HTTP API.

📅 Updated Goals

  • Build a self-contained Docker image running both frontend and backend.
  • Enable host-based Python scripts (e.g., using LangChain) to interact with Ollama.
  • Ensure proper port binding and API accessibility.

📁 Project Structure

ollama_gradio_container/
├── Dockerfile         # Builds the base image
├── start.sh           # Starts Ollama and Gradio
├── gradio_app.py      # Gradio interface with chat UI
├── requirements.txt   # Python dependencies
├── docker-compose.yml # Orchestration and networking config

🧪 Dockerfile Highlights

FROM ubuntu:22.04

RUN apt-get update && apt-get install -y \
    curl sudo python3 python3-pip git wget

RUN curl -fsSL https://ollama.com/install.sh | sh

COPY requirements.txt /app/requirements.txt
RUN pip3 install -r /app/requirements.txt

WORKDIR /app
COPY gradio_app.py /app/
COPY start.sh /app/
RUN chmod +x /app/start.sh

# Gradio
EXPOSE 7860
# Ollama
EXPOSE 11434

CMD ["/app/start.sh"]

🔧 start.sh (Updated)

#!/bin/bash

# Enable Ollama to listen on all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve &

# Wait for the Ollama API to become ready
until curl -s http://localhost:11434/api/tags > /dev/null; do
  echo "Waiting for Ollama API..."
  sleep 2
done

ollama pull llama2
ollama run llama2 &

sleep 10
python3 /app/gradio_app.py

🤝 docker-compose.yml (Updated)

version: '3.8'
services:
  ollama-gradio-app:
    build: .
    container_name: ollama-gradio
    ports:
      - "7860:7860"
      - "11434:11434"
    volumes:
      - ~/.ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    stdin_open: true
    tty: true

🌎 Access

You can verify that the Ollama API is reachable from the host with:

curl http://localhost:11434/api/tags

📄 LangChain Integration from Host

Make sure you install the latest LangChain Ollama integration on your host:

pip install -U langchain langchain-ollama

Example Python Code:

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama2", base_url="http://localhost:11434")
response = llm.invoke("Hello, please introduce yourself")
print(response.content)

If everything is set up correctly, this script will communicate with the LLM model running in the Docker container.
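
The same host-side setup also supports streaming through LangChain's standard .stream() interface. A minimal sketch, assuming the same container and model:

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama2", base_url="http://localhost:11434")

# Print the reply token by token instead of waiting for the full message
for chunk in llm.stream("Give me three facts about Canberra."):
    print(chunk.content, end="", flush=True)
print()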


🎓 Technical Insight: Listening vs Connection

  • Port mapping (11434:11434) bridges the host to the container.
  • Listening on 0.0.0.0 allows the container to accept requests from the host.
  • Listening on 127.0.0.1 only accepts connections from inside the container; the host cannot reach it even with port mapping.

Always make sure Ollama is started with:

OLLAMA_HOST=0.0.0.0

so that it is reachable from outside the container.


🚀 What’s Next

  • Add streaming token support
  • Integrate LangChain chains/tools in the container
  • Add file upload (PDFs, etc.)
  • Use docker-compose with more services (e.g. DB + LLM + frontend)

📄 Reference Commands

# Build and run container with updated config
docker compose up --build

# From host: test API is working
curl http://localhost:11434/api/tags

# From host: run LangChain client script
python3 langchain_client.py
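
langchain_client.py itself is not shown in this guide; a minimal version (hypothetical, reusing the ChatOllama example above) might look like:

# langchain_client.py -- hypothetical host-side client based on the example above
import sys

from langchain_ollama import ChatOllama

def main():
    prompt = " ".join(sys.argv[1:]) or "Hello, please introduce yourself."
    llm = ChatOllama(model="llama2", base_url="http://localhost:11434")
    print(llm.invoke(prompt).content)

if __name__ == "__main__":
    main()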

📦 Ollama + LangChain + Gradio: Unified Containerized LLM App (Post-Sprint 2, 02/05/2025)

This README reflects the complete state of the project as of the end of Sprint 2 (post-April 11), with full LangChain + Gradio + Ollama integration inside a single container. It summarizes all updates, resolved issues, and next-stage planning.


✅ Project Milestone Summary

| Component | Status | Notes |
|---|---|---|
| Ollama (LLM) | ✅ | Running in the container on port 11434 |
| Gradio Frontend | ✅ | Running in the container on port 7860 |
| LangChain Logic | ✅ | Integrated into the container as a FastAPI service (port 8000) |
| Full Routing | ✅ | Gradio → LangChain → Ollama completed |

🧩 File Overview (as of Sprint 2)

ollama_gradio_container/
├── Dockerfile
├── start.sh                 # Sequential startup of Ollama, LangChain API, Gradio
├── gradio_app.py            # Refactored to call LangChain API
├── langchain_api.py         # FastAPI LangChain server exposing /chat
├── requirements.txt         # Updated Python dependencies
├── docker-compose.yml       # Ports 7860, 8000, 11434 exposed

🚀 Key Updates Since April 11

✅ 1. LangChain API Containerized

  • New file langchain_api.py launches FastAPI at port 8000.
  • Implements /chat endpoint using LangChain → Ollama.
  • Replaces host-based LangChain scripts.

✅ 2. Gradio Refactored

  • gradio_app.py now sends user input to http://localhost:8000/chat instead of the Ollama API.
  • Simplified prompt structure; stateful history optional.
  • Errors handled gracefully with response fallback.

✅ 3. Dockerfile Updated

COPY langchain_api.py /app/

requirements.txt now also includes fastapi, uvicorn, and the LangChain libraries.
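
The exact dependency list is not shown here; a plausible updated requirements.txt (an assumption, extending the original gradio/requests list with the packages named above and the imports used in langchain_api.py) would be:

gradio
requests
fastapi
uvicorn
langchain
langchain-ollama
langchain-core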

✅ 4. start.sh Sequencing

OLLAMA_HOST=0.0.0.0 ollama serve &
# Wait for Ollama to be ready
ollama run llama2 &
python3 /app/langchain_api.py &
python3 /app/gradio_app.py

Ensures the LangChain API is started before the Gradio UI.

✅ 5. docker-compose Enhancements

  • New exposed port: 8000:8000
  • Added image: ollama-gradio-5.1
  • GPU compatibility reserved
  • Volume mounted: ~/.ollama:/root/.ollama

💡 Resolved Issues

| Issue | Summary |
|---|---|
| #36 | LangChain API copied in via the Dockerfile and served inside the container |
| #38 | Gradio switched to the LangChain API endpoint instead of Ollama |
| curl error 52 | Resolved by ensuring the API is running and the port is mapped |
| Missing file | langchain_api.py was missing → added COPY in the Dockerfile |
| Service race | start.sh sequencing corrected |

🌐 Access Endpoints

| Service | URL |
|---|---|
| Gradio UI | http://localhost:7860 |
| LangChain API | http://localhost:8000/chat |
| Ollama API | http://localhost:11434/api/tags |

🧪 Test Example (Post-integration)

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"topic": "hi"}'

Expected response:

{ "response": "你好!很高兴认识你..." }

📌 Current Flow Diagram

[ Gradio UI ] (7860)
      ↓
[ LangChain API ] (8000)
      ↓
[ Ollama Model ] (11434)

📌 Sprint 3 Planning (Next Steps)

  • Add support for chat history (multi-turn); see the sketch after this list
  • Add logging and usage stats for API
  • Move LangChain logic to Chains/Tools
  • Add PDF/document upload + embeddings (Chroma or FAISS)
  • CI/CD auto-deploy (optional)
  • Better error handling (timeouts, retry, etc.)
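
For the multi-turn item, one possible direction is to accept the chat history in the request body and forward it as a message list. This is a hedged sketch, not the implemented API; MultiTurnRequest and its history field are hypothetical names:

# Hypothetical multi-turn variant of the /chat endpoint (not yet implemented)
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_core.messages import AIMessage, HumanMessage
from langchain_ollama import ChatOllama

app = FastAPI()
model = ChatOllama(model="llama2", base_url="http://localhost:11434")

class MultiTurnRequest(BaseModel):
    topic: str
    history: list[list[str]] = []  # [[user_msg, assistant_msg], ...]

@app.post("/chat")
async def chat(req: MultiTurnRequest):
    messages = []
    for user_msg, assistant_msg in req.history:
        messages.append(HumanMessage(content=user_msg))
        messages.append(AIMessage(content=assistant_msg))
    messages.append(HumanMessage(content=req.topic))
    result = model.invoke(messages)
    return {"response": result.content}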

📎 Commands

docker compose up --build        # Rebuild the container with all components

# Call the LangChain API (the endpoint only accepts POST)
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"topic": "hi"}'

This completes the Sprint 2 deliverable: a unified, modular, fully functional LLM container with Gradio frontend, LangChain orchestration, and Ollama inference backend. ✅


📂 Key Files with Code and Explanations

📄 1. langchain_api.py (FastAPI server for LangChain)

from fastapi import FastAPI
from pydantic import BaseModel
from langchain_ollama import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

app = FastAPI()

class ChatRequest(BaseModel):
    topic: str

prompt = PromptTemplate.from_template("{topic}")
model = ChatOllama(model="llama2", base_url="http://localhost:11434", streaming=False)
parser = StrOutputParser()

@app.post("/chat")
async def chat(req: ChatRequest):
    formatted = prompt.format(topic=req.topic)
    result = model.invoke(formatted)
    return {"response": parser.invoke(result)}

if __name__ == "__main__":
    # Allow start.sh to launch the API directly with "python3 /app/langchain_api.py"
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Explanation:

  • Exposes POST /chat endpoint.
  • Uses LangChain to format input, call Ollama, and parse output.
  • Acts as an internal API between Gradio and model backend.

📄 2. gradio_app.py (Frontend UI calling LangChain API)

import gradio as gr
import requests

def chat_with_langchain(prompt, history):
    # Forward the user message to the LangChain API; fall back to an error string on failure
    try:
        response = requests.post(
            "http://localhost:8000/chat",
            json={"topic": prompt},
            timeout=120,
        )
        reply = response.json().get("response", "[Error] empty response")
    except (requests.RequestException, ValueError) as e:
        reply = f"[Error] LangChain API unreachable: {e}"
    history.append((prompt, reply))
    return history, history

iface = gr.Interface(
    fn=chat_with_langchain,
    inputs=[gr.Textbox(label="Your message"), gr.State([])],
    outputs=[gr.Chatbot(), gr.State([])],
    title="Gradio → LangChain → Ollama"
)

iface.launch(server_name="0.0.0.0", server_port=7860)

Explanation:

  • UI accepts user input and calls LangChain API.
  • Displays the chatbot response via parsed JSON.

📄 3. Dockerfile (Container build instructions)

FROM python:3.10-slim

# curl is needed by start.sh's readiness check; the install script provides the ollama binary
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
RUN curl -fsSL https://ollama.com/install.sh | sh

WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt

COPY gradio_app.py /app/
COPY langchain_api.py /app/
COPY start.sh /app/
RUN chmod +x /app/start.sh

CMD ["/app/start.sh"]

Explanation:

  • Installs curl and Ollama (needed by start.sh), plus the Python dependencies (LangChain, Gradio, FastAPI).
  • Copies all Python scripts.
  • Runs the unified startup script.

📄 4. start.sh (Service orchestration)

#!/bin/bash

# Bind Ollama to all interfaces so the host can reach it through the mapped port
OLLAMA_HOST=0.0.0.0 ollama serve &

until curl -s http://localhost:11434/api/tags > /dev/null; do
  echo "Waiting for Ollama..."
  sleep 2
done

ollama run llama2 &
python3 /app/langchain_api.py &
sleep 5
python3 /app/gradio_app.py

Explanation:

  • Starts the Ollama server and waits for readiness.
  • Launches the LangChain API, then Gradio.
  • Ensures services start in the correct order.

📄 5. docker-compose.yml (Multi-port orchestration)

services:
  ollama-gradio-app:
    build: .
    image: ollama-gradio:latest
    container_name: deepthink-instance
    ports:
      - "7860:7860"
      - "11434:11434"
      - "8000:8000"
    volumes:
      - ~/.ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    tty: true
    stdin_open: true

Explanation:

  • Exposes all required services: Gradio UI, LangChain API, Ollama backend.
  • Mounts local model cache for reuse.
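
To verify the unified container end to end, a hedged smoke-test script (hypothetical, using the three endpoints listed in the Access Endpoints table) could be:

import requests

BASE = "http://localhost"

# Check the three exposed services: Ollama backend, Gradio UI, LangChain API
print("Ollama tags:", requests.get(f"{BASE}:11434/api/tags", timeout=5).status_code)
print("Gradio UI:", requests.get(f"{BASE}:7860", timeout=5).status_code)

reply = requests.post(f"{BASE}:8000/chat", json={"topic": "hi"}, timeout=120).json()
print("LangChain API:", reply.get("response", "[no response]"))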