InferX Knowledge Base Toolkit How to - inferx-net/inferx GitHub Wiki
Overview
This toolkit converts source material into Markdown that is easier to use for retrieval, summarization, and prompt construction.
It provides a containerized document conversion workflow using inferx/knowledgebase:v0.10.
Main Use Cases
Document Conversion
Use the container when you want to process files from /input and generate Markdown artifacts in /output.
Supported input types:
- DOCX
- PPTX
- HTML / HTM
- Markdown
- Text
The container walks /input recursively and processes all supported files it finds.
Typical Workflows
1. Convert Local Documents
sudo docker run --rm \
-v /home/brad/test/input:/input \
-v /home/brad/test/output:/output \
-e "API_KEY=YOUR_API_KEY_HERE" \
inferx/knowledgebase:v0.10 \
base_url=https://model.inferx.net/funccall/tn-a3t79iogb2/endpoints/Qwen3-Coder-Next-FP8/v1 \
api_key=YOUR_API_KEY_HERE \
model=Qwen/Qwen3-Coder-Next-FP8
Output Artifacts
Conversion Output
merged.md: original output with boundaries and summariesoptimized.md: compressed output for KV-cache-oriented usagellm.md: prompt-ready version with instructions and citation guidance
LLM-Ready Format
The llm.md output is intended to be used directly in prompts. It includes:
- a system instruction block
- citation rules
- section-preserving formatting
- contextual handling for diagrams and partially rendered formulas
Expected citation form:
[bitcoin.pdf, Section 4 - Proof-of-Work][bitcoin.pdf, Section 11][bitcoin.pdf, Section 5, Step 3]
Avoid citations that omit the filename or section reference.
Configuration Reference
Container Arguments
base_url: LLM endpoint URLapi_key: API key for authenticationmodel: model identifier, defaulting toQwen/Qwen3-Coder-Next-FP8
Environment Variables
API_KEY: alternative way to provide the API key
Performance Notes
- Conversion is roughly 10 seconds per file
- Lossless compression is effectively immediate
- LLM-ready formatting is produced as part of normal output generation
Operational Notes
- Lossless compression is deterministic and safe, but token reduction is small
- OCR models are preloaded into the Docker image
When To Use What
Use this image for local document sets that need Markdown conversion and prompt-ready output.