# embedding model

Embedding models are typically compared on benchmark datasets (e.g., MTEB, the Massive Text Embedding Benchmark).

Enter your LLM embedding model. The available choices are (a minimal local-usage sketch follows the list):

- openai
- azureopenai
- Embeddings available only with OllamaEmbedding:
  - llama2
  - mxbai-embed-large
  - nomic-embed-text
  - all-minilm
  - stable-code
  - bge-m3
  - bge-large
  - paraphrase-multilingual
  - snowflake-arctic-embed
- Leave blank to default to 'BAAI/bge-small-en-v1.5' via huggingface
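For a quick local sanity check of the Ollama-only options and the default Hugging Face model above, here is a minimal sketch (not DeepWiki-Open's internal code). It assumes an Ollama server running on localhost:11434 with `nomic-embed-text` already pulled, the `requests` and `sentence-transformers` packages installed, and an arbitrary sample sentence.

```python
# Sketch: generating embeddings locally with two of the options listed above.
# Assumes `ollama pull nomic-embed-text` has already been run and the Ollama
# server is listening on its default port; the sample text is arbitrary.
import requests
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

text = "How do I configure the embedding model?"

# Ollama's REST embeddings endpoint returns a single vector per prompt.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": text},
    timeout=60,
)
resp.raise_for_status()
ollama_vec = resp.json()["embedding"]
print("nomic-embed-text dimension:", len(ollama_vec))   # typically 768

# The blank/default choice: 'BAAI/bge-small-en-v1.5' via Hugging Face.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
hf_vec = model.encode(text)
print("bge-small-en-v1.5 dimension:", hf_vec.shape[0])  # 384
```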

## DeepWiki-Open's Choices

Below is a side-by-side comparison of text-embedding-3-small (OpenAI) and nomic-embed-text (Ollama):

| Aspect | text-embedding-3-small (OpenAI) | nomic-embed-text (Ollama) |
|---|---|---|
| Provider | OpenAI | Nomic AI / Ollama |
| License | Proprietary (OpenAI) | Open-source (Apache 2.0 or similar) |
| Runs locally | ❌ No (cloud API only) | ✅ Yes (via Ollama) |
| Requires internet / API key | ✅ Yes | ❌ No |
| Model size / latency | Small model, low latency on OpenAI infrastructure | Small to medium (384–768 dim), fast on CPU/GPU |
| Embedding dimension | 1536 | Typically 384 or 768 |
| Languages supported | Multilingual (~100+) | Multilingual (fewer languages tested) |
| Typical use cases | Semantic search, RAG, Q&A, classification | Local RAG, semantic search, private Q&A |
| Performance (e.g., MTEB) | High accuracy across multiple domains | Competitive with smaller models; lower than OpenAI |
| Fine-tunable? | ❌ Not publicly | ✅ Yes, if you host the model yourself |
| Integration simplicity | ✅ Easy via OpenAI SDK | Moderate (requires Ollama or a local inference setup) |
| Cost | Pay-per-use (token-based via OpenAI API) | Free (run on your own hardware) |
| Privacy / security | Data goes to OpenAI servers | Full local control over data |
| Community / ecosystem | Strong OpenAI ecosystem (LangChain, etc.) | Growing OSS support (Atlas, LangChain, Ollama) |
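To make the "Integration simplicity" and "Embedding dimension" rows concrete, here is a hedged sketch of the cloud-side call, assuming the official `openai` Python SDK (v1.x) with `OPENAI_API_KEY` set in the environment; the cosine-similarity helper and sample sentences are only for illustration.

```python
# Sketch: embedding two sentences with OpenAI's text-embedding-3-small and
# checking how similar they are. Requires `pip install openai` and OPENAI_API_KEY.
import math
from openai import OpenAI

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

client = OpenAI()
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["What does this repository do?", "Explain the purpose of this codebase."],
)
vec_a, vec_b = (d.embedding for d in resp.data)
print("dimension:", len(vec_a))                      # 1536 for text-embedding-3-small
print("cosine similarity:", round(cosine(vec_a, vec_b), 3))
```

Note that the two models in the table produce vectors in different spaces and of different sizes (1536 vs. typically 768), so switching providers means re-embedding the entire index rather than mixing vectors.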
