embedding model - chunhualiao/public-docs GitHub Wiki
Embedding models are commonly compared on benchmark datasets (e.g., MTEB, the Massive Text Embedding Benchmark).
Enter your LLM embedding model. Choices are:
- openai
- azureopenai
- Embeddings available only with OllamaEmbedding:
  - llama2
  - mxbai-embed-large
  - nomic-embed-text
  - all-minilm
  - stable-code
  - bge-m3
  - bge-large
  - paraphrase-multilingual
  - snowflake-arctic-embed
- Leave blank to default to 'BAAI/bge-small-en-v1.5' via Hugging Face
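The selection logic implied by this prompt can be sketched as a small dispatcher. This is a hypothetical helper (the names `resolve_embedding_choice` and `OLLAMA_MODELS` are my own, not DeepWiki-Open's actual code): a blank entry falls back to the Hugging Face default, `openai`/`azureopenai` select those providers, and the remaining model names are routed through Ollama.

```python
# Hypothetical sketch of the choice-to-provider mapping described above.
# Assumption: this mirrors the prompt's behavior, not the project's real code.

OLLAMA_MODELS = {
    "llama2", "mxbai-embed-large", "nomic-embed-text", "all-minilm",
    "stable-code", "bge-m3", "bge-large", "paraphrase-multilingual",
    "snowflake-arctic-embed",
}

def resolve_embedding_choice(choice: str) -> dict:
    """Map the typed-in choice to a (provider, model) config."""
    choice = choice.strip().lower()
    if not choice:
        # Blank input: default Hugging Face model from the prompt above.
        return {"provider": "huggingface", "model": "BAAI/bge-small-en-v1.5"}
    if choice in ("openai", "azureopenai"):
        # Cloud providers use their own default embedding model.
        return {"provider": choice, "model": None}
    if choice in OLLAMA_MODELS:
        # These models are served locally through Ollama.
        return {"provider": "ollama", "model": choice}
    raise ValueError(f"Unknown embedding choice: {choice!r}")
```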
DeepWiki-Open's Choices
Here's a side-by-side comparison table for text-embedding-3-small (OpenAI) and nomic-embed-text (Ollama):
| Aspect | text-embedding-3-small (OpenAI) | nomic-embed-text (Ollama) |
|---|---|---|
| Provider | OpenAI | Nomic AI / Ollama |
| License | Proprietary (OpenAI) | Open source (Apache 2.0) |
| Runs Locally | ❌ No (cloud API only) | ✅ Yes (via Ollama) |
| Requires Internet/API Key | ✅ Yes | ❌ No |
| Model Size / Latency | Small model, low latency on OpenAI infrastructure | Small to medium, fast on CPU/GPU |
| Embedding Dimension | 1536 | 768 |
| Languages Supported | Multilingual (~100+) | Multilingual (fewer languages tested) |
| Typical Use Cases | Semantic search, RAG, Q&A, classification | Local RAG, semantic search, private Q&A |
| Performance (e.g., MTEB) | High accuracy across multiple domains | Competitive with similarly sized models; below OpenAI |
| Fine-tunable? | ❌ Not publicly | ✅ Yes, if you host the model yourself |
| Integration Simplicity | ✅ Easy via the OpenAI SDK | Moderate (requires Ollama or a local inference setup) |
| Cost | Pay-per-use (token-based via the OpenAI API) | Free (runs on your own hardware) |
| Privacy/Security | Data goes to OpenAI servers | Full local control over data |
| Community / Ecosystem | Strong OpenAI ecosystem (LangChain, etc.) | Growing OSS support (Atlas, LangChain, Ollama) |
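One practical consequence of the "Embedding Dimension" row: vectors from the two models are not interchangeable, since text-embedding-3-small produces 1536-dimensional vectors and nomic-embed-text produces 768-dimensional ones, so an index built with one model cannot be queried with the other. A minimal sketch of the cosine-similarity comparison used by either model in semantic search (plain Python, no provider SDK assumed):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors.

    Raises ValueError on a dimension mismatch, which is what happens
    if you mix vectors from two different embedding models.
    """
    if len(a) != len(b):
        raise ValueError("dimension mismatch: embeddings are from different models")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Whichever model you pick, it must be used consistently for both indexing and querying.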