# embedding model

Embedding models are typically compared on benchmark datasets (e.g., MTEB, the Massive Text Embedding Benchmark).

Enter your LLM embedding model. The available choices are (a minimal local-usage sketch follows the list):

- openai
- azureopenai
- Embeddings available only with OllamaEmbedding:
  - llama2
  - mxbai-embed-large
  - nomic-embed-text
  - all-minilm
  - stable-code
  - bge-m3
  - bge-large
  - paraphrase-multilingual
  - snowflake-arctic-embed
- Leave blank to default to 'BAAI/bge-small-en-v1.5' via huggingface
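For a quick local sanity check of the Ollama-only options and the default Hugging Face model above, here is a minimal sketch (not DeepWiki-Open's internal code). It assumes an Ollama server running on localhost:11434 with `nomic-embed-text` already pulled, the `requests` and `sentence-transformers` packages installed, and an arbitrary sample sentence.

```python
# Sketch: generating embeddings locally with two of the options listed above.
# Assumes `ollama pull nomic-embed-text` has already been run and the Ollama
# server is listening on its default port; the sample text is arbitrary.
import requests
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

text = "How do I configure the embedding model?"

# Ollama's REST embeddings endpoint returns a single vector per prompt.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": text},
    timeout=60,
)
resp.raise_for_status()
ollama_vec = resp.json()["embedding"]
print("nomic-embed-text dimension:", len(ollama_vec))   # typically 768

# The blank/default choice: 'BAAI/bge-small-en-v1.5' via Hugging Face.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
hf_vec = model.encode(text)
print("bge-small-en-v1.5 dimension:", hf_vec.shape[0])  # 384
```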

## DeepWiki-Open's Choices

Below is a side-by-side comparison of text-embedding-3-small (OpenAI) and nomic-embed-text (Ollama):

| Aspect | text-embedding-3-small (OpenAI) | nomic-embed-text (Ollama) |
|---|---|---|
| Provider | OpenAI | Nomic AI / Ollama |
| License | Proprietary (OpenAI) | Open-source (Apache 2.0 or similar) |
| Runs locally | ❌ No (cloud API only) | ✅ Yes (via Ollama) |
| Requires internet / API key | ✅ Yes | ❌ No |
| Model size / latency | Small model, low latency on OpenAI infrastructure | Small to medium (384–768 dim), fast on CPU/GPU |
| Embedding dimension | 1536 | Typically 384 or 768 |
| Languages supported | Multilingual (~100+) | Multilingual (fewer languages tested) |
| Typical use cases | Semantic search, RAG, Q&A, classification | Local RAG, semantic search, private Q&A |
| Performance (e.g., MTEB) | High accuracy across multiple domains | Competitive with smaller models; lower than OpenAI |
| Fine-tunable? | ❌ Not publicly | ✅ Yes, if you host the model yourself |
| Integration simplicity | ✅ Easy via OpenAI SDK | Moderate (requires Ollama or a local inference setup) |
| Cost | Pay-per-use (token-based via OpenAI API) | Free (run on your own hardware) |
| Privacy / security | Data goes to OpenAI servers | Full local control over data |
| Community / ecosystem | Strong OpenAI ecosystem (LangChain, etc.) | Growing OSS support (Atlas, LangChain, Ollama) |
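To make the "Integration simplicity" and "Embedding dimension" rows concrete, here is a hedged sketch of the cloud-side call, assuming the official `openai` Python SDK (v1.x) with `OPENAI_API_KEY` set in the environment; the cosine-similarity helper and sample sentences are only for illustration.

```python
# Sketch: embedding two sentences with OpenAI's text-embedding-3-small and
# checking how similar they are. Requires `pip install openai` and OPENAI_API_KEY.
import math
from openai import OpenAI

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

client = OpenAI()
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["What does this repository do?", "Explain the purpose of this codebase."],
)
vec_a, vec_b = (d.embedding for d in resp.data)
print("dimension:", len(vec_a))                      # 1536 for text-embedding-3-small
print("cosine similarity:", round(cosine(vec_a, vec_b), 3))
```

Note that the two models in the table produce vectors in different spaces and of different sizes (1536 vs. typically 768), so switching providers means re-embedding the entire index rather than mixing vectors.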
