Lists on AI & LLM Tech - GRibbans/Gribbans GitHub Wiki

This wiki page collects useful links on Generative AI technology, systems, tools, and research & analysis.
Unless stated otherwise, everything listed is open-source software (OSS/FOSS).
'*' marks those I use and would recommend taking a look at.
AI Models
LLM Leader-boards
- Aider Leader-board: Aider is a well-regarded code assistant; its ranking table focuses on that use case.
- LMSYS Leader-board: Ranking is based on blind side-by-side human evaluation, so it reflects pure human opinion and implicitly covers many of the aspects we value in an AI model.
- Bigcode Bench: Evaluates LLMs with practical and challenging programming tasks.
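To illustrate how blind side-by-side votes can become a ranking, here is a minimal sketch using an Elo-style update. This is an assumption for illustration only: the leaderboards above may use a different statistical method, and the model names, `k` factor, and starting rating are hypothetical.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, start=1000.0):
    """Return Elo-style ratings computed from (winner, loser) pairwise votes."""
    ratings = defaultdict(lambda: start)
    for winner, loser in votes:
        # Expected score of the winner, given the current rating gap.
        gap = ratings[loser] - ratings[winner]
        expected_win = 1.0 / (1.0 + 10 ** (gap / 400.0))
        # An unexpected win moves the ratings further than an expected one.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
ranking = sorted(elo_ratings(votes).items(), key=lambda item: item[1], reverse=True)
for name, rating in ranking:
    print(f"{name}: {rating:.1f}")
```

Because each vote only records which answer a person preferred, the resulting order reflects human preference rather than any single benchmark score.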
Table of LLM Models
Below is a list of models I have come across that have been noted by others (usually on Hugging Face or Reddit). Unless noted, all LLMs below have been released to the public and can be hosted locally.
Embedding models are specialist AI models dedicated to creating embeddings. They are used with vector databases in the RAG process.
| LLM NAME | PARAMETERS | FILE SIZE | CONTEXT SIZE | RELEASE DATE | DESCRIPTION |
| :--- | --- | --- | --- | --- | :--- |
| Granite Code 3B | 3B | 2.0GB | 2K | May-24 | By IBM. Best for code completion tasks |
| Granite | 8B | 4.6GB | 8K | May-24 | By IBM. For code generation, code explanation, code fixing, etc. |
| Llama3.1 | 8B | 4.7GB | 128K | Jul-24 | By Meta. 70B (40GB) and 405B (231GB) also available |
| Llama3 Instruct | 8B | 4.7GB | - | - | By Meta. 8B is the budget model from its latest release |
| Mistral 7B | 7.3B | 4.1GB | 4K | Sep-23 | General purpose AI |
| Mistral NeMo | 12B | 7.1GB | 128K | Jul-24 | General purpose AI |
| Openhermes v2.5 | 7.24B | 4.1GB | 8K | Feb-23 | Fine-tuned by Teknium on Mistral with fully open datasets |
| Phi 3.5 | 3.82B | 2.2GB | 128K | Aug-24 | By Microsoft. Very safe. Lightweight, state-of-the-art open model |
| Phi 3.1 Mini 128K | 3.8B | 1.8GB | 128K | Jul-24 | By Microsoft. Additional post-training data gives massive improvements across a range of benchmarks |
| Phi 3 Mini 4K | 3.8B | 2.3GB | 4K | - | By Microsoft. Update to the Phi 3 model |
| Phi 3 Small | 3.8B | 2.3GB | - | - | Microsoft's new small but powerful model (equiv. GPT 3.5) |
| Qwen2 | 7B / 1.5B | 4.4GB | - | Jul-24 | Range of models. 1.5B model suitable for IDE code completion |
| Wizard LM2 | 7B | 4.1GB | - | May-24 | By Microsoft. Previous fastest model (worst reasoning in WLM2 range) |
| Yi Coder | 8.83B | 5.0GB | 128K | Sep-24 | Current no. 1: state-of-the-art coding performance with fewer than 10B parameters |
| Yi v1.5 | 6B | 3.5GB | - | May-24 | OSS LLM from 01.AI, trained on 3 trillion tokens of data |

| LLM NAME | PARAMETERS | FILE SIZE | CONTEXT SIZE | RELEASE DATE | DESCRIPTION |
| :--- | --- | --- | --- | --- | :--- |
| Autocoder | 6.7B | 7.2GB | - | - | Its highest test score is for coding in Python |
| CodeGeex4 | 9B | 5.5GB | 128K | Jul-24 | Currently the highest-scoring sub-10B-parameter model for coding |
| CodeGemma 2B | 2B | 1.6GB | 8K | May-24 | By Google. 2B model is ideal for IDE code auto-completion |
| Codegemma Instruct | 7B | 5.0GB | - | - | - |
| CodeQwen 2.0 | 7B | 4.4GB | - | - | Its highest test score is for coding in JS |
| Codestral | 22B | 13GB | - | - | Very good for Python. Mistral's first model focused on code generation. N.B. needs a GPU |
| Deepseek v2.5 | 236B | N/A | 128K | Sep-24 | NOT LOCAL HOST. Best value, low cost and top tier for coding |
| DeepSeek-Coder-V2.1(0724) | 16B | 8.9GB | 128K | Jun-24 | Aider.chat ranked it the second-best model for coding-related tasks as at release |
| Deepseek-coder v2 | 16B | 8.9GB | - | Jul-24 | Former best coding model (July 2024); requires a large-capacity GPU |
| Deepseek-coder | 6.7B | 3.8GB | - | - | - |

| LLM NAME | PARAMETERS | FILE SIZE | CONTEXT SIZE | RELEASE DATE | DESCRIPTION |
| :--- | --- | --- | --- | --- | :--- |
| MiniCPM v2.6 | 8B | 5.5GB | N/A | Aug-24 | Vision model. Multimodal LLM (MLLM) designed for vision-language understanding |

| LLM NAME | PARAMETERS | FILE SIZE | CONTEXT SIZE | RELEASE DATE | DESCRIPTION |
| :--- | --- | --- | --- | --- | :--- |
| All-minilm | 22M | 45MB | - | - | Embedding (RAG) model |
| Mxbai-embed-large | 335M | 669MB | - | - | Embedding (RAG) model |
| Nomic-embed-text | 137M | 274MB | - | - | Embedding (RAG) model |
N.B. mxbai-embed-large expects each retrieval query to start with a custom prompt, for example: 'Represent this sentence for searching relevant passages: A man is eating a piece of bread'.
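The prompt in the note above can be read as a fixed prefix followed by the actual query text. Below is a minimal sketch of building that string before it is sent to the embedding model. The helper names are hypothetical, and applying the prefix to queries only (not to the stored documents) is an assumption worth checking against the model card.

```python
# Prefix text taken from the note above.
MXBAI_QUERY_PREFIX = "Represent this sentence for searching relevant passages: "


def build_mxbai_query(query: str) -> str:
    """Return the text to embed for a retrieval query (prefix + query)."""
    return MXBAI_QUERY_PREFIX + query.strip()


def build_mxbai_document(text: str) -> str:
    """Return the text to embed for a stored document (assumed: no prefix)."""
    return text.strip()


prompt = build_mxbai_query("A man is eating a piece of bread")
print(prompt)
```

Keeping the prefix in one constant avoids subtle mismatches, such as a missing trailing space, between indexing and query code.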
End-User Applications (LLM Interaction)
Desktop
| SYSTEM NAME | DESCRIPTION |
| :--- | :--- |
| AI Chat | CLI tool featuring Chat-REPL, Shell Assistant, RAG, AI tools & agents |
| AnythingLLM * | Desktop UI with RAG capabilities for local LLMs |
| ChatGPTerminator | - |
| Danswer AI | Self-hostable AI assistant (OSS) |
| Fabric | AI assistant with crowd-sourced prompt patterns |
| Kotaemon | RAG-based chat app (load your own documents to embed, ready for AI use). Supports local LLMs |
| Msty | Top tier. Desktop app for LLM chat, adding documents, and transcribing audio to text |
| PromptMixer * | Desktop UI with decent prompt text management |
| ShellGPT | Adds LLM access and use within the terminal |
| Verba | Desktop UI with RAG capabilities for local LLM chat |
Web App
Coding Assistants
Standalone tools, IDE extensions (code autocompletion), etc.
| SYSTEM NAME | DESCRIPTION |
| :--- | :--- |
| Aider | A Tier. Terminal CLI for pair programming with LLMs; builds a tree map of the repo to improve AI responses |
| Amazon Q Developer | Amazon's developer assistant; use in VSCodium and derivatives |
| CodeGPT | VSCode extension for coding with AI |
| CodiumAI | Spidermen 👇 pointing meme. Different companies, very similarly named |
| Codeium | Spidermen 👆 pointing meme. Different companies, very similarly named |
| GitHub Copilot | By GitHub and Microsoft |
| Claude Dev | A Tier |
| Cody | By Sourcegraph |
| Continue Dev * | A Tier. IDE extension. Use local LLMs (e.g. via Ollama) and remote LLMs |
| Cursor IDE * | Built-in AI for autocomplete; access external AI via API, including Ollama local models |
| Devika | - |
| Gemini UI-to-Code | Streamlit app to convert images of UI designs into code |
| Google Code Transformer | Very competent model; free access level is generous |
| Omni Engineer | - |
| Pear AI | IDE with built-in AI interaction |
| RapidPages | - |
| Sourcery | Python, JS and TS AI code assistant, free for public open source |
| Tabby | Self-hosted AI coding assistant (OSS) |
| Tabnine * | Independent company, one of the first. Free autocompletion functionality for public repos |
| Twinny * | A Tier. Private code-completion plugin for VSCode; only uses locally hosted LLMs (OSS) |
| Vanna | Python RAG (Retrieval-Augmented Generation) framework for SQL generation (OSS) |
| Zed AI | - |
Application Development Platforms with LLM
Locally Hosted
| SYSTEM NAME | DESCRIPTION |
| :--- | :--- |
| Agenta | LLM development platform, Docker hostable |
| AgentScope | Build multi-agent applications (OSS) |
| ChainForge * | LLM prompt engineering tool (OSS) |
| CodeAct Agent | Coding agent, uses Mistral |
| Cognita | RAG framework for building modular applications (OSS) |
| CoPilotKit | Incorporate AI into custom applications (OSS) |
| CrewAI | Multi-agent automation |
| DSPy | Python framework for algorithmically optimizing LM prompts and weights |
| Flowise | Low-code LLM application builder (OSS) |
| GorillaLLM | API 'link list' for LLMs and agents to access as 'function calls' |
| GPTCache | Semantic cache for LLMs |
| LAgent | Lightweight framework for LLM-based agent development |
| Ollama Grid Search | Desktop app for evaluating models, prompts, and inference |
| OpenAOE | Chat with multiple LLMs at the same time, aka LLM group chat |
| OpenDevin | Autonomous app development agent |
| OpenPrompt | Add Python library |
| Promptflow | Dev tools for end-to-end creation of LLM-based AI apps |
| PromptFoo | Testing system for LLM evaluation, including runs triggered in CI/CD (OSS) |
| Phidata | Framework for building AI agents with memory storage and contextual knowledge |
| Tasking AI | LLM application development |
3rd Party Hosted
| SYSTEM NAME | DESCRIPTION |
| :--- | :--- |
| Amazon Bedrock | Build with foundation models |
| Dify.ai | LLM app dev platform with RAG, agents, and LLMOps process |
| Klu | Collaborative prompt engineering |
| Lambda AI Stack | AI training environments with managed upgrades for PyTorch®, TensorFlow, CUDA, cuDNN, etc. |
| LangChain | A framework to easily construct LLM-powered apps |
| LangSmith | Build LLM-powered applications |
| Lightning AI Studio | Run AI on external resources, zero setup, 22 free GPU hours/month |
| LLMOPS | Evaluate LLM output |
| Nebius | - |
| Promptchainer | Visual flow builder for creating AI flows and chaining prompts |
| Promptmetheus.com | - |
| Promptlayer | Manage prompts, evaluate models, and oversee chat usage |
| Release.ai | AI development and deployment platform for private AI apps |
| Restack | - |
| Together AI | - |
| VectorShift | No-code/low-code platform to build AI-focused apps and workflows |
LLM Training and Fine-tuning
| SYSTEM NAME | DESCRIPTION |
| :--- | :--- |
| Amazon SageMaker | Build, train, and deploy machine learning models at scale |
| Lightning AI Studio | Cloud-hosted end-to-end AI services, from training to LLM app dev. Zero setup, 22 free GPU hours/month |
| NanoGPT | Lightweight system for training and fine-tuning GPT models up to medium size |
| NVIDIA AI Workbench | Free-to-access LLM app development and AI training/tuning |
LLM Infrastructure
System Development (Local Host)
| SYSTEM NAME | DESCRIPTION |
| :--- | :--- |
| Autogen | Microsoft framework to create multiple agents with LLM access |
| LMDeploy | Deploy and serve LLMs. Its test data shows faster performance than vLLM |
| Ollama Python library | Integrate Python 3.8+ projects with Ollama |
| RouteLLM | Framework for serving and evaluating LLM routers |
| Semantic text splitter | Python chunking by semantics, for the RAG method |
| Semchunk | Semantic chunking of text (used in the LLM RAG method) |
| Text-splitter | Semantic chunking of text (used in the LLM RAG method) |
| Vector Admin * | Manage datasets within LLM RAG db instances |
| vLLM | High-throughput, memory-efficient inference engine. Journal paper |
Systems Development (Remote Host)
| SYSTEM NAME | DESCRIPTION |
| :--- | :--- |
| Embedchain | Supports creation of RAG LLM apps. Chunks data, creates embeddings, and stores them in a vector db |
Local LLM Server Hosting
| SYSTEM NAME | DESCRIPTION |
| :--- | :--- |
| Gollama | Dashboard for managing locally hosted Ollama models |
| gpt4all.cpp | Runs an "Assistant-Tuned Chat-Style LLM". I have not used it |
| LocalAI | OSS backend |
| llama.cpp | For models coming from Meta, and their fine-tuned derivatives |
| LM Studio * | OSS backend |
| NVIDIA ChatRTX | LLM on a graphics card |
| Oobabooga | OSS backend |
| Ollama * | LLM host backend; as of v0.2 (July) concurrency is enabled! Run multiple models in parallel, great for agentic setups (OSS) |
| OpenLLM | Run OSS LLMs with OpenAI-compatible API endpoints |
Remote LLM Server Hosting
Vector Databases (for RAG)
Vector databases are required for Retrieval-Augmented Generation (RAG). This method uses embedding models to process documents/files, and stores the resulting embeddings in a vector database.
N.B. LLMs that use RAG are called retrieval-augmented language models (RALMs).
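The RAG flow described above (embed documents, store the vectors, retrieve the closest match for a question) can be sketched in a few lines. This is an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical substitute for one of the vector databases listed on this page.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal vector store: keeps (text, vector) pairs in a list."""

    def __init__(self):
        self._items = []

    def add(self, text):
        self._items.append((text, embed(text)))

    def search(self, query, top_k=1):
        query_vec = embed(query)
        scored = sorted(self._items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [text for text, _ in scored[:top_k]]


store = InMemoryVectorStore()
for chunk in [
    "Ollama runs large language models on a local machine.",
    "PostgreSQL is a relational database.",
    "Embedding models turn text into vectors.",
]:
    store.add(chunk)

question = "Which tool runs language models on a local machine?"
context = store.search(question, top_k=1)
# The retrieved chunk is placed in the prompt that would be sent to the LLM.
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
print(prompt)
```

A real setup would swap `embed` for an embedding model and the list for a vector database, but the retrieve-then-prompt shape stays the same.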
| SYSTEM NAME | DESCRIPTION |
| :--- | :--- |
| Chroma Db | Local bare-metal or container (OSS) |
| Pinecone | Managed cloud vector database (proprietary, not OSS) |
| LanceDb | Embedded vector db (OSS), good for local storage by an LLM app |
| Milvus | Easily installed (via pip) vector database (OSS) |
| OpenSearch | Combines vector search with traditional lexical and hybrid search, plus analytics (OSS) |
| PgVector | Vector add-on (extension) for PostgreSQL |
| Vespa | Db that allows distributed inference and organizes vectors, tensors, text, and structured data |
To-do List
Completed To-do