AI coding - ProkopHapala/FireCore GitHub Wiki
Update June 2025
- Continue.dev: no longer kept up to date (support for 3rd-party models lags behind) => uninstalled
- Cody offers free Claude 4 Sonnet
AI model Comparisons / Leaderboards
Chat platforms
Free API providers (aider key setup: see the sketch at the end of this list)
- Google Vertex
- $300 free credit for 3 months ($150 without a credit card)
- Google Studio
- Gemini 1.5_002 Pro / Flash with generous limits. Flash is super fast, Pro is quite good. Super big context window (1M tok)
- Gemini 1.5_002 Pro : 2 req/min, 50 req/day, 32,000 tok/min
- Gemini 1.5_002 Flash : 15 req/min, 1,500 req/day, 1,000,000 tok/min
- Mistral
- AICodeKing: Mistral FREE API : This is the BEST FREE WAY to do AI CODING
- Codestral API (unlimited rate limit)
- Mistral Large 2 free API key (very generous rate limits)
- mistral-large-2407 : 1 req/s, 500,000 tok/min, 1,000,000,000 tok/month
- GitHub marketplace
- GPT-4o and GPT-4o-mini seem to be free at the moment
- hyperbolic
- $10 free credit
- cerebras
- Llama3.1-8B ( 2204 T/s )
- Llama3.1-70B ( 2525 T/s )
- sambanova
- Llama-3.1-8B ( 1111.59 T/s )
- Llama-3.1-70B ( 531.56 T/s )
- Llama-3.1-405B ( 124.95 T/s )
- Llama-3.2-90B-Vision ( 606.06 T/s )
- groq
- deepseek-r1-distill-llama-70b ( 279 T/s )
- llama3-groq-70b-8192-tool-use-preview ( 313 T/s )
- llama-3.1-70B-versatile ( 250 T/s )
- llama-3.2-90b-vision-preview ( 202 T/s )
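To use the free providers above from aider, the corresponding API keys have to be exported. A minimal sketch, assuming the LiteLLM-style variable names that aider reads (double-check the exact names in the aider docs for your version):

```bash
# Minimal key setup for the free providers listed above (variable names assumed to
# follow the LiteLLM conventions that aider uses -- verify in the aider/LiteLLM docs).
export GEMINI_API_KEY="..."      # Google AI Studio (gemini/... models)
export MISTRAL_API_KEY="..."     # Mistral La Plateforme (mistral/... models)
export CODESTRAL_API_KEY="..."   # Codestral endpoint (codestral/... models)
export GITHUB_API_KEY="..."      # GitHub Models marketplace token (github/... models)
export CEREBRAS_API_KEY="..."    # Cerebras (cerebras/... models)
export GROQ_API_KEY="..."        # Groq (groq/... models)

# Quick sanity check of one key against Groq's OpenAI-compatible endpoint:
curl -s https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-70b-versatile", "messages": [{"role": "user", "content": "hello"}]}'
```

The curl call is just a quick way to confirm a key works before wiring it into aider.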
Good Local Models
- Codestral 25
- Phi-4 14B
- qwq 32B
- qwen2.5-coder (0.5B, 1.5B, 3B, 7B, 14B, 32B)
- ollama/qwen2.5-coder
- aider leaderboards : Qwen2.5-Coder-32B-Instruct beats gpt-4o-2024-05-13 and DeepSeek V2.5, just behind claude-3.5-sonnet & Haiku
- OpenCoder - Best 8B coding model (November 2024)
ollama (usage sketch after this list)
- Dolphin3.0-R1-Mistral-24B
- deepseek-r1
- 8b, 14b, 32b, 70b
- phi4
- qwq
- deepscaler - A fine-tuned version of Deepseek-R1-Distilled-Qwen-1.5B that surpasses the performance of OpenAI’s o1-preview with just 1.5B parameters on popular math evaluations.
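A minimal sketch of pulling one of the ollama models above and pointing aider at the local server; the model tag `qwen2.5-coder:14b` and the default port 11434 are assumptions, adjust to your install:

```bash
# Pull and serve a local coding model with ollama, then use it from aider.
ollama pull qwen2.5-coder:14b    # any of the sizes listed above works
ollama serve &                   # skip if the ollama service is already running

# aider talks to the local ollama server via this base URL (default port shown).
export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama/qwen2.5-coder:14b
```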
Cost Effective Models
| model | provider | context [tok] | price in/out [$/Mtok] | speed [tok/s] | latency [s] |
|---|---|---|---|---|---|
| qwen-2.5-coder-32b-instruct | DeepInfra | 32,768 | 0.18 / 0.18 | 62.91 | 0.42 |
| DeepSeek-2.5-236b | DeepSeek | 8,192 | 0.14 / 0.28 | 15.49 | 1.10 |

(A minimal API-call sketch against one of these endpoints follows at the end of this section.)
- DeepSeek thinking model (like OpenAI's o1)
- MiniMax-Text-01
- [AICodeKing: MiniMax-01: This OPENSOURCE Model HAS LONGEST 4M CONTEXT & BEATS OTHERS!](https://www.youtube.com/watch?v=NKnRPykTIJs)
- MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock the long context capabilities of the model, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention and Mixture-of-Experts (MoE).
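As noted after the table above, a minimal sanity-check call to one of these cost-effective hosted models; DeepSeek's OpenAI-compatible API is sketched here (endpoint and model name as DeepSeek currently documents them, adjust if they change):

```bash
# Minimal OpenAI-compatible request to DeepSeek (cheap hosted model from the table above).
export DEEPSEEK_API_KEY="..."
curl -s https://api.deepseek.com/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": "Write a C function that reverses a string."}]
      }'
```

At the 0.14/0.28 $/Mtok pricing from the table, a 10,000-token prompt plus a 2,000-token reply comes to roughly 10,000·0.14/10⁶ + 2,000·0.28/10⁶ ≈ $0.002 per request.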
Coding Tools / Agents
- Aider: blog
- VS Code Extensions
- AI Toolkit for VS Code
- Cody
- https://www.continue.dev/
- Cline (Claude-dev)
- SuperMaven - code completion
- Editors
Aider Architect (more)
- Aider.chat
- Reasonable Architect settings
- Architect
- o1-preview ( best scientific reasoning, good coding )
- Claude-3.5-Sonnet ( good scientific reasoning, best coding, sticks to format )
- Gemini 1.5 Pro ( Good scientific reasoning, Good Context window )
- Coder - Aider-coder : google-doc
- GPT-4o via GitHub (Azure) - Fast, good reasoning and coding (almost as good as Sonnet)
- GPT-4o-mini
- DeepSeek ( good coding, but slow 16 T/s :-( )
- Gemini 1.5 Flash ( Fast, Decent coding )
- Mistral Large ( decent coding and reasoning, almost free at low rate )
- Grok beta ( $25 free credit )
- Free Model Options
- Fully Free (but with limits)
aider --model github/gpt-4o
aider --model gemini/gemini-1.5-pro-002 --map-tokens 2048
aider --model gemini/gemini-1.5-flash-002 --map-tokens 2048
aider --model mistral/mistral-large-latest --map-tokens 2048
aider --model codestral/codestral-latest --map-tokens 2048
aider --model cerebras/llama3.1-70b
- cheap (but not fully free)
aider --model openrouter/qwen/qwen-2.5-coder-32b-instruct
aider --model deepseek/deepseek-coder
- Aider: usage
- Architect Model Options (a small wrapper sketch follows the command lists in this section)
- Separating code reasoning and editing
- Claude 3.5 Sonnet
aider --architect --sonnet --editor-model deepseek/deepseek-coder
aider --architect --sonnet --editor-model openrouter/qwen/qwen-2.5-coder-32b-instruct
aider --architect --sonnet --editor-model github/gpt-4o
aider --architect --sonnet --editor-model github/gpt-4o-mini
aider --architect --sonnet --editor-model mistral/mistral-large-latest
aider --architect --sonnet --editor-model codestral/codestral-latest
aider --architect --sonnet --editor-model gemini/gemini-1.5-flash-002
- GPT-4o
aider --architect --model github/gpt-4o --editor-model github/gpt-4o-mini
aider --architect --model github/gpt-4o --editor-model mistral/mistral-large-latest
aider --architect --model github/gpt-4o --editor-model openrouter/qwen/qwen-2.5-coder-32b-instruct
aider --architect --model github/gpt-4o --editor-model deepseek/deepseek-coder
aider --architect --model github/gpt-4o --editor-model gemini/gemini-1.5-flash-002
aider --architect --model github/gpt-4o --editor-model cerebras/llama3.1-70b
- Gemini 1.5-pro-002
aider --architect --model gemini/gemini-1.5-pro-002 --editor-model gemini/gemini-1.5-flash-002
aider --architect --model gemini/gemini-1.5-pro-002 --editor-model github/gpt-4o-mini
aider --architect --model gemini/gemini-1.5-pro-002 --editor-model openrouter/qwen/qwen-2.5-coder-32b-instruct
aider --architect --model gemini/gemini-1.5-pro-002 --editor-model deepseek/deepseek-coder
aider --architect --model gemini/gemini-1.5-pro-002 --editor-model mistral/mistral-large-latest
aider --architect --model gemini/gemini-1.5-pro-002 --editor-model codestral/codestral-latest
aider --architect --model gemini/gemini-1.5-pro-002 --editor-model cerebras/llama3.1-70b
- Gemini exp-1206
aider --model gemini/gemini-exp-1206
aider --architect --model gemini/gemini-exp-1206 --editor-model github/gpt-4o-mini
aider --architect --model gemini/gemini-exp-1206 --editor-model openrouter/qwen/qwen-2.5-coder-32b-instruct
- Gemini 2.0 flash
aider --model gemini/gemini-2.0-flash-exp
- Qwen QwQ
aider --model openrouter/qwen/qwq-32b-preview --editor-model openrouter/qwen/qwen-2.5-coder-32b-instruct --editor-edit-format editor-whole
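To avoid retyping a favourite architect/editor pair from the lists above, one option is a small shell wrapper. A sketch with a hypothetical helper name `aider_arch`, using one of the free combinations shown earlier; it assumes the relevant API keys are already exported (see the key setup near the top of the page):

```bash
# Hypothetical wrapper around one of the architect/editor combinations listed above.
# Assumes GEMINI_API_KEY and GITHUB_API_KEY are already exported.
aider_arch() {
    aider --architect \
          --model gemini/gemini-1.5-pro-002 \
          --editor-model github/gpt-4o-mini \
          --map-tokens 2048 \
          "$@"                 # pass through files and any extra aider flags
}

# usage:
#   aider_arch src/main.py tests/test_main.py
```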
Youtube
- AICodeKing: 21 FREE AI Coding Tools THAT I USE
- AICodeKing: Mistral FREE API : This is the BEST FREE WAY to do AI CODING
- AICodeKing: Github Models FREE API : AI Coding with GPT-4O for FULLY FREE
Context Building
RAG for Coding
- Mistral RAG docs
- generative-ai/code_rag.ipynb : Code RAG - Reuse your already created codebase to generate more code
- Improving Retrieval Performance in RAG Pipelines with Hybrid Search
- Implementing RAG in Refact.ai AI Coding Assistant
- Hybrid search: RAG for real-life production-grade applications
- CodeRAG-Bench: Can Retrieval Augment Code Generation?
- Exploring the Combination of Full-Text Index with Cohere’s Reranker for RAG over a Knowledge Graph.
- Code Generation using Retrieval Augmented Generation + LangChain
- Building Hybrid Search Platforms: Combining Vector and Full-Text Search in RAG Pipelines
- https://python.langchain.com/docs/tutorials/local_rag/
- Advanced RAG techniques part 2: Querying and testing
- youtube : RAG for Code Generation, an AI Hacker Cup example
- youtube : Local GraphRAG with LLaMa 3.1 - LangChain, Ollama & Neo4j