Qwen - AshokBhat/ml GitHub Wiki
About
- LLM family built by Alibaba Cloud
- Native support for dual-mode operation (Thinking and Non-Thinking modes) on select reasoning models.
- Highly optimized for agentic frameworks, multimodal tasks, and mixture-of-experts (MoE) architectures.
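The Thinking / Non-Thinking split can be illustrated by separating a raw completion into its reasoning trace and its final answer. This is a minimal sketch, assuming the model closes its reasoning with a `</think>` tag as the open Qwen3 thinking checkpoints do; `split_thinking` is a hypothetical helper name, not part of any Qwen library.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Assumes reasoning, if present, ends with a closing </think> tag.
    The opening <think> tag is treated as optional because some chat
    templates emit it as part of the prompt rather than the completion.
    """
    reasoning, sep, answer = text.rpartition("</think>")
    if not sep:
        # No closing tag found: treat everything as a non-thinking answer.
        return "", text.strip()
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()
```

In Non-Thinking mode the first element is simply empty, so the same helper can sit behind both modes.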
Models & Formats
- Dense Sizes: 0.5B, 0.6B, 0.8B, 1.5B, 1.7B, 2B, 3B, 4B, 7B, 8B, 9B, 14B, 27B, 32B, 72B
- MoE Sizes: 30B-A3B, 35B-A3B, 122B-A10B, 235B-A22B, 397B-A17B, 480B-A35B
- Variants & Quantized Formats: Base, -Instruct, -Thinking (model variants); -AWQ, -GGUF, -GPTQ-Int4, -GPTQ-Int8, -FP8 (quantized formats)
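As a rough guide to what the size and format suffixes mean for hardware, weight memory scales with parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate only: the bits-per-weight table is an assumption (real checkpoints add overhead for scales, zero-points, and unquantized layers), `weight_gb` is a hypothetical helper, and GGUF is omitted because its bit width depends on the quantization type chosen.

```python
# Approximate bits per weight for common checkpoint formats (assumed values;
# actual files are somewhat larger because of quantization metadata).
BITS_PER_WEIGHT = {
    "bf16": 16,
    "fp8": 8,
    "gptq-int8": 8,
    "gptq-int4": 4,
    "awq": 4,  # AWQ checkpoints are typically 4-bit
}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Rough weight size in GB (10^9 bytes) for a given parameter count."""
    return params_billion * BITS_PER_WEIGHT[fmt] / 8


print(weight_gb(27, "bf16"))       # dense 27B in bf16
print(weight_gb(27, "gptq-int4"))  # same model, 4-bit
```

For MoE models the total parameter count is the relevant input, since all experts usually have to be resident in memory even though only a few are active per token. The estimate covers weights only, not the KV cache or activations.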
Qwen Open-Source LLMs (Alibaba, Updated July 2026)
| Model Name | Release Date | Parameters (B) | Specialization / Notes |
|---|---|---|---|
| Qwen3.6-27B | Apr 2026 | 27 | Dense model, introduces "Thinking Preservation" across long context history. |
| Qwen3.6-35B-A3B | Apr 2026 | 35 total / 3 active | MoE model optimized for agentic coding workflows. |
| Qwen3.5-9B / 4B / 2B / 0.8B | Mar 2026 | 9, 4, 2, 0.8 | Lightweight, low-latency dense additions to the 3.5 generation. |
| Qwen3.5-122B-A10B | Feb 2026 | 122 total / 10 active | Mid-to-high tier MoE balancing deep logic and scaling efficiency. |
| Qwen3.5-35B-A3B | Feb 2026 | 35 total / 3 active | Early 3.5 generation MoE focused on unified text-multimodal capability. |
| Qwen3.5-27B | Feb 2026 | 27 | Dense variant covering 201 languages. |
| Qwen3.5-397B-A17B | Feb 2026 | 397 total / 17 active | Flagship 3.5 sparse MoE; near-100% multimodal training efficiency. |
| Qwen3-Coder-480B-A35B | Jul 2025 | 480 total / 35 active | SOTA repository-scale code generation; 256K context (extends to 1M). |
| Qwen3-235B-A22B | Apr 2025 (updated Jul 2025) | 235 total / 22 active | Original flagship Qwen3 MoE; pioneered dual-mode native thinking. |
| Qwen3-30B-A3B | Apr 2025 | 30 total / 3 active | Efficient original Qwen3 MoE reasoning model. |
| Qwen3 (Dense Suite) | Apr 2025 | 32, 14, 8, 4, 1.7, 0.6 | Standard dense tier ranging from local edge deployment to server utility. |
| QwQ-32B | Mar 2025 (preview: Nov 2024) | 32 | Dedicated deep reasoning model; competitive with o1-mini / DeepSeek-R1. |
| Qwen2.5-Coder Series | Nov 2024 | 32, 14, 7, 3, 1.5, 0.5 | Earlier-generation dedicated code-generation models. |
| Qwen2.5 Dense Series | Sep 2024 | 72, 32, 14, 7, 3, 1.5, 0.5 | General-purpose foundation models of the Qwen2.5 generation. |
Note on Proprietary Versions: Larger flagship variants like Qwen3-Max, Qwen3.5-Plus, and Qwen3.6-Plus are accessible primarily as managed APIs via Alibaba Cloud Model Studio or the official chatbot UI and are excluded from the open-weights registry.
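The MoE names in the table encode both figures in the "Parameters" column: `235B-A22B` means 235B total parameters with 22B active per token. A small sketch of how that convention can be parsed is shown here; `parse_size` and `active_fraction` are hypothetical helpers written for illustration, and the regular expression assumes the `-<total>B[-A<active>B]` suffix pattern seen in the names above.

```python
import re

# Matches "-<total>B", optionally followed by "-A<active>B".
_SIZE = re.compile(r"-(\d+(?:\.\d+)?)B(?:-A(\d+(?:\.\d+)?)B)?")


def parse_size(name: str) -> tuple[float, float]:
    """Return (total, active) parameters in billions.

    Dense models activate every parameter, so active == total.
    """
    m = _SIZE.search(name)
    if m is None:
        raise ValueError(f"no size suffix in {name!r}")
    total = float(m.group(1))
    active = float(m.group(2)) if m.group(2) else total
    return total, active


def active_fraction(name: str) -> float:
    """Share of parameters used per token (1.0 for dense models)."""
    total, active = parse_size(name)
    return active / total


print(parse_size("Qwen3-235B-A22B"))
print(active_fraction("Qwen3-30B-A3B"))
```

The active fraction gives a rough sense of per-token compute relative to a dense model of the same total size; it says nothing about memory, which still follows the total count.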
See also
- [[GPT-4]] | [[Claude]] | [[Gemini]]
- [[LLaMa]] | [[Gemma]] | [[QWEN]] | [[Mistral]]