Qwen3

Important Facts

  • Available in a variety of parameter counts (0.6B, 1.7B, 4B, 8B, 14B, 30B, 32B, and 235B parameters)
  • Optimized for coding and agentic capabilities
  • Features a 40K context window.
  • Released under the Apache 2.0 License.
  • Scored an average of 75% across my two runs of the HumanEval benchmark
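
Since this page links to the Qwen3 entry on the Ollama library below, a minimal sketch of querying one of the sizes listed above through Ollama's local REST API may be helpful. The `qwen3:8b` tag, the prompt, and the assumption that an Ollama server is already running on the default port are illustrative, not part of the original notes.

```python
# Minimal sketch: query a locally served Qwen3 model through Ollama's REST API.
# Assumes Ollama is running on the default port (11434) and that the chosen
# tag (here "qwen3:8b" -- an assumption) has already been pulled.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "qwen3:8b",   # any available size, e.g. qwen3:0.6b ... qwen3:32b
    "prompt": "Write a pytest unit test for a function add(a, b) that returns a + b.",
    "stream": False,       # return the full response as a single JSON object
}

response = requests.post(OLLAMA_URL, json=payload, timeout=300)
response.raise_for_status()
print(response.json()["response"])
```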

🔍 Overview

Qwen3 is the latest generation in the Qwen family of large language models. It was pre-trained on an extensive dataset of approximately 36 trillion tokens covering 119 languages and dialects, and it introduces a hybrid approach to problem-solving with both a "Thinking Mode" for complex challenges and a "Non-Thinking Mode" for rapid responses.


🔧 Key Features

  • Hybrid Problem-Solving: Seamlessly switches between a "Thinking Mode" for complex logical reasoning, mathematical problems, and coding tasks, and a "Non-Thinking Mode" for efficient general-purpose dialogue (see the sketch after this list).
  • Enhanced Reasoning: Designed with significantly improved reasoning capabilities to handle intricate problems.
  • Superior Human Preference Alignment: Optimized to align better with human preferences, leading to more natural and helpful interactions.
  • Agent Capabilities: Exhibits strong expertise in agent functionalities, allowing it to handle multi-step programming tasks and algorithmic challenges.
  • Extensive Multilingual Support: Offers robust capabilities for multilingual instruction following and translation across over 100 languages and dialects.
  • Diverse Model Suite: Includes a comprehensive range of dense models and Mixture of Experts (MoE) models, providing flexibility for various deployment scenarios and resource considerations.
  • Large Context Window: Supports a 40K context window, enabling the model to process longer and more complex inputs.
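
The sketch below shows how the Thinking / Non-Thinking switch can be toggled when loading Qwen3 with Hugging Face transformers, following the `enable_thinking` flag documented on the Qwen3 model cards. The checkpoint name, prompt, and generation settings are assumptions for illustration.

```python
# Minimal sketch: toggling Thinking vs. Non-Thinking Mode via the chat template,
# as described on the Qwen3 model card. Checkpoint and prompt are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # assumption: the 8B dense checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Is 9.11 larger than 9.9? Explain briefly."}]

# Thinking Mode: the template inserts a reasoning block before the final answer.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-Thinking Mode: set enable_thinking=False for a direct, faster reply.
# text = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
# )

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens and print only the newly generated text.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

In Thinking Mode the decoded output typically contains a reasoning block before the final answer, which is why it is the mode suggested above for coding and math tasks.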

🧠 Architecture

Qwen3 employs a standard transformer-based architecture, further optimized through a rigorous pre-training process. This process was structured in three distinct stages: first building basic language skills, then integrating knowledge-intensive data, and finally extending the context length to 32K tokens during initial development (later expanded to 40K). The architecture supports both the efficient Mixture of Experts (MoE) models and the diverse dense models, designed to deliver high performance across various computational demands.
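
When the extended context window matters (e.g. feeding long test logs or large source files), it has to be requested explicitly when serving the model through Ollama. A minimal sketch using Ollama's standard `num_ctx` option is shown below; the value 40960 mirrors the ~40K window quoted above, and the model tag and prompt are assumptions.

```python
# Minimal sketch: request a larger context window via Ollama's chat endpoint.
# num_ctx is Ollama's standard option for the context size; the value here is
# an assumption that your hardware has enough memory for a ~40K-token window.
import requests

payload = {
    "model": "qwen3:8b",  # assumption: the 8B tag
    "messages": [{"role": "user", "content": "Summarize the failures in this long test log: ..."}],
    "stream": False,
    "options": {"num_ctx": 40960},
}

resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```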


Qwen3 Blog Post by Developer

Qwen3 on Ollama Library