# Phi‑4 Reasoning
Phi‑4 Reasoning is an MIT‑licensed, 14‑billion‑parameter, decoder‑only Transformer model from Microsoft Research, fine‑tuned for complex, multi‑step reasoning across domains such as math, science, coding, planning, and spatial tasks. It outputs explicit chain‑of‑thought reasoning followed by a concise answer.
- Base model: Built on Phi‑4, a 14B dense Transformer pre‑trained with a data‑centric focus on high‑quality synthetic and organic datasets.
- Phi‑4 Reasoning: Fine‑tuned via supervised learning using "teachable" prompts and reasoning chains.
- Phi‑4 Reasoning‑Plus: Further enhanced with reinforcement learning via Group Relative Policy Optimization (GRPO), optimized for longer reasoning traces and higher accuracy.
- Context window: Supports up to 32K tokens, allowing in-depth problem exploration.
- Logic markers: Uses `<think> … </think>` blocks to structure reasoning steps, followed by the final answer; a parsing sketch appears at the end of this page.
- Supervised fine-tuning:
  - Trained on ~1.4 million high‑quality reasoning examples.
  - Balanced curriculum targeting borderline‑solvable prompts.
  - Adjusted hyperparameters: small batch sizes, moderate learning rates, rotary position embeddings.
- Reinforcement learning (Plus variant):
  - Added an RL phase using ~6,400 math‑focused problems.
  - Employed a GRPO reward: +1 for a correct answer, –0.5 for an incorrect one, with penalties for hallucinations (a sketch follows this list).
  - Yields ~50% longer outputs and a ~15% AIME performance boost.
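
For illustration, here is a minimal Python sketch of the reward scheme described above. The function name, the exact-match grading, and the size of the hallucination penalty are assumptions made for this example, not the published training setup.

```python
def grpo_reward(answer: str, reference: str, hallucinated: bool) -> float:
    """Toy reward in the spirit of the GRPO scheme described above.

    Assumptions (not from the paper): exact-match grading and a flat
    -0.3 hallucination penalty stacked on top of the base reward.
    """
    base = 1.0 if answer.strip() == reference.strip() else -0.5
    penalty = -0.3 if hallucinated else 0.0
    return base + penalty


# A correct answer with no hallucinated content scores +1.0;
# an incorrect, hallucinated one scores -0.8.
print(grpo_reward("42", "42", hallucinated=False))  # 1.0
print(grpo_reward("41", "42", hallucinated=True))   # -0.8
```
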
| Task / Benchmark | Phi‑4 | Phi‑4 Reasoning | Phi‑4 Reasoning‑Plus |
| --- | --- | --- | --- |
| AIME 25 | 63.1 % | 78.0 % | 82.5 % |
| HMMT Feb 25 | 43.8 % | 53.6 % | 67.5 % |
| OmniMath | 76.6 % | 81.9 % | 85.0 % |
| GPQA | 67.1 % | 69.3 % | 77.7 % |
| LiveCodeBench | 53.8 % | 53.1 % | 68.8 % |
- Transparency: Chain‑of‑thought outputs enhance interpretability and debuggability.
- Efficiency: Combines high performance with a manageable compute and memory footprint at 14B parameters.
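
Because the model wraps its reasoning in `<think>` tags, downstream tooling can separate the trace from the final answer. The sketch below assumes a local Ollama server with a phi4-reasoning model already pulled; the model tag, endpoint URL, and prompt are assumptions about the local setup, not part of the model release.

```python
import re

import requests  # assumes the requests package is installed

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "phi4-reasoning"  # assumed local model tag; adjust to your pull


def ask(prompt: str) -> tuple[str, str]:
    """Send a prompt and return (reasoning, answer), split on the <think> block."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=600,  # reasoning traces can take a while to generate
    )
    resp.raise_for_status()
    text = resp.json()["response"]
    # Extract the chain-of-thought block; everything after it is the answer.
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = text[match.end():].strip() if match else text.strip()
    return reasoning, answer


if __name__ == "__main__":
    reasoning, answer = ask("How many prime numbers are below 20?")
    print("REASONING:\n", reasoning)
    print("ANSWER:\n", answer)
```

Splitting on the `<think>` block this way lets test tooling log the full trace for debugging while asserting only against the concise final answer.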