Top Open-Source Projects Reproducing DeepSeek-R1

DeepSeek-R1 is a large-scale reasoning model (671B parameters) released by DeepSeek AI in January 2025. It achieves performance comparable to OpenAI's o1 on benchmarks such as MATH, AIME, and GPQA through a pipeline of supervised fine-tuning (SFT) on cold-start data, reinforcement learning (RL) for reasoning-pattern discovery, and distillation into smaller models. While DeepSeek-R1 itself is open-sourced (MIT license, with weights on Hugging Face), this page focuses on community-driven reproductions that aim to replicate its training pipeline, data generation, or RL methods from scratch for transparency, accessibility, and further innovation. These projects emphasize reproducibility and often target smaller scales to validate claims or to extend the recipe to new domains such as agents or search.
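
To make the RL stage concrete: R1-style training scores sampled completions with simple, programmatically verifiable rewards (answer correctness plus a check that the output follows the expected reasoning/answer structure) rather than a learned reward model. The sketch below is a minimal illustration of such a rule-based reward; the `<think>`/`<answer>` tag convention, the 0.1 weighting, and the helper names are illustrative assumptions, not DeepSeek's exact implementation.

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed <think>...</think><answer>...</answer> layout."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), flags=re.DOTALL) else 0.0

def extract_final_answer(completion: str):
    """Pull the text inside the <answer> block, if present (hypothetical convention)."""
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return match.group(1).strip() if match else None

def accuracy_reward(completion: str, gold_answer: str) -> float:
    """Rule-based outcome reward: exact string match against the reference answer."""
    predicted = extract_final_answer(completion)
    return 1.0 if predicted is not None and predicted == gold_answer.strip() else 0.0

def total_reward(completion: str, gold_answer: str) -> float:
    # Verifiable reward = correctness plus a small bonus for well-formed reasoning structure.
    return accuracy_reward(completion, gold_answer) + 0.1 * format_reward(completion)
```

Rewards of exactly this flavor (exact-match or code-execution checks) are what the cheaper reproductions below rely on, which is a large part of why their training runs cost so little.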

Based on recent community efforts (as of September 2025), here are the best open-source projects reproducing DeepSeek-R1. "Best" is evaluated by factors like completeness of the pipeline replication, benchmark validation (e.g., matching within 1-3% on GPQA), community engagement (stars/forks on GitHub), innovation (e.g., cost reductions or extensions), and openness (full code/data release). I've prioritized projects with active development and verifiable results.

| Project | Description | Key Features & Achievements | GitHub Stars (as of Sep 2025) | Why It's Among the Best |
| --- | --- | --- | --- | --- |
| Open-R1 (Hugging Face) | A systematic reconstruction of DeepSeek-R1's full pipeline: synthetic data generation via distillation from R1, SFT on verified chain-of-thought (CoT) traces, and pure RL (e.g., PPO/GRPO variants) without proprietary elements. Starts from base models like Llama-3.2 or Qwen-2.5. | - Releases the "Mixture-of-Thoughts" dataset (350k verified traces for math/code/science).<br>- Reproduces R1-Distill-7B with recipes for multi-GPU training (DeepSpeed ZeRO-3).<br>- Matches GPQA Diamond within 1-3 std devs; supports decontamination for fair evals.<br>- Roadmap: RL for R1-Zero and multi-stage scaling. | ~12k | Most comprehensive and collaborative; community-driven (e.g., integrates with vLLM for inference). Validates core claims like RL's role in out-of-domain generalization. Active since its Jan 2025 launch. (A GRPO training sketch follows the table.) |
| DeepScaleR (Berkeley AI Research / rllm-org) | Focuses on scaling RL for small models to beat o1-preview on math (e.g., a 1.5B model via simple outcome-based rewards). Replicates R1-Zero's RL-from-base approach on 40k math problems, with extensions to longer contexts (up to 24k tokens). | - Trained on ~3,800 A100 hours (~$4,500 total cost).<br>- Surpasses o1-preview on MATH-500 (95%+ accuracy).<br>- Includes the VERL library for RL training; full dataset/code open.<br>- Handles safety issues like reward hacking. | ~8k | Extremely cost-effective (95% cheaper than proprietary equivalents); proves RL efficacy on tiny models. Ideal for researchers validating R1's "pure RL" stage. Released Feb 2025. |
| RAGEN Framework | First reproduction tailored for agentic models: applies R1's RL pipeline to train LLMs (e.g., 3B Qwen/Llama) on tool-calling and multi-hop reasoning, mimicking o1's "deep research" with search augmentation. | - Supports verifiable rewards (e.g., code execution) and difficulty ramping.<br>- Outperforms baselines on agent benchmarks (e.g., 2x better on multi-hop queries).<br>- Integrates with SmolAgents/LiteLLM; Apache 2.0 license. | ~5k | Innovates on R1 for real-world agents (e.g., search + reasoning); addresses R1's limitations in tool use. Community favorite for practical extensions. Launched Feb 2025. |
| Search-R1 (Bowen Jin / UIUC) | Replicates R1-Zero for search-augmented agents: trains 3B models to self-discover reasoning + search patterns via RL, targeting an open "Deep Research" alternative to o1/o3. | - Uses rule-based rewards for logic/search tasks; beats GPT-4o on puzzles (0.41 vs. 0.3 accuracy after 300 steps).<br>- Full open-source code/logs; focuses on harder samples for efficiency.<br>- Extends to multi-node training. | ~4k | Strong on generalization (e.g., RL improves unrelated tasks like poetry). Bridges R1 to agentic/search apps. Released late Feb 2025. |
| TinyZero (Jiayi Pan / Berkeley) | Lightweight RL reproduction of R1-Zero for specific domains (e.g., math puzzles like Countdown and multiplication). Starts from distilled R1-1.5B, using simple verifiable rewards. | - Trained for ~$30-50; beats o1-preview on targeted math.<br>- Emphasizes decontamination and length/difficulty biases.<br>- Includes value functions (e.g., VC-PPO) for stability. | ~3k | Ultra-accessible for hobbyists and low-resource setups; validates R1's core RL without massive compute. Great for domain-specific tweaks. Released early 2025. |
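
Several of the projects above train with GRPO-style RL; Open-R1 in particular builds on Hugging Face TRL. Below is a minimal, self-contained sketch using TRL's `GRPOTrainer` with a toy reward. The model id, the `trl-lib/tldr` placeholder dataset, and the specific `GRPOConfig` fields are assumptions that may differ across TRL versions and from Open-R1's actual recipes, so treat this as a sketch rather than a drop-in reproduction.

```python
# Minimal GRPO sketch with Hugging Face TRL (assumed API; check your TRL version).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset: any dataset with a "prompt" column works here.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_num_unique_chars(completions, **kwargs):
    """Toy rule-based reward: more unique characters -> higher reward.
    An R1-style run would use exact-match or code-execution checks instead."""
    return [float(len(set(c))) for c in completions]

training_args = GRPOConfig(output_dir="qwen-grpo-sketch", num_generations=8)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",    # small placeholder base model
    reward_funcs=reward_num_unique_chars,  # rule-based reward, no reward model
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In an actual reproduction, the toy reward would be replaced by verifiable accuracy/format rewards like those sketched earlier, and the placeholder model by a larger Qwen or Llama checkpoint trained with DeepSpeed/vLLM as the projects describe.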

Additional Notes

  • Trends from Community Efforts: A September 2025 survey of 100+ replication studies highlights successes such as high-quality CoT data for SFT and simple verifiable rewards for RL, along with persistent challenges such as dataset contamination and RL instability on models smaller than 32B. Projects like Open-R1 lead in addressing these.
  • Getting Started: Most projects host weights and datasets on Hugging Face and use tools like DeepSpeed/vLLM for training and inference. For full reproducibility, start with Open-R1's recipes (e.g., `accelerate launch src/open_r1/sft.py` on Mixture-of-Thoughts); a minimal data-loading snippet follows this list.
  • Limitations: Smaller reproductions (<7B) struggle with full-scale R1 performance but excel in cost/accessibility. No project yet fully matches the 671B R1 on all benchmarks, but they're closing the gap rapidly.
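
As a first step toward the Getting Started bullet above, you can pull the Open-R1 SFT data from the Hugging Face Hub and inspect a trace before launching `src/open_r1/sft.py`. The dataset id `open-r1/Mixture-of-Thoughts`, the `all` config, and the column layout are assumptions based on the project's public releases; check the dataset card if they have changed.

```python
# Quick look at the Open-R1 SFT data before launching a training recipe.
# Dataset id and config name are assumptions; see the dataset card on the Hub.
from datasets import load_dataset

# Streaming avoids downloading all 350k traces just to inspect the format.
ds = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train", streaming=True)

# Print the first verified reasoning trace to see the message/CoT structure
# that the SFT recipe expects.
first = next(iter(ds))
print(first.keys())
print(first)
```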

These projects democratize R1's innovations, enabling anyone to build and study strong reasoning models. If you're focusing on a specific aspect, I recommend Open-R1 for breadth, DeepScaleR for a low-cost math-focused start, or RAGEN/Search-R1 for agentic and search use cases.