Llama
About
Llama is a family of large language models developed and released by Meta AI, beginning with the original LLaMA in February 2023.
Models
Name | Release date | Parameters (B) | Context length (tokens) | Corpus size (tokens) | Commercial use allowed? |
---|---|---|---|---|---|
Llama 4 (Maverick) | Apr 2025 | 17 active, 400 total (128 experts) | 1M (Instruct), 256K (pretrain) | 40T | Yes |
Llama 4 (Scout) | Apr 2025 | 17 active, 109 total (16 experts) | 10M (Instruct), 256K (pretrain) | 40T | Yes |
Llama 3.2 | Sep 2024 | 1, 3, 11, 90 | 128K | >15T | Yes |
Llama 3.1 | Jul 2024 | 8, 70, 405 | 128K | >15T | Yes |
Llama 3 | Apr 2024 | 8, 70 | 8K | 15T | Yes |
Llama 2 | Jul 2023 | 7, 13, 70 | 4K | 2T | Yes |
Llama | Feb 2023 | 7, 13, 33, 65 | 2K | 1–1.4T | No |
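The models from Llama 2 onward are distributed as open weights, for example through the Hugging Face Hub. Below is a minimal sketch of loading one of the checkpoints from the table with the `transformers` library; the repository id `meta-llama/Llama-3.1-8B-Instruct` and the gated-access setup are assumptions about the environment, not part of this wiki.

```python
# Minimal sketch: load one of the checkpoints from the table above with Hugging Face
# transformers and run a short chat-style generation. Assumes `transformers` and
# `torch` are installed and that access to the gated meta-llama repository has been
# granted on the Hugging Face Hub (the repo id below is an example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"      # the 8B row of the Llama 3.1 release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                    # ~16 GB of weights for the 8B model
    device_map="auto",                             # spread layers over available devices
)

messages = [{"role": "user", "content": "Summarise the Llama model family in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same call pattern applies to the other rows of the table by swapping `model_id`, subject to memory: the 70B and 405B checkpoints need multiple GPUs or quantization.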
Llama 4 Key Features:
- Mixture-of-Experts (MoE) architecture (Maverick: 128 experts, Scout: 16 experts); see the routing sketch after this list
- Native multimodal input (text and images)
- Dramatically increased context length (up to 10 million tokens for Scout, 1 million for Maverick)
- Pre-trained on a 40-trillion-token corpus covering 200 languages, with instruction tuning targeting 12 major languages
- Designed for both high performance (Maverick) and efficient deployment (Scout fits on a single server-grade GPU)
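As referenced in the MoE bullet above, Llama 4 replaces dense feed-forward blocks with mixture-of-experts layers: each token activates only a small subset of the experts (Meta describes a shared expert plus one routed expert), which is how 400B or 109B total parameters yield only 17B active parameters per token. The sketch below illustrates that top-1 routing idea in PyTorch; the layer sizes, softmax gate, and class name `MoEFeedForward` are illustrative assumptions, not Meta's implementation.

```python
# Illustrative top-1 mixture-of-experts feed-forward layer: every token runs a
# shared expert, and a router picks exactly one of n_experts routed experts for it.
# Dimensions and module layout are made up for clarity; this is not Meta's Llama 4 code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)  # one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                          # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)   # routing probabilities
        weight, expert_idx = gate.max(dim=-1)      # top-1 expert per token
        routed = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                 # tokens assigned to expert i
            if mask.any():                         # only those tokens pay for expert i
                routed[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return self.shared_expert(x) + routed      # shared path + routed path

tokens = torch.randn(8, 512)                       # 8 tokens of a made-up sequence
print(MoEFeedForward()(tokens).shape)              # torch.Size([8, 512])
```

Each token therefore touches only the shared expert plus one routed expert's weights per MoE layer, which keeps per-token compute close to a dense model of the "active" size while the total parameter count grows with the number of experts.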