Llama
About
Llama is a family of large language models developed and released by Meta AI, beginning with the original LLaMA in February 2023.
Models
Name | Release date | Parameters (B) | Context length (tokens) | Corpus size (tokens) | Commercial use allowed? |
---|---|---|---|---|---|
Llama 4 (Maverick) | Apr 2025 | 17 active, 400 total (128 experts) | 1M (Instruct), 256K (pretrain) | 40T | Yes |
Llama 4 (Scout) | Apr 2025 | 17 active, 109 total (16 experts) | 10M (Instruct), 256K (pretrain) | 40T | Yes |
Llama 3.2 | Sep 2024 | 1, 3, 11, 90 | 128K | >15T | Yes |
Llama 3.1 | Jul 2024 | 8, 70, 405 | 128K | >15T | Yes |
Llama 3 | Apr 2024 | 8, 70 | 8K | 15T | Yes |
Llama 2 | Jul 2023 | 7, 13, 70 | 4K | 2T | Yes |
Llama | Feb 2023 | 7, 13, 33, 65 | 2K | 1–1.4T | No |
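The models from Llama 2 onward are distributed as open weights, for example through the Hugging Face Hub. Below is a minimal sketch of loading one of the checkpoints from the table with the `transformers` library; the repository id `meta-llama/Llama-3.1-8B-Instruct` and the gated-access setup are assumptions about the environment, not part of this wiki.

```python
# Minimal sketch: load one of the checkpoints from the table above with Hugging Face
# transformers and run a short chat-style generation. Assumes `transformers` and
# `torch` are installed and that access to the gated meta-llama repository has been
# granted on the Hugging Face Hub (the repo id below is an example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"      # the 8B row of the Llama 3.1 release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                    # ~16 GB of weights for the 8B model
    device_map="auto",                             # spread layers over available devices
)

messages = [{"role": "user", "content": "Summarise the Llama model family in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same call pattern applies to the other rows of the table by swapping `model_id`, subject to memory: the 70B and 405B checkpoints need multiple GPUs or quantization.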
Llama 4 Key Features:
- Mixture-of-Experts (MoE) architecture (Maverick: 128 experts, Scout: 16 experts); see the routing sketch after this list
- Native multimodal input (text and images)
- Dramatically increased context length (up to 10 million tokens for Scout, 1 million for Maverick)
- Pre-trained on a 40-trillion-token corpus covering 200 languages, with instruction tuning targeting 12 major languages
- Designed for both high performance (Maverick) and efficient deployment (Scout fits on a single server-grade GPU)
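As referenced in the MoE bullet above, Llama 4 replaces dense feed-forward blocks with mixture-of-experts layers: each token activates only a small subset of the experts (Meta describes a shared expert plus one routed expert), which is how 400B or 109B total parameters yield only 17B active parameters per token. The sketch below illustrates that top-1 routing idea in PyTorch; the layer sizes, softmax gate, and class name `MoEFeedForward` are illustrative assumptions, not Meta's implementation.

```python
# Illustrative top-1 mixture-of-experts feed-forward layer: every token runs a
# shared expert, and a router picks exactly one of n_experts routed experts for it.
# Dimensions and module layout are made up for clarity; this is not Meta's Llama 4 code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)  # one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                          # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)   # routing probabilities
        weight, expert_idx = gate.max(dim=-1)      # top-1 expert per token
        routed = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                 # tokens assigned to expert i
            if mask.any():                         # only those tokens pay for expert i
                routed[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return self.shared_expert(x) + routed      # shared path + routed path

tokens = torch.randn(8, 512)                       # 8 tokens of a made-up sequence
print(MoEFeedForward()(tokens).shape)              # torch.Size([8, 512])
```

Each token therefore touches only the shared expert plus one routed expert's weights per MoE layer, which keeps per-token compute close to a dense model of the "active" size while the total parameter count grows with the number of experts.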