Llama

About

  • Llama (Large Language Model Meta AI)
  • A family of LLM foundation models by Meta

Models

| Name | Release date | Parameters (in B) | Context length | Corpus size (tokens) | Commercial use? |
|------|--------------|-------------------|----------------|----------------------|-----------------|
| Llama 4 Maverick | Apr 2025 | 17 active, 400 total (128 experts) | 1M (Instruct), 256K (pretrain) | ~22T | Yes |
| Llama 4 Scout | Apr 2025 | 17 active, 109 total (16 experts) | 10M (Instruct), 256K (pretrain) | ~40T | Yes |
| Llama 3.2 | Sep 2024 | 1, 3, 11, 90 | 128K | >15T | Yes |
| Llama 3.1 | Jul 2024 | 8, 70, 405 | 128K | >15T | Yes |
| Llama 3 | Apr 2024 | 8, 70 | 8K | 15T | Yes |
| Llama 2 | Jul 2023 | 7, 13, 70 | 4K | 2T | Yes |
| Llama | Feb 2023 | 7, 13, 33, 65 | 2K | 1–1.4T | No |
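
The openly released checkpoints run with standard tooling. Below is a minimal sketch using the Hugging Face transformers library; the model ID follows Meta's naming on the Hub, and the gated repos require accepting Meta's license before download.

```python
# Minimal text-generation sketch with Hugging Face transformers.
# Assumes: transformers, torch, and accelerate installed, and license
# access granted for the gated meta-llama repo named below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # gated repo; needs approval
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

messages = [{"role": "user", "content": "Summarize the Llama model family."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```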

Llama 4 Key Features

  • Mixture-of-Experts (MoE) architecture (Maverick: 128 experts, Scout: 16 experts); a routing sketch follows this list
  • Native multimodal input (text and images)
  • Dramatically increased context length (up to 10 million tokens for Scout, 1 million for Maverick)
  • Pretrained on up to ~40 trillion tokens (Scout; ~22T for Maverick) covering 200 languages, with instruction tuning for 12 major languages
  • Designed for both frontier performance (Maverick) and efficient deployment (Scout, which fits on a single H100-class GPU when quantized)
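
The MoE idea fits in a few lines of PyTorch. The sketch below is a generic top-k routing layer, not Meta's implementation; all dimensions, expert counts, and the top_k value are illustrative placeholders.

```python
# Toy top-k mixture-of-experts (MoE) layer illustrating the routing idea
# behind Llama 4's MoE blocks. All sizes (d_model, n_experts, top_k) are
# made-up example values, not Meta's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=4, top_k=1):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (n_tokens, d_model)
        scores = self.router(x)                  # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e            # tokens routed to expert e
                if mask.any():
                    w = weights[:, k][mask].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(MoELayer()(tokens).shape)                  # torch.Size([8, 64])
```

Only the experts a token is routed to actually run, which is how a model with 400B total parameters (Maverick) activates just 17B parameters per token.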

See also