TinyLlama: A Compact 1.1B Parameter Language Model

TinyLlama is an open-source, compact language model developed by the StatNLP Research Group at the Singapore University of Technology and Design. Built upon the Llama 2 architecture, TinyLlama comprises 1.1 billion parameters and was pretrained on approximately 1 trillion tokens over three epochs. Despite its smaller size, it demonstrates competitive performance on various natural language processing benchmarks.

Model Architecture

  • Base Architecture: Transformer decoder-only, consistent with Llama 2.
  • Positional Encoding: Rotary Positional Embeddings (RoPE).
  • Normalization: RMSNorm.
  • Activation Function: SwiGLU.
  • Attention Mechanism: Grouped-query attention combined with FlashAttention 2 for improved efficiency (see the configuration sketch after this list).
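
These architectural choices can be read directly from the published model configuration. The sketch below is a minimal example, assuming the TinyLlama/TinyLlama-1.1B-Chat-v1.0 checkpoint on Hugging Face; any TinyLlama checkpoint should expose the same fields.

```python
# Minimal sketch: inspecting the architecture settings listed above via the
# published Hugging Face configuration. The checkpoint id is an assumption.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

print(config.num_hidden_layers)        # transformer decoder layers
print(config.hidden_size)              # model (embedding) dimension
print(config.num_attention_heads)      # query heads
print(config.num_key_value_heads)      # KV heads (< query heads => grouped-query attention)
print(config.hidden_act)               # "silu", used inside the SwiGLU feed-forward block
print(config.rms_norm_eps)             # epsilon used by RMSNorm
print(config.max_position_embeddings)  # context length handled by RoPE
```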

Performance Highlights

  • Commonsense Reasoning: Achieved the highest average score among evaluated open-source baselines of comparable size, such as OPT-1.3B and Pythia-1.4B.
  • Problem-Solving Tasks: Outperformed those same baselines on the InstructEval benchmark.
  • Inference Speed: The 1.1B-parameter footprint keeps memory use and latency low enough for deployment on edge and other resource-constrained devices.

Training Details

  • Training Data: A mixture of natural language data from SlimPajama and code data from Starcoderdata.
  • Training Duration: Approximately 90 days using 16 A100-40G GPUs.
  • Training Techniques: Used Fully Sharded Data Parallel (FSDP), FlashAttention, and xFormers to improve training throughput and memory efficiency (a minimal FSDP sketch follows this list).
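
For illustration, the sketch below shows the general shape of an FSDP setup for a Llama-style model. It is not the project's training code; the checkpoint id, wrapping policy, and launch method (torchrun) are assumptions.

```python
# Minimal sketch of Fully Sharded Data Parallel (FSDP) with a Llama-style model.
# Illustrative only: assumes launch via `torchrun` so the distributed environment
# variables are set, and that the checkpoint id below is available.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # assumed checkpoint id; any Llama-style model works
    torch_dtype=torch.bfloat16,
)

# Shard each decoder layer separately so parameters, gradients, and optimizer
# state are distributed across ranks instead of replicated on every GPU.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy, transformer_layer_cls={LlamaDecoderLayer}
)
model = FSDP(
    model,
    auto_wrap_policy=wrap_policy,
    device_id=torch.cuda.current_device(),
)
```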

Usage and Deployment

TinyLlama is designed for easy integration into applications that need a capable language model with a small computational footprint. The model weights and code are publicly available, and an interactive chat demo is provided for exploration.
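
As a starting point, the following sketch runs the chat-tuned checkpoint through the Hugging Face transformers text-generation pipeline. The model id, chat-template usage, and sampling parameters are assumptions, not prescriptions from the TinyLlama authors.

```python
# Minimal sketch: generating text with a TinyLlama chat checkpoint via the
# Hugging Face transformers pipeline. The checkpoint id is an assumption.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain what a unit test is in one sentence."},
]
# Format the conversation with the tokenizer's chat template before generation.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
output = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
print(output[0]["generated_text"])
```

On machines without a suitable GPU, omitting torch_dtype and device_map runs the model on CPU, at reduced speed but within modest memory limits.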

Citation

  • arXiv Paper: TinyLlama: An Open-Source Small Language Model (arXiv:2401.02385)
  • GitHub Repository: jzhang38/TinyLlama (https://github.com/jzhang38/TinyLlama)