# TinyLlama: A Compact 1.1B Parameter Language Model
TinyLlama is an open-source, compact language model developed by the StatNLP Research Group at the Singapore University of Technology and Design. Built upon the Llama 2 architecture, TinyLlama comprises 1.1 billion parameters and was pretrained on approximately 1 trillion tokens over three epochs. Despite its smaller size, it demonstrates competitive performance on various natural language processing benchmarks.
## Model Architecture
- Base Architecture: Transformer decoder-only, consistent with Llama 2.
- Positional Encoding: Rotary Positional Embeddings (RoPE).
- Normalization: RMSNorm.
- Activation Function: SwiGLU.
- Attention Mechanism: Grouped-query attention, combined with FlashAttention 2 for enhanced efficiency (see the configuration sketch after this list).
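These architectural details can be read directly from the published model configuration. Below is a minimal sketch, assuming the Hugging Face `transformers` library is installed and using the publicly released `TinyLlama/TinyLlama-1.1B-Chat-v1.0` checkpoint (any TinyLlama checkpoint shares the same configuration):

```python
# Minimal sketch: inspect TinyLlama's architecture through its Hugging Face config.
# Assumes the `transformers` package is installed and the checkpoint can be downloaded.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

print(config.model_type)           # "llama" -> Llama 2-style decoder-only transformer
print(config.num_hidden_layers)    # number of decoder blocks
print(config.hidden_size)          # model / embedding dimension
print(config.num_attention_heads)  # query heads
print(config.num_key_value_heads)  # fewer KV heads than query heads => grouped-query attention
print(config.hidden_act)           # "silu", the gate activation inside the SwiGLU feed-forward
print(config.rope_theta)           # base frequency of the rotary positional embeddings (RoPE)
```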
## Performance Highlights
- Commonsense Reasoning: Outperforms open-source baselines of comparable size (such as OPT-1.3B and Pythia-1.4B) on average across common commonsense reasoning benchmarks.
- Problem-Solving Tasks: Shows stronger problem-solving performance than comparably sized models on the InstructEval benchmark.
- Inference Speed: The small parameter count and grouped-query attention keep memory use and latency low, making the model practical for edge and consumer hardware (a rough timing sketch follows this list).
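The speed claim is easy to sanity-check locally. The following is a rough, non-rigorous timing sketch (not a benchmark from the TinyLlama project); it assumes `transformers` and PyTorch are installed, and the prompt and generation length are arbitrary:

```python
# Rough sketch: time a single generation with TinyLlama and report tokens per second.
# Illustrative only -- warm-up runs, batching, and quantization are deliberately omitted.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

inputs = tokenizer(
    "Write a unit test for a function that adds two numbers.",
    return_tensors="pt",
).to(device)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"Generated {new_tokens} tokens in {elapsed:.2f}s ({new_tokens / elapsed:.1f} tokens/s)")
```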
## Training Details
- Training Data: A mixture of natural language data from SlimPajama and code data from Starcoderdata.
- Training Duration: Approximately 90 days using 16 A100-40G GPUs.
- Training Techniques: Utilized Fully Sharded Data Parallel (FSDP), FlashAttention, and fused kernels from xFormers to maximize training throughput (a simplified FSDP wrapping sketch follows this list).
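For orientation, the FSDP part of such a setup can be approximated with standard PyTorch APIs. The sketch below is not the TinyLlama training code; it only shows, assuming a single-node multi-GPU job launched with `torchrun`, how a Llama-style model from `transformers` can be sharded at the decoder-layer level:

```python
# Sketch: wrap a Llama-style model with PyTorch FSDP, sharding per decoder layer.
# Assumes launch via `torchrun --nproc_per_node=<num_gpus> train_sketch.py`.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Shard parameters at the granularity of individual LlamaDecoderLayer blocks.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={LlamaDecoderLayer},
)
model = FSDP(model.to(local_rank), auto_wrap_policy=wrap_policy)

# ... a regular training loop would follow: forward pass, loss, backward, optimizer step ...

dist.destroy_process_group()
```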
## Usage and Deployment
TinyLlama is designed for easy integration into applications that need a capable language model with a small computational footprint. Because it reuses the Llama 2 architecture and tokenizer, it can serve as a drop-in replacement in many projects built on Llama. The model weights and training code are publicly available (on Hugging Face and GitHub), and an interactive chat demo is provided for quick exploration; a minimal usage example is shown below.
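As a concrete illustration of such an integration, the chat checkpoint can be driven through the standard `transformers` text-generation pipeline. This sketch follows the pattern from the public model card rather than any project-specific wrapper, and assumes `transformers` (plus `accelerate` for `device_map="auto"`) is installed:

```python
# Sketch: simple chat-style generation with the public TinyLlama chat checkpoint.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise assistant for software testing."},
    {"role": "user", "content": "Suggest three edge cases for a date-parsing function."},
]

# The tokenizer's chat template renders the messages into TinyLlama's expected prompt format.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
result = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(result[0]["generated_text"])
```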