# TinyLlama: A Compact 1.1B Parameter Language Model
TinyLlama is an open-source, compact language model developed by the StatNLP Research Group at the Singapore University of Technology and Design. Built upon the Llama 2 architecture, TinyLlama comprises 1.1 billion parameters and was pretrained on approximately 1 trillion tokens over three epochs. Despite its smaller size, it demonstrates competitive performance on various natural language processing benchmarks.
## Model Architecture
- Base Architecture: Transformer decoder-only, consistent with Llama 2.
- Positional Encoding: Rotary Positional Embeddings (RoPE).
- Normalization: RMSNorm.
- Activation Function: SwiGLU.
- Attention Mechanism: Grouped-query attention, combined with FlashAttention 2 for enhanced efficiency (see the configuration sketch after this list).
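These architectural details can be read directly from the published model configuration. Below is a minimal sketch, assuming the Hugging Face `transformers` library is installed and using the publicly released `TinyLlama/TinyLlama-1.1B-Chat-v1.0` checkpoint (any TinyLlama checkpoint shares the same configuration):

```python
# Minimal sketch: inspect TinyLlama's architecture through its Hugging Face config.
# Assumes the `transformers` package is installed and the checkpoint can be downloaded.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

print(config.model_type)           # "llama" -> Llama 2-style decoder-only transformer
print(config.num_hidden_layers)    # number of decoder blocks
print(config.hidden_size)          # model / embedding dimension
print(config.num_attention_heads)  # query heads
print(config.num_key_value_heads)  # fewer KV heads than query heads => grouped-query attention
print(config.hidden_act)           # "silu", the gate activation inside the SwiGLU feed-forward
print(config.rope_theta)           # base frequency of the rotary positional embeddings (RoPE)
```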
## Performance Highlights
- Commonsense Reasoning: Outperforms open-source baselines of comparable size (such as OPT-1.3B and Pythia-1.4B) on average across common commonsense reasoning benchmarks.
- Problem-Solving Tasks: Shows stronger problem-solving performance than comparably sized models on the InstructEval benchmark.
- Inference Speed: The small parameter count and grouped-query attention keep memory use and latency low, making the model practical for edge and consumer hardware (a rough timing sketch follows this list).
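The speed claim is easy to sanity-check locally. The following is a rough, non-rigorous timing sketch (not a benchmark from the TinyLlama project); it assumes `transformers` and PyTorch are installed, and the prompt and generation length are arbitrary:

```python
# Rough sketch: time a single generation with TinyLlama and report tokens per second.
# Illustrative only -- warm-up runs, batching, and quantization are deliberately omitted.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

inputs = tokenizer(
    "Write a unit test for a function that adds two numbers.",
    return_tensors="pt",
).to(device)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"Generated {new_tokens} tokens in {elapsed:.2f}s ({new_tokens / elapsed:.1f} tokens/s)")
```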
## Training Details
- Training Data: A mixture of natural language data from SlimPajama and code data from Starcoderdata.
- Training Duration: Approximately 90 days using 16 A100-40G GPUs.
- Training Techniques: Utilized Fully Sharded Data Parallel (FSDP), FlashAttention, and fused kernels from xFormers to maximize training throughput (a simplified FSDP wrapping sketch follows this list).
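For orientation, the FSDP part of such a setup can be approximated with standard PyTorch APIs. The sketch below is not the TinyLlama training code; it only shows, assuming a single-node multi-GPU job launched with `torchrun`, how a Llama-style model from `transformers` can be sharded at the decoder-layer level:

```python
# Sketch: wrap a Llama-style model with PyTorch FSDP, sharding per decoder layer.
# Assumes launch via `torchrun --nproc_per_node=<num_gpus> train_sketch.py`.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Shard parameters at the granularity of individual LlamaDecoderLayer blocks.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={LlamaDecoderLayer},
)
model = FSDP(model.to(local_rank), auto_wrap_policy=wrap_policy)

# ... a regular training loop would follow: forward pass, loss, backward, optimizer step ...

dist.destroy_process_group()
```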
## Usage and Deployment
TinyLlama is designed for easy integration into applications that need a capable language model with a small computational footprint. Because it reuses the Llama 2 architecture and tokenizer, it can serve as a drop-in replacement in many projects built on Llama. The model weights and training code are publicly available (on Hugging Face and GitHub), and an interactive chat demo is provided for quick exploration; a minimal usage example is shown below.
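As a concrete illustration of such an integration, the chat checkpoint can be driven through the standard `transformers` text-generation pipeline. This sketch follows the pattern from the public model card rather than any project-specific wrapper, and assumes `transformers` (plus `accelerate` for `device_map="auto"`) is installed:

```python
# Sketch: simple chat-style generation with the public TinyLlama chat checkpoint.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise assistant for software testing."},
    {"role": "user", "content": "Suggest three edge cases for a date-parsing function."},
]

# The tokenizer's chat template renders the messages into TinyLlama's expected prompt format.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
result = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(result[0]["generated_text"])
```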