# Gemma 3
| Model Size | Parameters | Context Window | Multimodal | Languages Supported | Key Benchmarks (27B IT) | Memory (4-bit to 32-bit precision) |
|---|---|---|---|---|---|---|
| 1B | 1B | 32K tokens | Text-only | 140+ | - | ~2 GB to ~4 GB |
| 4B | 4B | 128K tokens | Text + image | 140+ | - | ~6 GB to ~12 GB |
| 12B | 12B | 128K tokens | Text + image | 140+ | - | ~18 GB to ~36 GB |
| 27B | 27B | 128K tokens | Text + image | 140+ | MATH: 89.0%, MMLU-Pro: 67.5%, LiveCodeBench: 29.7% | ~40 GB to ~80 GB |
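The memory column gives rough end-to-end ranges. As a sanity check, the weight-only footprint can be estimated directly from the parameter count and the bits per weight; the short sketch below does that calculation (weights only; the KV cache, activations, and framework overhead add to the total in practice, so real deployments need more than the raw weight size).

```python
# Rough weight-only memory estimate for the Gemma model sizes at common precisions.
# Ignores KV cache, activations, and runtime overhead, which add to the total in practice.

def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Return approximate weight storage in GB for a given precision."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

for params in (1, 4, 12, 27):
    estimates = {bits: weight_memory_gb(params, bits) for bits in (4, 8, 16, 32)}
    line = ", ".join(f"{bits}-bit: ~{gb:.1f} GB" for bits, gb in estimates.items())
    print(f"Gemma {params}B -> {line}")
```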
Performance:
- The 27B IT model achieves a 1338 Elo score in LMSys Chatbot Arena, ranking 9th and outperforming larger models like LLaMA 3.1 405B (1269), DeepSeek-V3 (1318), and Qwen2.5-72B (1257). It’s comparable to Gemini 1.5 Pro on many benchmarks.
- Excels in tasks like question answering, summarization, reasoning, and coding, with a ~15% improvement over Gemma 2 on math and reasoning tasks (a minimal inference sketch follows this list).
- Specific benchmarks: MATH (89.0%), LiveCodeBench (29.7%), MMLU-Pro (67.5%).
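For a quick hands-on check of these capabilities, below is a minimal text-only chat sketch using the Hugging Face transformers pipeline. It assumes a recent transformers release with Gemma 3 support, acceptance of the Gemma license for the gated google/gemma-3-27b-it checkpoint, and enough GPU memory (see the table above); the prompt and generation settings are placeholders.

```python
# Minimal chat-style inference sketch for Gemma 3 27B IT via Hugging Face transformers.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",          # the 4B/12B/27B Gemma 3 checkpoints are multimodal
    model="google/gemma-3-27b-it",  # gated checkpoint; requires license acceptance
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Text-only chat message; an image entry could be added to the content list as well.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the key differences between Gemma 2 and Gemma 3 in three bullet points."},
        ],
    }
]

output = pipe(text=messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])
```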
Fine-tuning the 27B model
The Gemma 27B model has been fine-tuned successfully across a variety of domains. Some notable examples:
- TxGemma: Therapeutic Applications
  Researchers developed TxGemma, a suite of models including a 27B variant fine-tuned from Gemma 2 and tailored for therapeutic-development tasks such as clinical trial adverse-event prediction, molecular property analysis, and scientific reasoning. The fine-tuned models performed comparably to or better than state-of-the-art models on many of these tasks, and fine-tuning required less training data than base LLMs would need, making the approach well suited to data-limited applications.
- BgGPT: Bulgarian Language Specialization
  The BgGPT project fine-tuned Gemma 2 27B to improve Bulgarian language understanding and generation. Leveraging over 100 billion tokens of Bulgarian and English text, the fine-tuned model achieved strong performance on Bulgarian language tasks while retaining its English capabilities. The methodology included continual-learning strategies and careful data curation.
These examples indicate that fine-tuning Gemma 27B models can be effective for both specialized domains and language-specific applications. However, such endeavors typically require substantial computational resources and expertise. If you’re considering fine-tuning Gemma 27B for your own projects, it’s advisable to assess your hardware capabilities and explore parameter-efficient fine-tuning methods like LoRA or QLoRA to optimize resource usage.
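As an illustration of the parameter-efficient route, below is a minimal QLoRA-style setup sketch using transformers, bitsandbytes, and peft. It is not taken from the TxGemma or BgGPT work; the model ID, LoRA hyperparameters, and target modules are assumptions you would tune for your own data, and loading the multimodal 27B checkpoint may require a different auto class depending on your transformers version.

```python
# Minimal QLoRA-style setup sketch (hyperparameters and target modules are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "google/gemma-3-27b-it"  # gated checkpoint; requires license acceptance

# Load the base model in 4-bit NF4 so the 27B weights fit in far less GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Depending on your transformers version, the multimodal checkpoint may need
# AutoModelForImageTextToText instead of AutoModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; only these weights are updated during training.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here, train with your preferred trainer (e.g. the transformers Trainer or trl's
# SFTTrainer) on a domain-specific instruction dataset.
```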