Gemma 3

| Model Size | Parameters | Context Window | Multimodal | Languages Supported | Key Benchmarks (27B IT) | Memory (4-bit to 32-bit) |
|---|---|---|---|---|---|---|
| 1B | 1B | 32K tokens | Text-only | 140+ | - | ~2GB to ~4GB |
| 4B | 4B | 128K tokens | Text + Image | 140+ | - | ~6GB to ~12GB |
| 12B | 12B | 128K tokens | Text + Image | 140+ | - | ~18GB to ~36GB |
| 27B | 27B | 128K tokens | Text + Image | 140+ | MATH: 89.0%, MMLU-Pro: 67.5%, LiveCodeBench: 29.7% | ~40GB to ~80GB |
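
As a quick sanity check on these ranges, the weight-only footprint is simply parameter count × bytes per parameter. The short sketch below (plain Python, no external libraries) shows that arithmetic for the 27B model; real deployments also need memory for activations, the KV cache, and framework overhead, so actual requirements sit above this weight-only lower bound and depend on the quantization scheme used.

```python
# Weight-only memory estimate: parameters x bytes per parameter.
# This ignores activations, KV cache, and framework overhead, so it is
# only a lower bound on what a real deployment needs.
def weight_memory_gib(params_billion: float, bits_per_param: int) -> float:
    """GiB needed just to store the weights at the given precision."""
    return params_billion * 1e9 * (bits_per_param / 8) / 1024**3

for bits in (4, 8, 16):
    print(f"27B weights at {bits}-bit: ~{weight_memory_gib(27, bits):.0f} GiB")
# 27B weights at 4-bit:  ~13 GiB
# 27B weights at 8-bit:  ~25 GiB
# 27B weights at 16-bit: ~50 GiB
```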

Performance:

  • The 27B IT model achieves a 1338 Elo score in LMSys Chatbot Arena, ranking 9th and outperforming larger models like LLaMA 3.1 405B (1269), DeepSeek-V3 (1318), and Qwen2.5-72B (1257). It’s comparable to Gemini 1.5 Pro on many benchmarks.
  • Excels in tasks like question answering, summarization, reasoning, and coding, with ~15% improvement over Gemma 2 in math and reasoning tasks.
  • Specific benchmarks: MATH (89.0%), LiveCodeBench (29.7%), MMLU-Pro (67.5%).

Fine-tuning the 27B model

There have been several successful efforts to fine-tune Gemma 27B models across different domains. Two notable examples:

  1. TxGemma: Therapeutic Applications

Researchers developed TxGemma, a suite of models including a 27B variant fine-tuned from Gemma-2 and tailored for therapeutic development tasks such as clinical trial adverse event prediction, molecular property analysis, and scientific reasoning. The fine-tuned models performed on par with or better than state-of-the-art models on many of these tasks. Notably, they needed less task-specific training data than base LLMs to reach comparable performance, making the approach well suited to data-limited applications.

  2. BgGPT: Bulgarian Language Specialization

The BgGPT project involved fine-tuning Gemma-2 27B to enhance Bulgarian language understanding and generation. By leveraging over 100 billion tokens of Bulgarian and English text, the fine-tuned model achieved strong performance in Bulgarian language tasks while maintaining its English capabilities. The methodology included continual learning strategies and careful data curation. 

These examples indicate that fine-tuning Gemma 27B models can be effective for both specialized domains and language-specific applications. However, such endeavors typically require substantial computational resources and expertise. If you’re considering fine-tuning Gemma 27B for your own projects, it’s advisable to assess your hardware capabilities and explore parameter-efficient fine-tuning methods like LoRA or QLoRA to optimize resource usage.
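
As a concrete starting point, here is a minimal QLoRA-style sketch using Hugging Face transformers, peft, and bitsandbytes. The checkpoint id, target modules, and hyperparameters are illustrative assumptions rather than a validated recipe, and the multimodal 27B checkpoint may require a different model class depending on your transformers version; the point is only to show the pattern of freezing 4-bit base weights and training small LoRA adapters.

```python
# Minimal QLoRA-style setup sketch (assumed libraries: transformers, peft, bitsandbytes).
# Model id, target modules, and hyperparameters are illustrative, not a validated recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-3-27b-it"  # assumed Hugging Face checkpoint name

# Quantize the frozen base weights to 4-bit so they fit in far less GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Train only small low-rank adapter matrices on the attention projections;
# the 4-bit base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the 27B parameters

# From here, training proceeds with a standard Trainer/SFTTrainer loop over your
# domain dataset; only the adapter weights are saved (and optionally merged) afterwards.
```

Because only the adapter matrices receive gradients, the optimizer state stays small relative to the 27B base, which is what makes fine-tuning at this scale feasible on a single high-memory GPU or a small multi-GPU node.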
