**Google Gemma 4 open-source multimodal models [developing]**
Key Questions
**What is the Gemma 4 model family?**
Gemma 4 is a family of open-source multimodal models from Google. It includes 2B and 4B edge models, a 26B Mixture-of-Experts (MoE) model that achieves 162 tokens/second on RTX GPUs, and a 31B model that ranks #3 on the Arena leaderboard with an Elo score of 1452. The models are released under the Apache 2.0 license and use PLE, hybrid attention, and TurboQuant for efficiency; on efficiency charts they outperform models more than 10x their size. They are trimodal, with a 256k context window and tool calling.
**What license does Gemma 4 use?**
Gemma 4 models are released under the Apache 2.0 license, allowing broad open-source usage. This has contributed to the family ranking #1 in downloads on Hugging Face.
**How does Gemma 4 perform compared to larger models?**
On efficiency charts, Gemma 4 outperforms models more than 10x its size, as noted in posts from Demis Hassabis and Thomas Kipf. The 31B model runs up to 2.7x faster on RTX GPUs with llama.cpp.
**What hardware and quantization support does Gemma 4 have?**
Quantized builds are available, including UnslothAI Dynamic Quants and MLX quants for Apple devices, with performance benchmarks on the Jetson Orin Nano, RTX 3090, and NVIDIA DGX Spark. The 26B MoE variant runs at 162 tokens/second on RTX GPUs.
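The quoted 162 tokens/second figure translates directly into response latency. A back-of-envelope sketch (the response lengths below are illustrative assumptions, not benchmark settings):

```python
# Convert a steady decode rate into generation latency.
# 162 tok/s is the throughput quoted for the 26B MoE model on RTX GPUs.

def generation_time(num_tokens: int, tokens_per_second: float = 162.0) -> float:
    """Seconds to stream num_tokens at a constant decode rate."""
    return num_tokens / tokens_per_second

for n in (256, 1024, 4096):
    print(f"{n:>5} tokens -> {generation_time(n):.1f} s")
# At 162 tok/s, a 1,024-token reply streams in roughly 6.3 seconds.
```

Real throughput varies with prompt length, batch size, and quantization, so treat this as a rough lower bound on wall-clock time.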
**What are the multimodal capabilities of Gemma 4?**
Gemma 4 is trimodal, supporting vision, language, and other modalities, with a 256k context length and tool calling. Related work includes ViGoR, Vision2Web, AIRS, and a study of diagnostic accuracy on epileptic seizure videos using multimodal LLMs.
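Tool calling generally follows one pattern regardless of model: the caller advertises a function schema, the model emits a structured call, and the runtime dispatches it locally. A framework-agnostic sketch of that loop (the `get_weather` tool and schema layout are illustrative assumptions, not Gemma's actual API):

```python
import json

# One advertised tool, described as a JSON schema the model can target.
TOOLS = [{
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def dispatch(tool_call_json: str) -> str:
    """Route a model-emitted tool call to a local function."""
    call = json.loads(tool_call_json)
    if call["name"] == "get_weather":
        # Stub result; a real runtime would query a weather service,
        # then feed this JSON back to the model as the tool's output.
        return json.dumps({"city": call["arguments"]["city"], "temp_c": 21})
    raise ValueError(f"unknown tool: {call['name']}")

# Simulated model reply requesting a tool call:
model_reply = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(model_reply))
```

The long 256k context matters here: tool schemas, call transcripts, and tool outputs all consume context, so larger windows allow longer multi-step tool-use sessions.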
**Where can I find Gemma 4 models?**
Gemma 4 models are available on Hugging Face, where they lead in downloads. Videos and articles detail the release, which builds on Gemini 3.
**What optimizations improve Gemma 4's speed?**
Optimizations in llama.cpp, a joint effort by NVIDIA and the Google Gemma team, yield up to a 2.7x speedup on RTX GPUs for the 31B model. UnslothAI has uploaded MLX Dynamic Quants for Apple devices.
**How does Gemma 4 rank on leaderboards?**
The 31B model ranks #3 on the Arena text leaderboard with an Elo score of 1452, positioning it as Google's most powerful open model.
In summary: the Gemma 4 family (2B/4B edge models, a 26B MoE at 162 tokens/second on RTX, and a 31B model at Arena #3 with Elo 1452) is Apache 2.0 licensed, uses PLE, hybrid attention, and TurboQuant, and outperforms models 10x larger on efficiency charts. It is #1 in Hugging Face downloads with UnslothAI MLX Dynamic Quants support, and offers trimodal capability with 256k context and tool calling, amid related work such as ViGoR, Vision2Web, and AIRS.