**Google Gemma 4 open-source multimodal models [developing]**
Key Questions
**What is the Gemma 4 model family?**
Gemma 4 is a family of open-source multimodal models from Google. It includes 2B and 4B edge models, a 26B Mixture-of-Experts (MoE) model that achieves 162 tokens/second on RTX GPUs, and a 31B model that ranks #3 on the Arena leaderboard with an Elo score of 1452. The models are released under the Apache 2.0 license and use PLE, hybrid attention, and TurboQuant for efficiency; on efficiency charts they outperform models more than 10x their size. They are trimodal, with a 256k context window and tool calling.
**What license does Gemma 4 use?**
Gemma 4 models are released under the Apache 2.0 license, allowing broad open-source usage. This has contributed to the family ranking #1 in downloads on Hugging Face.
**How does Gemma 4 perform compared to larger models?**
On efficiency charts, Gemma 4 outperforms models more than 10x its size, as noted in posts from Demis Hassabis and Thomas Kipf. The 31B model runs up to 2.7x faster on RTX GPUs with llama.cpp.
**What hardware and quantization support does Gemma 4 have?**
Quantized builds are available, including UnslothAI Dynamic Quants and MLX quants for Apple devices, with performance benchmarks on the Jetson Orin Nano, RTX 3090, and NVIDIA DGX Spark. The 26B MoE variant runs at 162 tokens/second on RTX GPUs.
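The quoted 162 tokens/second figure translates directly into response latency. A back-of-envelope sketch (the response lengths below are illustrative assumptions, not benchmark settings):

```python
# Convert a steady decode rate into generation latency.
# 162 tok/s is the throughput quoted for the 26B MoE model on RTX GPUs.

def generation_time(num_tokens: int, tokens_per_second: float = 162.0) -> float:
    """Seconds to stream num_tokens at a constant decode rate."""
    return num_tokens / tokens_per_second

for n in (256, 1024, 4096):
    print(f"{n:>5} tokens -> {generation_time(n):.1f} s")
# At 162 tok/s, a 1,024-token reply streams in roughly 6.3 seconds.
```

Real throughput varies with prompt length, batch size, and quantization, so treat this as a rough lower bound on wall-clock time.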
**What are the multimodal capabilities of Gemma 4?**
Gemma 4 is trimodal, supporting vision, language, and other modalities, with a 256k context length and tool calling. Related work includes ViGoR, Vision2Web, AIRS, and a study of diagnostic accuracy on epileptic seizure videos using multimodal LLMs.
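Tool calling generally follows one pattern regardless of model: the caller advertises a function schema, the model emits a structured call, and the runtime dispatches it locally. A framework-agnostic sketch of that loop (the `get_weather` tool and schema layout are illustrative assumptions, not Gemma's actual API):

```python
import json

# One advertised tool, described as a JSON schema the model can target.
TOOLS = [{
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def dispatch(tool_call_json: str) -> str:
    """Route a model-emitted tool call to a local function."""
    call = json.loads(tool_call_json)
    if call["name"] == "get_weather":
        # Stub result; a real runtime would query a weather service,
        # then feed this JSON back to the model as the tool's output.
        return json.dumps({"city": call["arguments"]["city"], "temp_c": 21})
    raise ValueError(f"unknown tool: {call['name']}")

# Simulated model reply requesting a tool call:
model_reply = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(model_reply))
```

The long 256k context matters here: tool schemas, call transcripts, and tool outputs all consume context, so larger windows allow longer multi-step tool-use sessions.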
**Where can I find Gemma 4 models?**
Gemma 4 models are available on Hugging Face, where they lead in downloads. Videos and articles detail the release, which builds on Gemini 3.
**What optimizations improve Gemma 4's speed?**
Optimizations in llama.cpp, a joint effort by NVIDIA and the Google Gemma team, yield up to a 2.7x speedup on RTX GPUs for the 31B model. UnslothAI has uploaded MLX Dynamic Quants for Apple devices.
**How does Gemma 4 rank on leaderboards?**
The 31B model ranks #3 on the Arena text leaderboard with an Elo score of 1452, positioning it as Google's most powerful open model.
In summary: the Gemma 4 family (2B/4B edge models, a 26B MoE at 162 tokens/second on RTX, and a 31B model at Arena #3 with Elo 1452) is Apache 2.0 licensed, uses PLE, hybrid attention, and TurboQuant, and outperforms models 10x larger on efficiency charts. It is #1 in Hugging Face downloads with UnslothAI MLX Dynamic Quants support, and offers trimodal capability with 256k context and tool calling, amid related work such as ViGoR, Vision2Web, and AIRS.