Google Gemma 4 + PrismML Bonsai Edge SLM Explosion

Key Questions

What is Google Gemma 4?

Google Gemma 4 is a family of open AI models launched under the Apache 2.0 license, featuring four sizes: 2B, 4B, 26B Mixture-of-Experts (MoE), and 31B multimodal parameters. It excels in advanced reasoning and outperforms rivals with up to 400B parameters. The models support base and instruction-tuned variants for various applications.

What model sizes are available in Gemma 4?

Gemma 4 includes models with 2B, 4B, 26B MoE, and 31B parameters. These are available as both base and instruction-tuned versions. They are designed for edge deployment and advanced tasks.

What license does Gemma 4 use?

Gemma 4 models are released under the Apache 2.0 license, providing developers more freedom compared to previous proprietary licenses. This allows broad usage in commercial and non-commercial projects. Official docs and playgrounds support mobile and edge deployments.

Which platforms support running Gemma 4?

Gemma 4 runs on Android AI Edge, iOS, Ollama, llama.cpp, MLX, LM Studio, OpenRouter, vLLM, and Unsloth. It is optimized for devices like Jetson, RTX, Mac, Raspberry Pi, and phones. Tools like GGUF quantization enable low-cost edge inference.

How does Gemma 4 perform against larger models?

Gemma 4's 26B MoE and 31B multimodal models beat rivals with 400B parameters in benchmarks. It supports hybrid attention and Rust-based LMs for efficiency. This enables low-cost B2C/B2B agents and SaaS applications.

What tools are available for fine-tuning Gemma 4?

Unsloth provides A4B GGUF and E4B quantization with no-code Studio fine-tuning support. LangChain RAG and TRL integration are available for customization. Nanocode offers JAX on TPUs for efficient training.

Can Gemma 4 run on edge devices like phones?

Yes, Gemma 4 is optimized for edge devices including phones, Raspberry Pi, Jetson, RTX, and Mac. It supports Ollama, llama.cpp, and MLX for local inference. This powers low-cost agents via PrismML Bonsai and Qwen3 SLMs.

What is the significance of the Gemma 4 launch?

The launch marks a shift to powerful edge SLMs, with immediate support across ecosystems like LM Studio and OpenRouter. Celebrations and rapid ports to MLX highlight community excitement. It enables hybrid attention and P2P setups like Dragonfly for scalable AI.

Official Gemma 4 launch (26B MoE/31B multimodal edge, beats 400B rivals) w/Android AI Edge/iOS/Ollama/llama.cpp/MLX/LM Studio/OpenRouter + vLLM/Unsloth A4B GGUF E4B/no-code Studio fine-tune/LangChain RAG/Dragonfly P2P/Hybrid Attention Rust LM; Jetson/RTX/Mac/RPi/phones for low-cost B2C/B2B agents/SaaS w/Bonsai Qwen3 SLMs/TRL/Nanocode TPU.

Sources (26)

Updated Apr 8, 2026

AI API Commercializer

Google Gemma 4 + PrismML Bonsai Edge SLM Explosion

Key Questions

What is Google Gemma 4?

What model sizes are available in Gemma 4?

What license does Gemma 4 use?

Which platforms support running Gemma 4?

How does Gemma 4 perform against larger models?

What tools are available for fine-tuning Gemma 4?

Can Gemma 4 run on edge devices like phones?

What is the significance of the Gemma 4 launch?

Gemma4

Nanocode: The best Claude Code that $200 can buy in pure JAX on TPUs

unsloth/gemma-4-26B-A4B-it-GGUF · Hugging Face

@julien_c: To celebrate the Gemma 4 launch we held a small impromptu get together with @yagilb from @lmstudio f...

@LinusEkenstam: This is huge 🚨 We're no longer running, we are sprinting towards a future with edge models doing a ...

Google Launches Gemma 4 Open AI Models Under Apache 2.0 License

Google Unveils the Gemma 4 Open Model Family

Google's new open AI model, Gemma 4, gives developers more freedom

Gemma 4 Explained: What It Is, What It Can Do, And How To Use It Right Now

What Is Google Gemma 4? Architecture, Benchmarks, and Why It Matters | WaveSpeedAI Blog

Google’s Gemma 4 Just Made Cloud AI Optional. | by Borislav Bankov | Apr, 2026 | Medium

I Built a Production Instagram AI Agent Using Claude

Gemma 4 Is HERE – Testing Google’s New 26B & 31B Open Models!

@ClementDelangue reposted: This guy is BEYOND CRACKED. Gemma 4 already on MLX, bro has uploaded all models...

Lemonade: Local AI for Text, Images, and Speech

Google launches open model Gemma 4

April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini

Bringing AI Closer to the Edge and On-Device with Gemma 4 | NVIDIA Technical Blog

Gemma 4 E4B It - a Hugging Face Space by huggingface-projects

NVIDIA Optimizes Google Gemma 4 for Edge AI Deployment Across Hardware Stack

@ClementDelangue reposted: MASSIVE Gemma 4 (31B, Dense), a model that performs on parity w/ Kimi K2.5 (1.1...

@Scobleizer reposted: Exciting news for Jetson developers 🎉 Gemma 4 is now on Jetson. @GoogleGemma’s ...

@_akhaliq reposted: Conversations tend to go better with a face and a voice. That’s why we’re thrill...

Gemma 4: Byte for byte, the most capable open models

How to Build a Production-Ready Gemma 3 1B Instruct Generation AI Pipeline with Hugging Face Transformers, Chat Templates, and Colab Inference

@ClementDelangue reposted: a new v3 release is out for qwopus3.5 9b, jackrong has been busy fits on 8gb an...