Google Gemma 4 Official Edge Multimodal Launch

Key Questions

What is Google Gemma 4?

Gemma 4 is a 26B-parameter mixture-of-experts (MoE) model that runs at 25 tokens per second. It was officially released with a San Francisco demo event on April 17; the model is multimodal and optimized for edge deployment.

Can Gemma 4 run natively on iPhone?

Yes. Gemma 4 supports native, fully offline inference on iPhone through the Edge Gallery app, which makes it suitable for mobile and privacy-focused SaaS applications.

What deployment options does Gemma 4 support?

It is optimized for edge deployment, which enables low-cost B2C and B2B wrappers built on Hugging Face (HF), Replicate, vLLM, and Ollama. The release fits into the broader surge of open multimodal models.
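A wrapper of the kind described above can be as thin as one HTTP call. The following is a minimal sketch, not an official integration: it sends a prompt to a locally running Ollama server through its `/api/generate` endpoint using only the Python standard library. It assumes Ollama is listening on its default port (11434) and that a Gemma 4 build has been pulled under the tag `gemma4`; that tag is a hypothetical placeholder, so check `ollama list` for the name actually installed.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical model tag; replace with the tag shown by `ollama list`.
MODEL_TAG = "gemma4"


def build_payload(prompt: str, model: str = MODEL_TAG) -> bytes:
    """Encode a non-streaming /api/generate request body as UTF-8 JSON."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def generate(prompt: str, model: str = MODEL_TAG, url: str = OLLAMA_URL) -> str:
    """Send one prompt to the local server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        result = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns the full completion in "response".
    return result["response"]
```

Calling `generate("Describe on-device inference in one sentence.")` returns the reply as a string, or raises `urllib.error.URLError` if no server is running. A hosted B2C or B2B wrapper would typically add authentication and streaming on top of this.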


Sources (2)

Google's Gemma 4 isn't the smartest local LLM I've run, but it's the one I reach for most

Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference

Updated Apr 16, 2026

AI API Commercializer