AIGuru

Qwen local inference & model races

Key Questions

What are the specs of the new Qwen 3.6 models?

Qwen 3.6 offers 35B and 27B variants optimized for budget GPUs, delivering strong local inference performance.
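To see why 35B and 27B sizes target budget GPUs, here is a back-of-envelope VRAM estimate. The parameter counts come from the summary above; the bytes-per-weight values and the 20% overhead factor (KV cache, activations) are illustrative assumptions, not measured figures.

```python
# Rough VRAM estimate for local inference at a given quantization level.
# Overhead factor of 1.2 is an assumed allowance for KV cache etc.

def vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate VRAM in GB: weights * bytes-per-weight * overhead."""
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8)
    return bytes_total * overhead / 1e9

for name, params in [("Qwen-3.6-35B", 35), ("Qwen-3.6-27B", 27)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{vram_gb(params, bits):.1f} GB")
```

Under these assumptions, the 27B variant at 4-bit quantization lands around 16 GB, which is roughly consumer-GPU territory; at 16-bit neither variant would fit.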

How does DeepSeek-V4-Flash improve LLM steering?

With its 1.6T-parameter mixture-of-experts architecture and 1M-token context window, DeepSeek-V4-Flash makes steering techniques effective again.

What speed gains are seen on Mac and iPhone?

Optimized builds of Qwen and related open models run local inference up to 2x faster on Apple silicon.

Which open models were released recently?

A wave including Gemma 4, DeepSeek V4, Kimi K2.6, and GLM-5.1 arrived, expanding options for local and agentic use.

How is DeepSeek V4 priced on Huawei Ascend?

It runs at $1.74 per million tokens with day-zero support for Huawei hardware, targeting cost-efficient large-scale deployments.
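The quoted rate makes cost estimation simple arithmetic. A minimal sketch, assuming a flat per-million-token price (the source does not distinguish input from output tokens); the workload size below is a made-up example:

```python
# Back-of-envelope cost at a flat per-million-token rate.
PRICE_PER_MILLION = 1.74  # USD per 1M tokens, DeepSeek V4 on Huawei Ascend

def cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Total cost in USD for a given token count at a flat rate."""
    return tokens / 1_000_000 * price_per_million

# A hypothetical batch job processing 250M tokens:
print(f"250M tokens: ${cost_usd(250_000_000):,.2f}")  # 250 * 1.74 = $435.00
```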

Qwen 3.6 35B/27B budget-GPU tests; DeepSeek-V4-Flash steering; 2x local speed on Mac/iPhone; coding agent optimizations.

Updated May 16, 2026