AIGuru

Qwen local inference & model races

Key Questions

What are the specs of the new Qwen 3.6 models?

Qwen 3.6 offers 35B and 27B variants optimized for budget GPUs, delivering strong local inference performance.
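To see why 35B and 27B sizes target budget GPUs, here is a back-of-envelope VRAM estimate. The parameter counts come from the summary above; the bytes-per-weight values and the 20% overhead factor (KV cache, activations) are illustrative assumptions, not measured figures.

```python
# Rough VRAM estimate for local inference at a given quantization level.
# Overhead factor of 1.2 is an assumed allowance for KV cache etc.

def vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate VRAM in GB: weights * bytes-per-weight * overhead."""
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8)
    return bytes_total * overhead / 1e9

for name, params in [("Qwen-3.6-35B", 35), ("Qwen-3.6-27B", 27)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{vram_gb(params, bits):.1f} GB")
```

Under these assumptions, the 27B variant at 4-bit quantization lands around 16 GB, which is roughly consumer-GPU territory; at 16-bit neither variant would fit.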

How does DeepSeek-V4-Flash improve LLM steering?

With its 1.6T-parameter mixture-of-experts architecture and 1M-token context window, DeepSeek-V4-Flash makes steering techniques effective again.

What speed gains are seen on Mac and iPhone?

Optimized builds of Qwen and related open models run local inference up to 2x faster on Apple silicon.

Which open models were released recently?

A wave including Gemma 4, DeepSeek V4, Kimi K2.6, and GLM-5.1 arrived, expanding options for local and agentic use.

How is DeepSeek V4 priced on Huawei Ascend?

It runs at $1.74 per million tokens with day-zero support for Huawei hardware, targeting cost-efficient large-scale deployments.
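The quoted rate makes cost estimation simple arithmetic. A minimal sketch, assuming a flat per-million-token price (the source does not distinguish input from output tokens); the workload size below is a made-up example:

```python
# Back-of-envelope cost at a flat per-million-token rate.
PRICE_PER_MILLION = 1.74  # USD per 1M tokens, DeepSeek V4 on Huawei Ascend

def cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Total cost in USD for a given token count at a flat rate."""
    return tokens / 1_000_000 * price_per_million

# A hypothetical batch job processing 250M tokens:
print(f"250M tokens: ${cost_usd(250_000_000):,.2f}")  # 250 * 1.74 = $435.00
```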

Qwen 3.6 35B/27B budget-GPU tests; DeepSeek-V4-Flash steering; 2x local speed on Mac/iPhone; coding agent optimizations.

Updated May 16, 2026