Qwen 3.6/3.7 series local deployment gains traction with real-world tests, MTP support, and official quantized checkpoints; new AgentGym2 benchmark highlights real-world agent challenges

Key Questions

What are the key specs of the Qwen 3.6 series models?

Qwen 3.6 includes a 27B dense model and a 35B-A3B MoE variant (35B total parameters, 3B active per token), both optimized for 32-64GB VRAM setups. They score 83.9% and 80.4% on LiveCodeBench, respectively.

How do these models perform in local deployments?

Real-world tests on the RTX 3090 and RTX 5090 show a speed advantage for the MoE variant, and multi-token prediction (MTP) raises tokens per second (TPS). Official quantized checkpoints and LLMOS tensor offloading make it possible to run the 35B MoE on a single 16GB GPU.
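To make the MTP throughput claim concrete, here is a minimal sketch under a loud assumption: that MTP is used as a draft mechanism in speculative-style decoding, where each extra predicted token is accepted independently with probability alpha. The tests above do not describe the decoding setup, so the function name, alpha, and k are illustrative only. The expression is the standard expected-length result for speculative decoding, not a measurement of Qwen 3.6.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per full forward pass.

    alpha: assumed per-token acceptance probability of the drafted tokens (0..1).
    k:     number of extra tokens drafted per step (hypothetical MTP depth).

    With independent acceptance, the expectation is the geometric sum
    1 + alpha + alpha^2 + ... + alpha^k.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every drafted token is accepted
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Illustrative values only, not benchmark data.
    print(f"{expected_tokens_per_step(0.8, 3):.2f}")  # prints 2.95
```

Under this model the TPS gain is bounded by k + 1 and shrinks quickly as the acceptance rate falls, which is one reason reported MTP speedups vary by workload.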

What does the AgentGym2 benchmark reveal?

AgentGym2 shows that even frontier models struggle with realistic agent tasks, while open-source models like Nex-N1-32B demonstrate strong post-training gains in de-idealized environments.

Which Qwen model offers the best intelligence density?

The 35B-A3B MoE variant leads in intelligence density per parameter, according to an analysis of 71 LLMs. It fits well in the VRAM sweet spot for local coding agents, alongside Qwen3-Coder-Next (80B total, 3B active).

What hardware is recommended for Qwen 3.6/3.7 local use?

The models target 32-64GB VRAM systems; one user reports that the 27B model running on an RTX 5090 competes with Opus 4.8. New techniques such as LLMOS tensor offloading lower the minimum for the 35B MoE to a single 16GB GPU.
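The VRAM figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight storage only, at an assumed 4 bits per weight; the bit width of the official quantized checkpoints is not stated here, and `reserve_gb` is a hypothetical allowance for KV cache and runtime overhead. It does not model how LLMOS actually offloads tensors.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and quantization metadata.
    """
    return params_billion * bits_per_weight / 8


def offload_needed_gb(params_billion: float, bits_per_weight: float,
                      vram_gb: float, reserve_gb: float = 2.0) -> float:
    """GB of weights that would not fit on the GPU and must live elsewhere.

    reserve_gb is an assumed (hypothetical) allowance for KV cache and overhead.
    """
    usable = vram_gb - reserve_gb
    return max(0.0, weight_footprint_gb(params_billion, bits_per_weight) - usable)


if __name__ == "__main__":
    # Assumed 4-bit quantization for every row; illustrative, not measured.
    print(weight_footprint_gb(27, 4))    # 13.5
    print(weight_footprint_gb(35, 4))    # 17.5
    print(weight_footprint_gb(80, 4))    # 40.0
    print(offload_needed_gb(35, 4, 16))  # 3.5
```

Under these assumptions the 35B MoE's weights alone exceed a 16GB card, which is consistent with offloading being required there, and an 80B model at 4 bits lands near 40GB before overhead.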

Qwen 3.6 ships as a 27B dense model and a 35B-A3B MoE variant, both optimized for 32-64GB VRAM. A real-world cloth simulation test on an RTX 3090 shows the MoE variant winning on speed, and MTP boosts TPS. On LiveCodeBench, the 27B model scores 83.9% and the 35B-A3B scores 80.4%. One user reports that the 27B model on an RTX 5090 competes with Opus 4.8. Official quantized checkpoints have been released, and an intelligence density analysis shows the 35B-A3B leading in density per parameter. A recent listicle confirms that Qwen3-Coder-Next (80B total, 3B active, 46GB minimum) and 30B-A3B fit the VRAM sweet spot. The new LLMOS technique runs the 35B MoE on a single 16GB GPU via tensor offloading, which makes it practical for local deployment.

The AgentGym2 benchmark shows that even frontier models struggle with realistic agent tasks, while the open-source Nex-N1-32B shows strong post-training gains.
