LLM Benchmark Watch

GLM-5.1 (Zhipu), Qwen 3.6 Plus/Omni, and MiniMax/Kimi: low-cost open-source models outperforming Claude and GPT

Key Questions

What are the key features of GLM-5.1?

GLM-5.1 is a 754B-parameter open-source MoE model from Zhipu. It tops the open-source field and ranks #3 globally on SWE-Bench Pro and Terminal-Bench, beating Opus 4.6 and GPT-5.4, and is reported to deliver up to 6x gains on VectorDBBench and KernelBench.

How does Qwen 3.6 perform?

Qwen 3.6 Plus/Omni offers a 1M-token context window and agentic performance that surpasses Claude, scoring 78.8% on SWE-bench Verified. There are plans to open-source medium-sized versions.

What about MiniMax and Kimi models?

MiniMax 2.7 and Kimi 2.5 deliver roughly 75-80% of frontier performance at about a tenth of the cost; their usage is growing rapidly, and MiniMax tops the VIBE evals.
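The cost-performance tradeoff above can be sanity-checked with simple arithmetic. The sketch below compares benchmark points obtained per dollar; the scores and per-token prices are illustrative placeholders, not figures from this article:

```python
# Illustrative cost-efficiency comparison: a cheaper model at ~75-80% of a
# frontier model's benchmark score can still win on performance per dollar.
# All numbers are hypothetical placeholders, not published prices or scores.

def perf_per_dollar(score: float, price_per_mtok: float) -> float:
    """Benchmark score points per dollar spent on 1M output tokens."""
    return score / price_per_mtok

frontier = {"score": 80.0, "price_per_mtok": 15.00}  # hypothetical frontier model
budget   = {"score": 62.0, "price_per_mtok": 1.50}   # ~78% of score at 1/10th price

frontier_eff = perf_per_dollar(**frontier)
budget_eff = perf_per_dollar(**budget)

print(f"frontier: {frontier_eff:.2f} points/$")  # 5.33
print(f"budget:   {budget_eff:.2f} points/$")    # 41.33
print(f"budget model is {budget_eff / frontier_eff:.1f}x more cost-efficient")  # 7.8x
```

Under these assumed numbers, a model with a fifth fewer benchmark points still comes out nearly 8x ahead on points per dollar, which is the dynamic driving the "exploding usage" claim.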

What benchmarks highlight these models?

Head-to-head shootouts include SWE-bench, ARC-AGI3, DRACO, and Arena42, where GLM-5.1 and Qwen 3.6 outperform Claude and GPT.

Is GLM-5.1 available openly?

Yes. GLM-5.1 is released on Hugging Face under the MIT license and is available for immediate use.

How does Qwen 3.6 compare to top models?

Across testing spanning over 90M tokens, Qwen 3.6 Plus handled all tasks previously run on Opus effectively and beats GPT-5.4 and Claude Opus 4.6 in direct comparisons.

What vision capabilities does GLM have?

GLM-5V-Turbo is a vision-capable coding model with substantial updates for multimodal tasks.

What are recent updates for these models?

The top LLM updates from the first week of April 2026 cover Qwen 3.6, GLM-5.1, and agentic advances in RAG and benchmarking.

In brief: GLM-5.1 (754B open-source MoE) beats Opus 4.6 and GPT-5.4 on SWE-Bench Pro, with reported 6x gains on VectorDBBench and KernelBench; Qwen 3.6 Plus/Omni pairs a 1M-token context with agentic performance above Claude's (78.8% on SWE-bench Verified); MiniMax 2.7 and Kimi 2.5 deliver 75-80% of the performance at a tenth of the cost amid rapidly growing usage, with MiniMax topping the VIBE evals; shootouts span SWE-bench, ARC-AGI3, DRACO, and Arena42.

Updated Apr 8, 2026