X AI Builder Pulse

SV Stacks Quietly Running on Chinese OSS AI: GLM-5.1 SWE-Bench #1, Kimi K2.6 Full Rollout, MiniMax M2.7, Cognition Infiltration

Key Questions

What are the top performances of GLM-5.1?

GLM-5.1 ranks #2 on N-Day-Bench at 80.13% and #1 on SWE-Bench Pro at 58.4, and it scores 68.7 on CyberGym. Cognition uses it in SWE-1.6.

What is Kimi K2.5/K2.6?

Kimi K2.6-code-preview, the successor to K2.5, is a CLI-oriented coding model (curl, subagents, debug) that is now available. It is integrated into Cursor for code workflows.

How does MiniMax M2.7 compare?

MiniMax M2.7 scores 56% on benchmarks, which matches Codex-level results. It is part of the spread of Chinese open-source AI into Silicon Valley stacks.

What is N-Day-Bench?

N-Day-Bench tests LLMs on finding real vulnerabilities in codebases, where Chinese models like GLM-5.1 excel.

What business impacts are seen?

Shopify runs $5M agents on these models, which shows that Silicon Valley is quietly using Chinese open-source AI despite bans.

GLM-5.1: N-Day-Bench #2 (80.13%), SWE-Bench Pro #1 (58.4), CyberGym 68.7; MiniMax M2.7: 56%, matching Codex; Cursor: Kimi K2.5/K2.6-code-preview CLI, full rollout to testers (curl/subagents/debug, $15-199/mo); Cognition: SWE-1.6 uses GLM; Shopify: $5M agents.

Updated Apr 14, 2026