SV Stacks Quietly Running on Chinese OSS AI: GLM-5.1 SWE-Bench #1, Kimi K2.6 Full Rollout, MiniMax M2.7, Cognition Infiltration
Key Questions
What are GLM-5.1's top benchmark results?
GLM-5.1 ranks #2 on N-Day-Bench at 80.13% and #1 on SWE-Bench Pro at 58.4, and it scores 68.7 on CyberGym. Cognition uses it in SWE-1.6.
What is Kimi K2.5/K2.6?
Kimi K2.6-code-preview, the follow-up to K2.5, is a CLI coding model with curl, subagent, and debug support. It is now available to testers and is integrated into Cursor for code workflows.
How does MiniMax M2.7 compare?
MiniMax M2.7 scores 56%, on par with Codex. It is another of the Chinese open-source models making their way into Silicon Valley stacks.
What is N-Day-Bench?
N-Day-Bench tests how well LLMs find real vulnerabilities in codebases. Chinese models such as GLM-5.1 score near the top of it.
What business impacts are seen?
Shopify runs $5M in agents on these models, which shows how Silicon Valley is quietly using Chinese open-source AI despite bans.
GLM-5.1: N-Day-Bench #2 (80.13%), SWE-Bench Pro #1 (58.4), CyberGym 68.7. MiniMax M2.7: 56%, on par with Codex. Cursor: Kimi K2.5/K2.6-code-preview CLI fully rolled out to testers (curl, subagents, debug; $15-199/mo). Cognition: SWE-1.6 uses GLM. Shopify: $5M agents.