Agent velocity: Cursor 3, GLM-5V/GLM-5.1, Claude MS365/Managed Agents, HF traces, GEN-1/Poke, Atlassian agents [developing]
Key Questions
What is Cursor 3's performance in agent benchmarks?
Cursor 3 scores 61.7% on the Terminal2 benchmark. It is one of several releases driving agent velocity, alongside GLM-5.1 (58.4% on SWE-Pro, 8-hour tasks).
How does GLM-5.1 perform on long-horizon tasks?
GLM-5.1 is Z.Ai's flagship model for long-horizon tasks, built for continuous autonomous work. It scores 58.4% on the SWE-Pro benchmark. It sits alongside other agentic developments such as Self-Exec and SkillClaw.
What are Claude's new agent features?
Claude v2.1.88 includes a Managed Agents beta, MS365 integration, and Skills. The reported SWE-bench comparison with Gemini 2026 is 82.1% vs 63.8%. Atlassian is integrating similar agents in Confluence Remix.
What is HF traces and its role in agents?
HF traces are datasets that Hugging Face is pushing to support open-source frontier agents. They tie into benchmarks such as Agentic-MME, Xpertbench, ClawsBench, KnowU-Bench, and ClawBench.
What are recent funding and products in agentic space?
Poke (OpenClaw for normies) raised $10M and Sentra raised $5M. GEN-1 from Generalist reports 99% at 3x the speed in robotics. Unsloth, Self-Execution Simulation, and SkillClaw are evolving agent skills.
What benchmarks evaluate agent skills realistically?
Benchmarks such as "How Well Do Agentic Skills Work in the Wild" test how LLMs use skills in practice. KnowU-Bench focuses on interactive, proactive mobile agents. ClawsBench and others assess real-world performance.
What infrastructure supports agent development?
MMX-CLI from MiniMax is built for agents rather than humans. AI Gateway promises no downtime, no lock-in, and no keys. Copilot DRACO/SLMs and LLM Wiki also aid agent development.
What is Anthropic's OpenClaw update?
Anthropic added a paywall to OpenClaw for AI model evaluation. Poke simplifies OpenClaw for non-technical users. Atlassian is launching visual AI tools and third-party agents in Confluence.
Cursor 3 (61.7% Terminal2); GLM-5.1 58.4% SWE-Pro/8hr tasks; Claude v2.1.88 + Managed Agents beta/MS365/Skills; HF traces; LLM Wiki; Copilot DRACO/SLMs; GEN-1 99%/3x faster; Poke OpenClaw ($10M)/Sentra ($5M); Atlassian Confluence Remix/agents; Unsloth/Self-Exec/SkillClaw; Agentic-MME/Xpertbench/ClawsBench/KnowU-Bench/ClawBench.