Agent velocity: Cursor 3, GLM-5V-Turbo, Claude MS365, HF agent traces, Karpathy LLM Wiki, GEN-1/Poke robotics/agents, Qwen3.6 [developing]

Key Questions

What performance did Cursor 3 achieve on Terminal2?

Cursor 3 scored 61.7% on Terminal2, a benchmark of agentic tasks. The release introduces AI agents that handle coding tasks.

How does GLM-5V perform on Design2Code?

GLM-5V achieved 94.8% on Design2Code, a strong result on a design-to-code benchmark. GLM-5.1, a related model from Z.AI, is designed for long-horizon tasks and can work continuously and autonomously.

What integrations are available with Claude v2.1.88?

Claude v2.1.88 integrates with Microsoft 365 (MS365) and Skills for agentic work. Separately, Atlassian launched visual AI tools and third-party agents in Confluence, a sign of broader ecosystem support for agents.

What is the HF crowdsourced agent traces dataset?

Hugging Face released a crowdsourced dataset of agent traces to support open-source frontier agents. Clement Delangue framed the effort directly: if the community wants open-source frontier agents, it has to build the dataset. The release targets the shortage of high-quality training data for agents.

What is Karpathy's LLM Wiki and how does it relate to RAG?

Andrej Karpathy's LLM Wiki is presented as an alternative to retrieval-augmented generation (RAG) workflows. A post reshared by @Scobleizer described it as something that could replace a lot of RAG workflows. It focuses on efficient knowledge retrieval for LLMs.

What are the key features of Generalist's GEN-1 robotics model?

Generalist's GEN-1 is a foundation model for robotic intelligence, reported to reach 99% performance and 3x faster improvisation. It targets embodied robotics and marks progress in agentic robotics.

What is Poke and its connection to OpenClaw?

Poke is described as "OpenClaw for normies": an accessible agent product that scales through iMessage and has raised a $10M round. Separately, OpenClaw faced a new paywall from Anthropic, with implications for AI model evaluation.

What is the training scale for Qwen3.6?

Qwen3.6 is reported at 1T tokens per day and is still under development. Agentic-MME and Xpertbench, two benchmarks covered in the same roundup, are the kind of agentic evaluations it will be measured against.

Cursor 3 (61.7% Terminal2); GLM-5V 94.8% Design2Code; Claude v2.1.88 + MS365/Skills; HF crowdsourced agent traces dataset; LLM Wiki RAG alt; Copilot DRACO/SLMs; GEN-1 robotics 99%/3x faster improv; Poke OpenClaw iMessage scaling ($10M round); Qwen3.6 1T tokens/day; Unsloth MLX/Self-Exec; Agentic-MME/Xpertbench.

Sources (34)

Updated Apr 8, 2026

GLM-5.1 - Overview - Z.AI DEVELOPER DOCUMENT

Atlassian launches visual AI tools and third-party agents in Confluence

How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Self-Execution Simulation Improves Coding Models

@Scobleizer reposted: Poke is OpenClaw for normies. First AI product I've seen successfully scale on ...

Generalist releases highly capable GEN-1 robotic intelligence AI foundation model

Anthropic Puts a Price Tag on OpenClaw: What Claude’s New Paywall Means for AI Model Evaluation

@Scobleizer: RT @sharbel: 🚨 Andrej Karpathy just dropped something that could replace a lot of RAG workflows. It...

@ClementDelangue: We keep saying we want open-source frontier agents. Fine. Then let’s build the dataset. @badlogicg...

Self-Distilled RLVR

Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation

@_akhaliq: Agentic-MME What Agentic Capability Really Brings to Multimodal Intelligence? paper: https://t.co/...

@_akhaliq: Signals Trajectory Sampling and Triage for Agentic Interactions paper: https://t.co/XPfBucLx0i htt...

Anthropic study finds AI uses 'functional emotions' to guide behaviour

@rasbt: Components of a coding agent: a little write-up on the building blocks behind coding agents, from re...

Self-distillation boosts code LLMs & Coding agents: harness beats model - Hacker News (Apr 4, 2026)

DataTalks: Agent Evaluation - Measuring Adaptability and Evaluating Multi-Agent Systems

Microsoft released 3 new AI models, ramping up competition with its close partner, OpenAI

GLM-5V-Turbo: What Developers Should Know in 2026 | WaveSpeedAI Blog

Claude Opus 4.6 Coding Performance for Less? Testing Z.ai’s GLM-5

GLM-5V-Turbo Just Dropped: 🔥 #GLM5V #Zai #VisionCoding

Cursor 3 introduces AI agents to handle coding tasks: Why AI coding agents are changing software development

@_akhaliq reposted: Vision2Web Evaluating coding agents on 193 real-world tasks across static, inte...

The Invisible AI Stack: LangSmith, Weights & Biases & OpenAI Evals Explained

@ClementDelangue reposted: a new v3 release is out for qwopus3.5 9b, jackrong has been busy fits on 8gb an...

@ClementDelangue reposted: Hugging Face Just Democratised Alignment TRL v1.0 is out. SFT, reward modelling...

@rubenhassid: How to set up Claude so it never forgets you: Prompts → Projects → Skills (explained in 3 mins) Pr...

@omarsar0: // Unified Inference and Training Framework for Agent Memory // Most memory-augmented agents are bu...

@omarsar0: Most devs think that adding more agents to a planning system should help. The math says otherwise. ...

@MimansaJ reposted: We just shipped the biggest update to Scouts since launch (and yes, we know what...

@DrJimFan: The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today, we open-source ...

Embarrassingly Simple Self-Distillation Improves Code Generation

MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

Terminal Agents Suffice for Enterprise Automation
