AI Agents Frontier Accelerates

Key Questions

What is the current SWE score on ProgramBench?

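Frontier models, including Claude Opus 4.7 and Gemini 3.1 Pro, currently score 0% on ProgramBench, a new software engineering (SWE) benchmark that asks whether language models can rebuild programs from scratch. The result highlights how far AI agents remain from handling full software engineering tasks.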

What achievement has Zhipu GLM-5V accomplished?

Zhipu's GLM-5V has reached state-of-the-art (SOTA) results in vision-language-action (VLA) evaluations, making it a leading multimodal model in that category.

What are Sakana Conductor and Fugu known for?

Sakana Conductor and Fugu have topped recent AI agent evaluations. They demonstrate advanced performance in multi-agent orchestration and benchmarks.

What is Meta's CWM contribution to AI agents?

Meta's CWM refers to advancements in world models for agents. It supports better simulation and planning in complex environments.

What is the DeepSeek-V4 release?

DeepSeek-V4 is a new model release accelerating AI agent capabilities. It contributes to the frontier of efficient and powerful agentic systems.

How does HERMES advance driving models?

HERMES is a unified self-driving world model that integrates scene understanding and future prediction. It is an ICCV 2025 work, and the follow-up HERMES++ targets a unified driving world model for 3D scene understanding and generation.

What efficiency gains does ROSE infrastructure provide?

ROSE (Rollout On Serving GPUs via Cooperative Elasticity) reports a 3x improvement in end-to-end throughput for agentic RL by running rollouts on serving GPUs. It applies cooperative elasticity across model sizes and cluster scales.

What is Apollo's approach to agent optimization?

Apollo, from the University of Cambridge, jointly optimizes agent policies (via RL) and environment configurations. This coordinated approach is intended to improve agent performance in dynamic settings.

ProgramBench 0% SWE; Zhipu GLM-5V SOTA VLA; Sakana Conductor/Fugu top evals; Meta CWM; DeepSeek-V4; HERMES driving models; Spot/UniT robots; Apollo co-opt; ROSE infra 3x; Skill1/Zenith orchestration; synthetic data/evals boom.

Sources (34)

Updated May 10, 2026

Q-RAG: How Reinforcement Learning Trains the Retriever, Not the LLM

The Boring Layer Will Decide Whether AI Agents Actually Scale

Is This the End of Hit-and-Miss Search? Meet the Superintelligent AI Agent

Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE (May 2026)

926-CellFluxRL: Virtual Cell Modeling

ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL

Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with ...

Agent-environment co-optimization - Apollo - University of Cambridge

SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation

Consistently Simulating Human Personas with Multi Turn Reinforcement Learning

Agentic Reinforcement Learning In Large Language Models

SkillOS: Learning Skill Curation for Self-Evolving Agents

Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes

HERMES++

[ICCV 2025] HERMES: A Unified Self-Driving World Model ...

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Claude Opus 4.7, Gemini 3.1 Pro, and Others Score 0% on New SWE Benchmark

AI Coding Benchmarks Suffer Epic Fail: GPT, Claude, Gemini Top ...

ProgramBench: Can Language Models Rebuild Programs from Scratch?

The DeepSeek Breakthrough

@_akhaliq: MolmoAct2 Action Reasoning Models for Real-world Deployment paper: https://t.co/aKO4mzBqBz https:/...

@huggingface reposted: Map2World Generate 3D worlds from any segment map and text. This framework ens...

Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

@_philschmid: Wrote an overview on how agents manage other agents: Four Subagents Patterns in 2026. From simple fu...

DataClaw: A Process-Oriented Agent Benchmark for Exploratory Real ...

Robust Reinforcement Learning Navigation via Procedural Map ...

@Diyi_Yang reposted: ProgramBench is a joint effort across Meta FAIR, Meta TBD, Stanford, Harvard @K...

[PDF] Research Returns as DeepSeek Gains Momentum and Agent Tools ...

@omarsar0: Autodata (from Meta) is an agentic data scientist that builds high-quality training and evaluation d...

@_akhaliq: Web2BigTable A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction...

Fleet-Scale Reinforcement Learning for Generalist Robot Policies

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents (Apr 2026)

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

NVIDIA GTC 2026 Keynote: AI Breakthroughs Revealed