Agentic Data & Benchmarks: SWE-ZERO-12M/Orchard/MemEye/ARC-AGI-3 [developing]

Key Questions

What is SWE-ZERO-12M and its scale?

SWE-ZERO-12M comprises 112 billion tokens of agentic trajectories intended for training. At that size it is a large dataset for developing autonomous coding agents.
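To give the token count some intuition, here is a rough back-of-envelope sketch. It assumes that the "12M" in the dataset name refers to roughly 12 million trajectories; the digest does not confirm this, so the resulting average is illustrative only. The function name and constants are hypothetical.

```python
# Back-of-envelope estimate of average trajectory length.
TOTAL_TOKENS = 112_000_000_000      # stated in the digest: 112 billion tokens
ASSUMED_TRAJECTORIES = 12_000_000   # ASSUMPTION: "12M" = trajectory count (unconfirmed)


def mean_tokens_per_trajectory(total_tokens: int, trajectories: int) -> float:
    """Return the mean number of tokens per trajectory."""
    if trajectories <= 0:
        raise ValueError("trajectories must be positive")
    return total_tokens / trajectories


avg = mean_tokens_per_trajectory(TOTAL_TOKENS, ASSUMED_TRAJECTORIES)
print(f"{avg:,.0f} tokens per trajectory on average (under the stated assumption)")
```

If the assumption holds, each trajectory would average on the order of nine thousand tokens, which is consistent with multi-step agent sessions rather than single-turn completions.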

How does the Orchard framework support agentic AI?

Orchard is an open-source framework for building and orchestrating agentic workflows. It focuses on memory and task execution for LLM-based agents.

What progress has been made on ARC-AGI-3 benchmarks?

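Reported scores on ARC-AGI-3 remain below 1%, which indicates that abstract reasoning is still a persistent challenge for current systems. Separately, new memory benchmarks such as MemEye, a pixel-based visual memory benchmark, are emerging to test the capabilities of visual agents.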

SWE-ZERO-12M trajectories (112B tokens); Orchard OSS agentic framework; MemEye pixel visual memory bench; Codex mobile; ARC-AGI-3 <1%; Second Scaling Law inference tokens.

Sources (2)

Updated May 16, 2026

AI Frontier Digest

PREPING: Building Agent Memory without Tasks

EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents