Agentic Data & Benchmarks: SWE-ZERO-12M/Orchard/MemEye/ARC-AGI-3 [developing]
Key Questions
What is SWE-ZERO-12M and its scale?
SWE-ZERO-12M is a dataset of agentic trajectories totaling 112 billion tokens. It is intended as training data for autonomous coding agents.
How does the Orchard framework support agentic AI?
Orchard is an open-source framework for building and orchestrating agentic workflows. It focuses on memory and task execution for LLM-based agents.
What progress has been made on ARC-AGI-3 benchmarks?
ARC-AGI-3 remains below 1% solved, indicating persistent challenges in abstract reasoning. Separately, new benchmarks such as MemEye, a pixel-based visual memory benchmark, are emerging to test the memory of visual agents.
SWE-ZERO-12M trajectories (112B tokens); Orchard OSS agentic framework; MemEye pixel visual memory bench; Codex mobile; ARC-AGI-3 <1%; Second Scaling Law inference tokens.
Updated May 16, 2026