Chollet ARC-AGI-3: Unsaturated Agentic AI Benchmark [developing]

Key Questions

What is the performance of frontier models on ARC-AGI-3?

Frontier models score under 1% on ARC-AGI-3, an agentic AI benchmark that remains unsaturated. It tests adaptive skills and planning. In the same discussion, François Chollet contrasts curve-fitting with symbolic approaches.
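Chollet describes curve-fitting as recording a lossy approximation of the output of some generative program. The toy sketch below illustrates that point. It is not Chollet's own example. The generative program (a hypothetical quadratic rule, `program`) and the curve-fitting model (an ordinary least-squares line, `fit_line`) are assumptions chosen only for illustration.

```python
# Hypothetical generative program: the rule that actually produces the data.
def program(x: float) -> float:
    return x * x


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (a simple curve-fitting model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


# Training data: the program's outputs on inputs 0..10 only.
train_xs = [float(i) for i in range(11)]
train_ys = [program(x) for x in train_xs]
model = fit_line(train_xs, train_ys)


def max_abs_error(xs):
    """Worst-case gap between the fitted curve and the true program."""
    return max(abs(model(x) - program(x)) for x in xs)


in_range = max_abs_error(train_xs)
out_of_range = max_abs_error([float(i) for i in range(90, 101)])
print(f"max error on training range: {in_range:.1f}")
print(f"max error far outside it:    {out_of_range:.1f}")
```

The fitted line stays within an error of 15 on the inputs it was fit to, but the error exceeds 9,000 at inputs between 90 and 100. A model that recovered the rule itself would have zero error in both regions, which is the gap between approximating outputs and capturing the generating program.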

How does Chollet view past AGI progress signals?

Chollet treats past jumps in performance as potential AGI signals, but he stresses how hard ARC-AGI-3 remains. The benchmark is still unsaturated, and his framing contrasts scaling with true generalization.

What flaw does Microsoft’s Universal Verifier expose?

A Microsoft paper on a Universal Verifier argues that every agent benchmark shares the same hidden problem: verifying what the agent actually did. This calls the reliability of agent evaluations into question.

Frontier models score under 1% on ARC-AGI-3; Chollet's curve-fitting vs. symbolic critique; adaptive skills and planning; past jumps as AGI signals; Microsoft's Universal Verifier exposes flaws in agent evaluation.

Sources (2)

Updated Apr 9, 2026

AI Frontier Digest

@omarsar0: NEW paper from Microsoft Every agent benchmark has the same hidden problem: how do you know the age...

@fchollet: With curve-fitting, you are recording a lossy approximation of the output of some generative program...
