Frontier Math & Reasoning: OpenAI o3, AlphaProof, Gemini IMO
Key Questions
What score did OpenAI o3 achieve on ARC-AGI?
OpenAI o3 scored 87.5% on ARC-AGI, a benchmark of abstract reasoning, in its high-compute configuration. The result marks a significant milestone for general-purpose LLM reasoning.
What mathematical problems were solved by OpenAI/AlphaProof Nexus?
The system disproved the 1946 Erdős unit-distance conjecture published in Annals of Mathematics and solved nine additional Erdős problems plus 44 sequence conjectures. These outcomes highlight advances in automated theorem proving.
How did Gemini Deep Think perform on IMO-level math?
Gemini Deep Think scored 35 out of 42 points on IMO-level mathematics problems, a gold-medal-level result. The IMO awards up to 7 points on each of 6 problems, so 42 is the maximum score. The performance underscores concrete progress in mathematical reasoning.
OpenAI o3 achieves 87.5% on ARC-AGI, a major reasoning milestone. OpenAI/AlphaProof Nexus disproves the 1946 Erdős unit-distance conjecture (Annals of Mathematics) and solves 9 Erdős problems plus 44 sequence conjectures. Gemini Deep Think scores 35/42 on IMO-level math, a gold-medal-level result. These represent concrete breakthroughs in general-purpose LLM theorem proving and abstract reasoning.