Reproducibility gap in AI-assisted mathematics [developing]
Key Questions
What is the reproducibility gap in AI-assisted mathematics?
The reproducibility gap refers to the difficulty of consistently replicating results from AI models on mathematical tasks. Recent examples include Claude's work on Knuth Hamiltonian problems, GPT reaching 50% on FrontierMath, and sudden surges in Leanstral performance. The gap highlights how variable AI mathematical reasoning remains across models, runs, and benchmarks.
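One way to quantify run-to-run variability is to sample the same prompt repeatedly and measure how often the modal answer recurs. The sketch below illustrates that harness; `stub_model` is a hypothetical stand-in for a sampled model call, not any real API:

```python
import random
from collections import Counter

def stub_model(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for one sampled LLM run; a real harness
    would call a model API with temperature > 0."""
    rng = random.Random(seed)
    return rng.choice(["42", "42", "42", "41"])  # mostly, but not always, "42"

def agreement_rate(prompt: str, n_runs: int = 20) -> float:
    """Fraction of runs returning the modal answer: 1.0 means the
    result is perfectly reproducible across repeated sampling."""
    answers = [stub_model(prompt, seed) for seed in range(n_runs)]
    _, count = Counter(answers).most_common(1)[0]
    return count / n_runs

print(agreement_rate("Does graph G admit a Hamiltonian cycle?"))
```

A real study would also vary model version and decoding settings, since each contributes its own source of irreproducibility.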
What is Axplorer and its role in AI math discovery?
Axplorer is a tool that democratizes AI-driven mathematical discovery by running on ordinary workstations, as highlighted in a 2026 alert. It supports explorations such as Turán-type extremal graph problems, making advanced math AI accessible beyond high-end servers.
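For flavour, the smallest Turán-type questions of the kind such tools explore can be brute-forced on a laptop. This sketch (not Axplorer's actual method) checks Turán's theorem for triangle-free graphs at small n:

```python
from itertools import combinations

def max_triangle_free_edges(n: int) -> int:
    """Brute-force ex(n, K3): the largest number of edges in an
    n-vertex graph containing no triangle. Turán's theorem says this
    equals floor(n^2 / 4), realised by a complete bipartite graph."""
    pairs = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = {p for i, p in enumerate(pairs) if mask >> i & 1}
        has_triangle = any(
            (a, b) in edges and (a, c) in edges and (b, c) in edges
            for a, b, c in combinations(range(n), 3)
        )
        if not has_triangle:
            best = max(best, len(edges))
    return best

print(max_triangle_free_edges(5))  # Turán's theorem predicts floor(25/4) = 6
```

Exhaustive search dies quickly as n grows (2^(n(n-1)/2) graphs), which is exactly why AI-guided search is interesting for these problems.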
How do LLMs perform in contextual math reasoning?
LLMs show significant limitations in contextual math reasoning, with performance drops of 13-34 points on benchmarks such as ContextMATH, P2D-former, and QuitoBench. These gaps reveal that models struggle to maintain reasoning quality when additional context is introduced.
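A hedged sketch of the kind of context-perturbation evaluation such benchmarks perform: measure accuracy with and without irrelevant context prepended. Here `toy_solver` is an invented stand-in for an LLM, and the numbers are illustrative only:

```python
def eval_accuracy(solver, problems):
    """Percent of (question, answer) pairs the solver gets right."""
    correct = sum(solver(q) == a for q, a in problems)
    return 100.0 * correct / len(problems)

def with_distractor(problems, distractor):
    """Prepend irrelevant context to every question."""
    return [(distractor + "\n" + q, a) for q, a in problems]

def toy_solver(prompt: str) -> str:
    # Invented stand-in for an LLM that "loses focus" on long prompts.
    question = prompt.splitlines()[-1]
    if len(prompt) > 80:
        return "?"
    return {"2+2?": "4", "3*3?": "9"}.get(question, "?")

problems = [("2+2?", "4"), ("3*3?", "9")]
base = eval_accuracy(toy_solver, problems)
perturbed = eval_accuracy(
    toy_solver, with_distractor(problems, "Note: the weather is mild today. " * 3))
print(f"drop: {base - perturbed:.0f} points")  # drop: 100 points
```

Real benchmarks use distractors that are semantically plausible rather than obviously irrelevant, which is what makes the reported 13-34 point drops notable.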
What is the live proof sketches benchmark?
The live proof sketches benchmark tests mathematician-level reasoning by requiring proof sketches rather than surface-level answers. It is designed to be sensitive to genuine mathematical understanding in AI systems.
What is Klear-Reasoner?
Klear-Reasoner advances LLM reasoning through gradient-based training methods and chain-of-thought (CoT) prompting, balancing quality against diversity when curating long-CoT mathematics datasets. It improves LLM performance on complex mathematical tasks.
What is Hiroshi Kera's contribution to AI and mathematics?
Hiroshi Kera, an associate professor at Chiba University, introduces reverse problem generation for computational algebra AI datasets: problems are constructed backward from known solutions, opening new frontiers for AI-assisted mathematics datasets.
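The general idea can be illustrated with a toy sketch (an assumption about the technique in general, not Kera's actual pipeline): sample the answer first, then construct the problem statement from it, so every generated item comes with a guaranteed-correct solution:

```python
import random

def expand_from_roots(roots):
    """Coefficients (highest degree first) of prod over roots of (x - r):
    the easy 'forward' direction of the construction."""
    coeffs = [1]
    for r in roots:
        coeffs = coeffs + [0]
        for i in range(len(coeffs) - 1, 0, -1):
            coeffs[i] -= r * coeffs[i - 1]
    return coeffs

def reverse_generate(rng, degree=2, bound=5):
    """Reverse problem generation: sample the answer (integer roots)
    first, then build the problem statement from it."""
    roots = sorted(rng.randint(-bound, bound) for _ in range(degree))
    coeffs = expand_from_roots(roots)
    d = len(coeffs) - 1
    poly = " + ".join(f"({c})x^{d - i}" for i, c in enumerate(coeffs))
    return {"problem": f"Find all roots of {poly} = 0", "answer": roots}

item = reverse_generate(random.Random(0))
print(item["problem"], "->", item["answer"])
```

Because forward construction (expanding from roots) is cheap while the inverse task (root finding or factoring) is harder, the generator yields labelled training data at essentially no cost.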
How does DeepMind's VAD-CFR work in poker?
DeepMind's VAD-CFR uses an LLM to rewrite its own game-theory algorithms for multi-agent reinforcement learning in imperfect-information games such as poker. The system reportedly outperformed expert-designed algorithms.
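VAD-CFR's internals aren't described here, but the classic building block of CFR-family methods, regret matching, can be sketched in self-play on rock-paper-scissors, where the average strategy approaches the uniform Nash equilibrium:

```python
import random

ACTIONS = 3  # rock, paper, scissors
# Payoff for the row player: PAYOFF[my_action][opponent_action]
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def strategy_from_regrets(regrets):
    """Regret matching: mix over actions in proportion to positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def train(iterations=20000, seed=0):
    rng = random.Random(seed)
    regrets = [0.0] * ACTIONS
    opp_regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        s = strategy_from_regrets(regrets)
        o = strategy_from_regrets(opp_regrets)
        a = rng.choices(range(ACTIONS), weights=s)[0]
        b = rng.choices(range(ACTIONS), weights=o)[0]
        for alt in range(ACTIONS):
            # Regret: how much better would the alternative action have done?
            regrets[alt] += PAYOFF[alt][b] - PAYOFF[a][b]
            opp_regrets[alt] += PAYOFF[alt][a] - PAYOFF[b][a]
            strategy_sum[alt] += s[alt]
    return [x / iterations for x in strategy_sum]

print([round(p, 2) for p in train()])  # average strategy near [0.33, 0.33, 0.33]
```

Full CFR extends this update to every information set of a sequential game like poker; the single-state version above shows only the regret-matching core.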
What is the Math-O-Mania video on GNNs?
The Math-O-Mania 2026 video 'Connecting the Dots: Math behind GNNs' explains the mathematics of Graph Neural Networks. It provides insights into GNN fundamentals for AI applications.
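The core mathematical operation behind GNNs, neighbourhood aggregation, fits in a few lines. This is a generic GCN-style mean aggregation, not code from the video:

```python
def mean_aggregate(adj, features):
    """One message-passing round: each node's new feature vector is the
    mean over its neighbours and itself -- in matrix form, roughly
    H' = D^-1 (A + I) H, the core of a GCN-style layer (before the
    learned weight matrix and nonlinearity are applied)."""
    n, dim = len(adj), len(features[0])
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j]] + [i]  # include self-loop
        out.append([sum(features[j][k] for j in neigh) / len(neigh)
                    for k in range(dim)])
    return out

# Path graph 0 -- 1 -- 2 with one scalar feature per node
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[0.0], [3.0], [6.0]]
print(mean_aggregate(adj, feats))  # [[1.5], [3.0], [4.5]]
```

Stacking such rounds lets information propagate k hops in k layers, which is the basic mathematical picture the video builds on.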
In Brief
Claude on Knuth Hamiltonian problems; GPT at 50% on FrontierMath; surges in Leanstral performance. Also new: the Math-O-Mania GNN math video; the live proof sketches benchmark; DeepSeek Prover V2; Axplorer on Turán problems; Klear-Reasoner's CoT work; Balcan's work on algorithms; contextual-reasoning gaps for LLMs on P2D-former, QuitoBench, and ContextMATH (13-34 point drops); MATH-IDN; DeepMind's VAD-CFR for poker; an ML ETF; KANs for dynamical systems; Python-LLM prototyping of symbolic HPC Fortran code; the arXiv essay "Mathematicians in AI"; and Hiroshi Kera's reverse problem generation for computational algebra AI datasets.