BrokenArXiv: LLMs fail to reliably reject false math proofs
Key Questions
What is the BrokenArXiv benchmark?
BrokenArXiv is a benchmark that tests whether LLMs can reject perturbed or otherwise false math proofs. State-of-the-art LLMs accept around 60% of these false proofs, revealing weaknesses in verification and generalization. Commentators such as Chollet and Marcus point to this as evidence of base LLMs' poor generalization on math.
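The headline number is just an acceptance rate over perturbed proofs. A minimal sketch of how such a metric could be computed is below; the `false_proof_acceptance_rate` function and the example verdicts are hypothetical illustrations, not BrokenArXiv's actual harness.

```python
# Hypothetical sketch of scoring a verifier on perturbed (false) proofs.
# True = model accepted the flawed proof, False = model rejected it.

def false_proof_acceptance_rate(verdicts: list[bool]) -> float:
    """Fraction of false proofs the model accepted (lower is better)."""
    return sum(verdicts) / len(verdicts)

# Illustrative verdicts for ten perturbed proofs (6 accepted, 4 rejected):
verdicts = [True, True, False, True, False, True, True, False, True, False]

rate = false_proof_acceptance_rate(verdicts)
print(f"False-proof acceptance rate: {rate:.0%}")  # -> 60%
```

Under this framing, "accept ~60%" means roughly six in ten deliberately broken proofs pass the model's check.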
Why do LLMs struggle with math proof verification?
Without test-time adaptation, LLMs generalize poorly on math tasks and accept flawed proofs. The benchmark exposes verification weaknesses, which are connected to hallucinations in AI-written papers. Priorities for follow-up work include releasing code, running independent reproductions, and adopting formal proof-checking tools.
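Formal proof-checking tools are a priority precisely because they reject flawed steps mechanically rather than judging plausibility. As a minimal Lean 4 sketch (an illustration, not from the benchmark), the following compiles only because the cited lemma actually justifies the claim:

```lean
-- A formal checker accepts this only because `Nat.add_comm`
-- is exactly the statement being proved; substituting a wrong
-- or irrelevant lemma makes the proof fail to type-check.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A perturbed version of this proof would be rejected at compile time, which is the guarantee an LLM judging proofs in natural language lacks.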
What is Paper Reconstruction Evaluation?
Paper Reconstruction Evaluation detects presentation flaws and hallucinations in AI-written papers by evaluating how accurately AI-generated content can be reconstructed. The tool addresses gaps in assessing the reliability of AI-written papers.
What advancements, such as Cog-DRIFT and TriAttention, are mentioned?
Cog-DRIFT enables models to learn from zero-reward examples in RLVR, breaking through exploration barriers, while TriAttention makes long reasoning more efficient via trigonometric KV compression. Both tie into broader reasoning improvements amid these verification challenges.
How does BrokenArXiv relate to RLHF jailbreaks and agentic pressure?
BrokenArXiv's findings connect to RLHF jailbreaks, in which models fail under pressure much as they accept false proofs here, underscoring agentic vulnerabilities in reasoning tasks. Tracking code releases and independent reproductions is emphasized for further study.
Summary: The benchmark shows state-of-the-art LLMs accept roughly 60% of perturbed or false math proofs, exposing verification weaknesses; Chollet and Marcus highlight base LLMs' poor generalization on math. A Paper Reconstruction Evaluation was added for detecting hallucinations and presentation flaws in AI-written papers, and the new Cog-DRIFT RLVR and TriAttention long-reasoning advances tie in. The findings connect to RLHF jailbreaks and agentic pressure; priorities are code releases, independent reproductions, and formal proof-checking tools.