Theory-driven looks at AI capability, failure modes, and alignment
Probing AI’s Learning Limits
This cluster gathers theory and evaluation work dissecting how modern AI systems learn, reason, and fail. Papers probe hallucination reduction via entropy-aware decoding, amplification effects in test-time RL, optimal early stopping for chain-of-thought reasoning, and saturation points where generative models stop meaningfully improving. Others analyze the structural limits of message-passing networks, statistical learning in diffusion models, and why compression makes language models value internal consistency over truth, alongside cognitive-science critiques of whether current systems truly ‘learn’ and whether alignment conflicts are even solvable. Together, these papers map the boundaries of today’s architectures and offer tools and concepts for AI that is safer and more reliably evaluated.
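For readers unfamiliar with the term, here is a minimal sketch of what “entropy-aware decoding” can mean in practice: monitor the model’s predictive entropy at each step and fall back to a conservative choice when uncertainty is high. The threshold and fallback policy below are hypothetical illustrations, not the specific method of any paper in this cluster.

```python
# Toy "entropy-aware" decoding step (illustrative sketch only).
# Assumption: high predictive entropy signals uncertainty and thus
# hallucination risk; the threshold and greedy fallback are hypothetical.
import numpy as np

def entropy_aware_step(logits: np.ndarray, entropy_threshold: float = 2.0,
                       temperature: float = 1.0, rng=None) -> int:
    """Pick the next token id; fall back to greedy decoding when the
    predictive entropy (in nats) exceeds the threshold."""
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    if entropy > entropy_threshold:
        # High uncertainty: choose the most likely token instead of sampling
        # (a caller could also abstain or hedge the answer here).
        return int(np.argmax(probs))
    return int(rng.choice(len(probs), p=probs))

# A peaked distribution samples normally; a near-uniform one triggers the fallback.
print(entropy_aware_step(np.array([5.0, 1.0, 0.5, 0.1])))
print(entropy_aware_step(np.zeros(50)))  # high entropy -> greedy fallback
```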