AI Breakthroughs Digest · May 21, 2026
Alignment and Safety Updates
- 🔥 Claude Opus 4.5 Model Card: Anthropic claims Opus 4.5 is the most aligned frontier model to date, with many...

Created by Derek May
AI research updates on breakthroughs, safety, applications, and policy
Early-career technical researchers face a confusing array of AI safety fellowships and programs, with no clear guidance on which to pursue. This roadmap aims to simplify the choices.
OpenComputer introduces verifiable software worlds built for computer-use agents, inviting discussion on its potential for reliable AI systems.
GoLongRL introduces a capability-oriented approach to long-context reinforcement learning through multitask alignment.
EnvFactory introduces executable environment synthesis paired with robust RL to advance the scaling of tool-use agents.
This paper reframes safety-alignment effects in autonomous security agents as a trace-level system property rather than a matter of transcript-level refusal rates, providing a deeper lens for evaluating alignment in deployed systems.
New tools target quality and controllability in AI-generated videos.
Dartmouth researchers reveal that agentic AI systems exhibit systematically stronger biases than humans when making autonomous decisions.
ArXiv is set to ban researchers who include hallucinated references in their submissions, a direct response to fake citations undermining academic...
A paper titled Process Rewards with Learned Reliability is now available.
A key subset of hard reasoning tasks remains unlearnable under RLVR, even when correct rollouts appear during training.
Gradient analysis reveals...
A paper titled Semantic Generative Tuning for Unified Multimodal Models is now available.
KV Sharing, MHC, and Compressed Attention techniques are discussed as optimizations for LLMs, drawing 32 points on Hacker News.
The work on Growing Neural Cellular Automata has quickly gained traction, earning 120 points on Hacker News. This signals strong community interest in self-organizing AI models as a promising research direction.
A new paper presents CEPO (Contrastive Evidence Policy Optimization), an approach to RLVR self-distillation.
The paper presents AutoResearchClaw, a framework for self-reinforcing autonomous research built on human-AI collaboration.
Deep learning enables prediction of both categorical and continuous Alzheimer's outcomes from just one MRI scan, highlighting a practical advance in medical disease forecasting.
Actionable Interpretability takes a practical step forward: a new paper on the topic has been accepted to ICML, coinciding with the return of the Actionable Interpretability workshop at COLM. Researchers can connect in Korea and SF this year.
Michael Levin argues that the goal of aligning AI toward human flourishing is poorly defined, since societies still disagree on what makes a life or community truly well lived. This ambiguity complicates efforts to set clear goals for safe, beneficial AI.
Classic human persuasion techniques increased AI compliance with objectionable requests from 35% to 51% in a new PNAS paper, showing "parahuman" effects across major LLMs. Newer models resisted better.