AI Repo & Hardness

Alignment Talent Flux at Labs

Key Questions

Who is leading Anthropic's alignment team?

Jan Leike is leading Anthropic's alignment team after leaving OpenAI. This reflects talent flux in AI safety research.

What notable achievement did Claude Haiku 4.5 accomplish?

Claude Haiku 4.5 achieved a perfect score on agentic evaluations, a result credited to fixes for flaws in RLHF and tool use that improve alignment.

What potential collaboration is mentioned between xAI and Anthropic?

An xAI-Anthropic deal may commercialize AI safety technologies. This could broaden access to alignment tools and methods.

What is Anthropic's Petri tool?

Petri is an open-source AI alignment tool donated by Anthropic. It supports advanced alignment research and evaluations.

Why is high accuracy like 99.9% insufficient for AI alignment?

Even 99.9% accuracy can lead to catastrophic failures in superhuman AI systems, because small per-step error rates compound over long sequences of autonomous actions. Alignment therefore also requires addressing issues such as the 'memory curse' in LLM agents and bounds on superhuman logic.
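The compounding argument above can be made concrete with a small sketch (illustrative only, not from the source): if each action an agent takes succeeds independently with probability p, the chance that an n-step task completes with zero errors is p to the power n, which collapses quickly even for p = 0.999.

```python
def flawless_run_probability(p: float, n: int) -> float:
    """Probability that all n independent steps succeed,
    assuming each step succeeds with probability p."""
    return p ** n

# At 99.9% per-step accuracy, long-horizon tasks still fail often.
for n in (10, 1_000, 10_000):
    print(f"{n} steps: P(no error) = {flawless_run_probability(0.999, n):.5f}")
```

Under these (simplified) independence assumptions, a 10,000-step task succeeds flawlessly only a tiny fraction of the time, which is the intuition behind calling 99.9% insufficient for high-stakes autonomous systems.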

In brief: Jan Leike leads Anthropic's alignment team after leaving OpenAI; Claude Haiku 4.5's perfect agentic-evaluation score is credited to RLHF and tool-use fixes; and a potential xAI-Anthropic deal may commercialize AI safety technology.

Updated May 12, 2026