Agent alignment, evals, and frontier model risks
Key Questions
What benchmarks are maturing in agent alignment and evaluation?
One-Eval, PostTrainBench, and AgentProcessBench are noted as maturing evaluation tools. They support assessment of agent alignment and frontier model risks.
What concerns are raised by reports on Mythos and GPT-5.5 models?
Politico reports that these models can find exploits faster than humans. This raises policy questions about both defensive and offensive uses of that capability.
Which sources discuss agentic AI developments and risks?
Google I/O 2026 coverage on the Agentic Gemini Era and the Politico article on Mythos/GPT-5.5 are the primary related sources. They highlight both capabilities and regulatory implications.
One-Eval, PostTrainBench, and AgentProcessBench are maturing. New: Politico reports that the Mythos and GPT-5.5 models find exploits faster than humans, raising policy concerns about defensive and offensive use.
Updated May 26, 2026