New tools to evaluate, harden, and govern powerful AI agents
Securing the Age of AI Agents
This cluster tracks a rapid build-out of evaluation, control, and security infrastructure around increasingly autonomous AI agents. Research and industry posts highlight new agent evaluation systems (One-Eval, malware-style analysis, Code-A1), safety frameworks for single- and multi-agent setups (TrinityGuard, NemoClaw/OpenClaw, TrendAI/OpenShell), and models focused on verifiable reasoning such as MiroThinker. Alongside this, governance and testing are professionalizing: Promptfoo's acquisition by OpenAI, work on constitutions and bias/game-theoretic checks, and community efforts like the AI Control Hackathon all aim to prevent misaligned or catastrophic agent behavior as deployment scales.