XAI, Sentience & Safety

New tools to evaluate, harden, and govern powerful AI agents

Securing the Age of AI Agents

This cluster tracks a rapid build‑out of evaluation, control, and security infrastructure around increasingly autonomous AI agents. Research and industry posts highlight new agent evaluation systems (One‑Eval, malware-style analysis, Code-A1), safety frameworks for single- and multi-agent setups (TrinityGuard, NemoClaw/OpenClaw, TrendAI/OpenShell), and models focused on verifiable reasoning, such as MiroThinker. Alongside this, governance and testing are professionalizing: Promptfoo’s acquisition by OpenAI, work on constitutions and bias/game‑theoretic checks, and community efforts like the AI Control Hackathon all aim to prevent misaligned or catastrophic agent behavior as deployment scales.
