Agent trust & governance: Liability vacuum/cyberwar $23T/MASK/OpenClaw ban/ClawArena/patient LLMs/RAG 90% fails/junior eng oversight/boiling frog hallucinations/AILeakMonitor/Vanderbilt drugs/Moonbounce/TENEX/HDP/DeepMind/Fading/TBSP/Anthropic Glasswing/Mythos interp/MIRAGE/ISO/COSO/agents gov agents/governance wall/adoption struggles/deepfakes/US non-comp/System 0/Claude Code/Google lies/ethicists/OpenAI Fellowship/Kaggle evals/MLOps/KISA physical
Key Questions
What is the 'boiling frog' effect in AI use?
A new preprint identifies gradual skill erosion from reliance on AI, likened to boiling a frog: the decline is too slow to notice until it is severe. The effect was measured across a series of randomized controlled experiments.
What is Anthropic's Project Glasswing?
Anthropic's Project Glasswing gates security researchers' access to Claude Mythos; the program has surfaced thousands of vulnerabilities while keeping the release limited and controlled.
How do LLM hallucination rates compare to aviation?
LLM hallucinations occur at roughly a 4.6% rate, vastly exceeding the error tolerances permitted by aviation safety standards. Google's AI Overviews show similarly high error rates.
What is AILeakMonitor?
AILeakMonitor launched amid an 81% surge in AI-related data leaks, with 29 million records exposed; the service tracks data exposure risks.
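The digest does not describe how AILeakMonitor detects exposures, but data-exposure tracking generally involves pattern scanning over text. The sketch below is purely illustrative, assuming a simple regex approach; the pattern names and rules are hypothetical and not from AILeakMonitor.

```python
import re

# Hypothetical exposure patterns -- illustrative only, not AILeakMonitor's
# actual detection rules.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
}

def find_exposures(text: str) -> list[tuple[str, str]]:
    """Return (category, matched_string) pairs found in a text blob."""
    hits = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((category, match.group(0)))
    return hits
```

Real services layer entropy checks, context analysis, and validation on top of such patterns to cut false positives; this shows only the basic scanning shape.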
What standards is KISA developing?
KISA has launched a project to develop physical AI security standards for manufacturing, healthcare, and mobility, addressing emerging risks as AI moves into the physical world.
What is System 0 in AI safety?
System 0 rethinks AI safety from the input layer, screening prompts with filters before they reach the model to prevent jailbreaks and harden the entry point.
What governance frameworks are evolving for AI?
COSO is aligning AI governance with internal-control frameworks, while ISO 42001 underpins hybrid governance tools; in some deployments, agents are even governing other agents.
What is OpenAI's Safety Fellowship?
OpenAI is funding a new AI safety fellowship and talent pipeline, supporting independent research on safety and alignment.
Liability gaps / business adoption struggles / stakeholder misalignment / 'boiling frog' skill erosion RCT
Claude bans OpenClaw / Claude Code leak / Mythos interp risks amid agent evals
AILeakMonitor launch amid 81% AI leak surge / 29M exposed
Vanderbilt AI detects drug safety signals
OpenClaw video gen / ClawArena dyn evals
patient-facing LLM harms
RAG 90% prod fails / Google AI Overviews hallucinations (4.6% rate)
US gov 91% non-comp rights-impacting AI
System 0 input filters jailbreaks
junior eng oversight for hybrid teams / productivity / safety
$23T cyber threat vs $1B safety bets
HDP delegation prov
adoption w/o trust gains
deepfakes erode shared reality
colleges scale AI ethicist training 100k+ jobs
governance wall / audit trails / ISO 42001 hybrid tools / COSO internal controls / MLOps workflows / agents governing agents
Anthropic Glasswing gates Mythos
OpenAI Safety Fellowship eval/robustness/Kaggle boundary evals grants
KISA launches physical AI security standards project (manuf/health/mobility)