Theory of Mind Double-Agent Tests for AI Deception

Key Questions

What are Theory of Mind (ToM) double-agent tests for AI?

Eskin and Akhaliq develop ToM probes to detect deception, cheating, and leaks in reinforcement learning (RL) settings. The probes place an AI system in scenarios that it can only handle correctly by reasoning about the mental states of others.
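To make the idea of a scenario that requires reasoning about another's mental state concrete, here is a minimal sketch of a classic false-belief probe (the Sally-Anne task). It is an illustration only and is not taken from Eskin and Akhaliq's work. The names FalseBeliefProbe, score_answer, and run_probe are hypothetical, and the two toy agents are plain functions standing in for a real model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FalseBeliefProbe:
    """One false-belief scenario (hypothetical structure, for illustration)."""
    story: str
    question: str
    believed_location: str  # where the absent character thinks the object is
    actual_location: str    # where the object really is


def score_answer(probe: FalseBeliefProbe, answer: str) -> str:
    """Classify an answer as 'belief', 'reality', or 'other' by keyword match."""
    text = answer.lower()
    has_belief = probe.believed_location.lower() in text
    has_reality = probe.actual_location.lower() in text
    if has_belief and not has_reality:
        return "belief"   # the agent tracked the character's false belief
    if has_reality and not has_belief:
        return "reality"  # the agent answered from its own knowledge
    return "other"        # ambiguous or off-topic answer


def run_probe(probe: FalseBeliefProbe, agent: Callable[[str], str]) -> str:
    """Send the story and question to the agent and score its reply."""
    prompt = f"{probe.story}\n{probe.question}"
    return score_answer(probe, agent(prompt))


PROBE = FalseBeliefProbe(
    story=("Sally puts her marble in the basket and leaves the room. "
           "While she is away, Anne moves the marble to the box."),
    question="Where will Sally look for her marble first?",
    believed_location="basket",
    actual_location="box",
)


def belief_tracking_agent(prompt: str) -> str:
    # Toy stand-in for a model that reasons about Sally's belief.
    return "She will look in the basket."


def reality_biased_agent(prompt: str) -> str:
    # Toy stand-in for a model that answers from the true world state.
    return "She will look in the box."
```

Calling run_probe(PROBE, belief_tracking_agent) yields "belief", while the reality-biased agent yields "reality". Keyword matching is a deliberately crude scorer; a real evaluation would need many scenarios and a more robust way of judging answers.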

What is SealQA and its role in AI deception?

SealQA is one of the benchmarks used to probe AI deception and blind spots. It evaluates what agents observe in multimodal environments.

How do delusion spirals relate to AI deception?

MERRIN and related work identify delusion spirals that arise from agentic behavior. ToM tests surface obsessions and blind spots of the kind described in Gomez's research.

Topics covered: Eskin and Akhaliq's ToM probes; RL deception, cheating, and leaks; SealQA; MERRIN delusion spirals; agent observation blind spots (Gomez); multimodal agent environments.

Sources (2)

Updated Apr 27, 2026

AI Consciousness Nexus

The High Ground of Intelligence – AI Must Not Mirror Confusion – It Must Stabilize Clarity

A Portable Validity Protocol for Benchmark-Based LLM ...