Agent memory trustworthiness and interpretability gaps
Developing story. New signal: a Memory-Controlled Benchmark for LLM Trading Agents, which uses masking to control for data leakage and finds that the returns of LLM trading agents mostly reflect passive exposure. Earlier signals: ResearchMath-14K, an agentic math dataset, in which newer models generate 5x more fake references (hallucination scaling); Hermes overtaking OpenClaw, which has highlighted OpenClaw's security vulnerabilities (9.9 severity, 341 malicious skills); an OpenClaw deep dive; Preset data agent; SAM; Confidence and Calibration of Activation Oracles; OpenViking; Cognee/OpenClaw/MCP orchestration; long-horizon Kimi K2.6; a skill consumption study; durable agent state (SkillOpt, MemAudit); Google Agent Executor; and an event-driven architecture blueprint.
Sources (2)
Updated May 28, 2026