AI Research Highlights

Agentic & self-improvement advances — cheaper, broader, faster

Key Questions

What advances in agentic AI are covered in H006?

H006 highlights cheaper, broader, and faster self-improvement, including Nature/Sakana-v2 autoscience, USTC ChemAgents, Med-AI, Anagent on AnaBench (42%), the Claude Mythos leaps, an MIT result on LLMs doubling task lengths, and agency imitation. Techniques covered include a CL survey, Self-org, GEMS/GLM/GAAMA/OmniMem, ThinkTwice, and Cog-DRIFT with RLVR. Risks involve autonomy, collusion, scams, power concentration, membership inference attacks (MIA), AgentHazard, and social privacy.

What is Claude Mythos Preview?

Claude Mythos Preview is Anthropic's most capable frontier model, showing striking leaps in evaluation benchmarks over predecessors like Claude Code 2.1/4.7.

How does MIT research contribute to agentic advances?

A new MIT paper shows LLMs doubling task lengths, advancing automation and agency imitation.

What is Cog-DRIFT?

Cog-DRIFT, shared by @EliasEskin, is a self-improvement method that enables models to learn from zero-reward examples under RLVR (reinforcement learning with verifiable rewards).

What memory improvements for agents are mentioned?

OmniMem, Omni-SimpleMem for multimodal agents, Neuro-Symbolic Dual Memory for long-horizon agents, and VL mem enhance persistent capabilities.

What benchmarks test agentic skills?

The benchmarks mentioned are AnaBench (42% for Anagent), Claw-Eval, ClawArena, YC-Bench, TBSP for self-preservation, and AutoMIA for membership inference.

What risks are associated with these agentic advances?

Risks include increased autonomy, collusion, scams, power concentration, MIA, AgentHazard harmful behaviors, and Social privacy risks via AgentSocialBench.

What actions mitigate agentic risks?

Recommended actions to address deployment concerns include sandboxes, BeSafe, MiroEval, YC-Bench, TBSP, and Claw-Eval.

Highlights: Nature/Sakana-v2 autoscience; USTC ChemAgents/Med-AI/Anagent AnaBench 42%; Claude Mythos leaps/Code 2.1/4.7; MIT doubling; agency imitation; CL survey; Self-org; GEMS/GLM/GAAMA/OmniMem; AMA/Miro/Hippo; CAID +27%; ThinkTwice refine/Cog-DRIFT RLVR/agent skills wild/Learn Retrieve Traj/Claw-Eval; Corti/NLAH/ClawKeeper/Arena; YC/TBSP/AutoMIA; VL mem/Neuro-sym/Traj/Agentic-MME/InCoder.

Risks: autonomy/collusion/scams/power/MIA/AgentHazard/Social.

Actions: sandbox/BeSafe/Miro/YC/TBSP/Claw-Eval.

Sources (36)
Updated Apr 8, 2026