LLM Insight Tracker

Agentic AI: Self-Evo RL + Memory/Planning + Evals

Key Questions

What is PAGER in agentic AI research?

PAGER is one of several frameworks, alongside Solvita and AIRA, advancing self-evolving agent systems through RL and planning.

How does AVSD improve LLM training?

AVSD uses adaptive-view self-distillation to address sparse outcome rewards in on-policy RL for language models.

What is Moss in autonomous agents?

Moss enables self-evolution through source-level rewriting, allowing agents to iteratively improve their own code.

What does π-Bench evaluate?

π-Bench tests proactive personal assistant agents on hidden intents, inter-task dependencies, and cross-session continuity.

What are domain-camouflaged attacks?

These attacks exploit guardrails in multi-agent debate setups, amplifying static injections by up to 9.9x on smaller models.

What is Spreadsheet-RL?

Spreadsheet-RL applies reinforcement learning to improve LLM agents on realistic, complex spreadsheet manipulation tasks.

What gaps remain in agent cognition?

Current systems still show cognition gaps in long-horizon planning, memory consistency, and robustness to adversarial inputs.

Which papers support these agentic advances?

Recent works on RLVR, CEPO, STATE-Bench, and MINTEval provide the technical foundations for the reported progress.

Summary: PAGER, Solvita, and AIRA; RLVR, CEPO, AVSD, and OPSD self-distillation; Moss source-level rewriting; Spreadsheet-RL, π-Bench (proactive agents), and STATE-Bench; domain-camouflaged attacks (up to 9.9x amplification in debate). Cognition gaps remain. Status: climaxing.

Sources (44)
Updated May 23, 2026