Self-evolution & memory / runtime learning

Key Questions

What is SkillOpt-Lite and how does it advance agent self-evolution?

SkillOpt-Lite is a minimal zeroth-order (ZO) optimization pipeline for skill evolution, pitched as enabling better and faster self-evolution "via one line of vibe". With it, a nano model outperforms a larger model running the full SkillOpt pipeline. It has been integrated into VSCode Copilot, and the result challenges the assumption that self-evolution requires complex pipelines.
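
To make "ZO optimization" concrete, here is a minimal sketch of a generic two-point zeroth-order update, which improves parameters using only score evaluations and no analytic gradient. It is not SkillOpt-Lite's actual pipeline, which is not described here. The function names, the numeric parameter vector, and `toy_score` are hypothetical stand-ins for whatever a real skill evaluation would return.

```python
import random


def zo_step(params, score_fn, rng, sigma=0.1, lr=0.05):
    """One two-point zeroth-order ascent step (generic technique, assumed here)."""
    # Sample a random search direction.
    direction = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + sigma * d for p, d in zip(params, direction)]
    minus = [p - sigma * d for p, d in zip(params, direction)]
    # Central finite difference: estimates the slope of the score along `direction`.
    slope = (score_fn(plus) - score_fn(minus)) / (2.0 * sigma)
    # Move along the direction in proportion to the estimated slope.
    return [p + lr * slope * d for p, d in zip(params, direction)]


def toy_score(params):
    """Hypothetical score function; its maximum is at (1.0, -2.0)."""
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)


def optimize(steps=500, seed=0):
    """Run repeated ZO steps from the origin with a seeded generator."""
    rng = random.Random(seed)
    params = [0.0, 0.0]
    for _ in range(steps):
        params = zo_step(params, toy_score, rng)
    return params
```

Calling `optimize()` moves the parameters toward the maximizer of `toy_score`. In a skill-evolution setting the score would presumably come from running the agent on tasks, which is far noisier and more expensive than this toy function.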

What scaling law did ByteDance discover for AI agents?

ByteDance identified a post-deployment scaling law where agents double their learning speed every 3 months. This finding could help sustain AI progress beyond current model-size limits.
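
As a quick worked example of what a three-month doubling period implies, the sketch below compounds the stated rate over time. It assumes the doubling is a clean exponential; the function name and the default parameter are illustrative, not taken from the ByteDance work.

```python
def learning_speed_multiplier(months, doubling_period_months=3.0):
    """Multiplier on learning speed after `months`, assuming steady doubling."""
    return 2.0 ** (months / doubling_period_months)


# One year at this rate is four doublings.
one_year = learning_speed_multiplier(12)  # 16.0
```

Under this assumption, learning speed would be 16 times higher after one year and 256 times higher after two.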

What performance gains were achieved by SIA and Retrospective Harness Optimization?

SIA delivered a 502% gain while Retrospective Harness Optimization raised SWE-Bench Pro scores from 59% to 78%. These results highlight rapid progress in self-evolving multi-agent systems with decentralized memory.
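
Percentage gains and percentage-point gains are easy to conflate, so a small sketch of the arithmetic may help when comparing these figures. It assumes the 502% figure is a relative gain over a baseline, which the summary does not specify; the helper names are illustrative.

```python
def absolute_gain_points(before, after):
    """Gain in percentage points between two scores given in percent."""
    return after - before


def relative_gain_percent(before, after):
    """Gain relative to the starting score, in percent."""
    return (after - before) / before * 100.0


def multiplier_from_percent_gain(gain_percent):
    """Convert a relative gain in percent into a multiple of the baseline."""
    return 1.0 + gain_percent / 100.0


# SWE-Bench Pro, 59% to 78%: 19 points absolute, about 32.2% relative.
swe_points = absolute_gain_points(59.0, 78.0)
swe_relative = relative_gain_percent(59.0, 78.0)

# If the 502% gain is relative, the result is about 6.02 times the baseline.
sia_multiplier = multiplier_from_percent_gain(502.0)
```

Read this way, the two headline numbers are not directly comparable: one is a change in benchmark score, the other a multiple of an unstated baseline.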

How does Ornith-1.0 improve agentic coding?

Ornith-1.0 learns its own RL scaffold autonomously, enabling more efficient agentic coding workflows. Related models like Eevee report 37-48% gains through similar self-evolution techniques.

What is DeepIndex and how does it support agent memory?

DeepIndex provides agentic memory capable of handling 118K tokens per query. It is part of broader efforts including ContReAct, HiPER, and DuoMem for continuous reasoning and memory across sessions.

What diagnostic approach addresses reasoning collapse in LLM agent RL?

A mutual-information-based diagnostic combined with reward-variance-aware filtering helps stabilize multi-turn agent training. This method mitigates collapse during reinforcement learning of agents.
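
The summary gives no implementation details, so the following is only a sketch of what these two ingredients could look like in their simplest form. It assumes the diagnostic measures how much an agent's responses still depend on its inputs (mutual information near zero suggesting collapse) and that the filter drops prompt groups whose rollout rewards barely vary. Both readings, along with every function name and threshold, are assumptions rather than the paper's method.

```python
import math
from collections import Counter
from statistics import pvariance


def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in nats from discrete (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def filter_by_reward_variance(groups, min_variance=1e-6):
    """Keep prompt groups whose rollout rewards vary enough to carry signal."""
    return {
        prompt: rewards
        for prompt, rewards in groups.items()
        if len(rewards) > 1 and pvariance(rewards) > min_variance
    }


# Hypothetical data: the same response for every input gives zero MI.
collapsed = [("task_a", "resp_1"), ("task_b", "resp_1")]
healthy = [("task_a", "resp_1"), ("task_b", "resp_2")]

rollouts = {"prompt_1": [1.0, 1.0, 1.0], "prompt_2": [0.0, 1.0, 1.0]}
kept = filter_by_reward_variance(rollouts)  # only "prompt_2" remains
```

The intuition for the filter is that a group of rollouts with identical rewards gives no relative signal to learn from, so it can be skipped. A real multi-turn setup would need a way to discretize or embed inputs and responses before estimating mutual information.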

Which frameworks support self-evolving multi-agent systems?

APPO, EvoTrainer, Arbor, and decentralized memory architectures enable self-evolving multi-agent setups. They build on earlier work such as MOSS, RubricEM, and SkillOpt.

How does AHE contribute to performance improvements?

AHE raises its benchmark score from 69.7% to 77%. It sits alongside other runtime-learning advances such as NVIDIA Polar.

MOSS, RubricEM, SkillOpt, SIA (502% gain), NVIDIA Polar, AHE (69.7→77%). Ornith-1.0 learns own RL scaffold. Eevee (+37-48%). Retrospective Harness Optimization 59%→78% SWE-Bench Pro. APPO, EvoTrainer, Arbor. DeepIndex agentic memory (118K tokens/query). ByteDance post-deployment scaling law — agents double learning speed every 3 months. ContReAct, HiPER, DuoMem. Self-evolving multi-agent via decentralized memory. Understanding Reasoning Collapse in LLM Agent RL — MI-based diagnostic and reward-variance-aware filtering for multi-turn agent training. New: SkillOpt-Lite — minimal ZO optimization pipeline for skill evolution; nano model outperforms larger full SkillOpt; integrated into VSCode Copilot; challenges complex pipeline assumptions.
