AI Repo & Hardness · Jun 11 Daily Digest
Self-Improving Agent Harnesses
- 🔥 Self-Harness: Introduces harnesses that rewrite themselves from run data rather than remaining fixed...

Created by Gino Côté (onigetoc)
Weekly curated list of fast-growing AI repos, theoretical limits and alignment research, and inference tools
Static task-level skill retrieval falls short for web agents as page states evolve during execution. SGDR introduces state-grounded dynamic retrieval...
A new arXiv paper introduces SearchSwarm, which tackles the finite context limits of agentic LLMs by delegating tasks to subagents.
Two recent papers strengthen the foundations of RL post-training for LLMs by refining trust-region control and credit assignment.
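In RL post-training for LLMs, trust-region control is commonly implemented with a PPO-style clipped surrogate objective. The sketch below shows only that standard baseline. It is not the refinement proposed in either paper. The function name `clipped_surrogate` and the default `eps=0.2` are illustrative assumptions.

```python
import math

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Mean PPO-style clipped surrogate objective (to be maximized).

    Illustrative baseline only; not the method of the papers above.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio to [1 - eps, 1 + eps] to approximate a trust region.
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Usage: one action made more likely (positive advantage), one unchanged.
score = clipped_surrogate(
    [math.log(0.6), math.log(0.3)],
    [math.log(0.4), math.log(0.3)],
    [1.0, -1.0],
)
```

Taking the minimum caps the reward for pushing the ratio beyond 1 + eps when the advantage is positive. Large moves toward a negative-advantage action still receive the full penalty. This one-sided clipping is the part that credit-assignment and trust-region refinements typically build on.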
Latent and graph-based memories are unlocking efficient long-context multimodal reasoning.
Agent scaffolds are shifting from static wrappers to learnable artifacts that rewrite themselves based on run outcomes. This turns manual upkeep into compounding gains for long-horizon systems.
Role-Agent turns a single LLM into both agent and environment. It uses World-In-Agent state-prediction rewards and Agent-In-World failure-driven task retrieval to drive bootstrapped co-evolution, delivering gains of more than 4% on benchmarks.
EEVEE is the first multi-dataset test-time prompt learning framework for LLM agents, deploying a router to cluster heterogeneous inputs and...
Diffusion models can move beyond greedy per-step decisions with Learned Relay Representations (LRRs), enabling them to plan for the future.
Two fresh benchmarks emphasize long-horizon rigor for agents and world models.
RHO enables agents to self-improve their harness using only prior trajectories: it picks challenging tasks, generates rollouts, applies...
Two new arXiv papers highlight complementary techniques for boosting agent performance via skill and trajectory refinement.
Two fresh benchmarks move beyond binary task success to probe deeper agent capabilities.
Memory mechanisms are rapidly evolving from simple agent stores to sophisticated world-model and context-compression systems.
Live GitHub data reveals clear leaders among agent frameworks.
Textual gradient methods for LLM judges face unique challenges when optimizing across multiple criteria simultaneously, unlike numerical multi-task...