Autoresearch & deterministic evaluation (Sakana/Hyperagents + FIPO + SKILL0 + RLCF + InCoder + O-Series + SIEVE + noisy supervision)
Key Questions
How do end-to-end agents contribute to research automation?
They automate NeurIPS-level research tasks. Examples include Sakana/Hyperagents and Karpathy's LLM Wiki for PhD-level research.
What is FIPO and its performance on AIME?
FIPO is reported to roughly double performance on the AIME benchmark, a notable step for autoresearch capabilities.
What is SKILL0 in reinforcement learning?
SKILL0 enables in-context agentic RL for skill internalization. It supports deterministic evaluation.
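A minimal sketch of what an in-context skill-internalization loop could look like, under loose assumptions; the toy task, the `policy` stub, and the tuple-based skill library are illustrative stand-ins, not SKILL0's actual design. Successful trajectories are distilled into context notes the agent exploits later, and seeding every RNG makes evaluation deterministic.

```python
"""Illustrative sketch of in-context agentic RL for skill internalization
(assumed mechanics, not from the SKILL0 paper). Successful trajectories become
'skill' notes kept in context; seeding makes evaluation fully deterministic."""
import random

# Toy task: emit the right 3-step action sequence for each task id.
TASKS = {"sort": ["read", "compare", "swap"], "search": ["read", "probe", "halve"]}
ACTIONS = ["read", "compare", "swap", "probe", "halve"]

def policy(task, context, rng):
    """Stand-in for an LLM policy: if a skill note for this task is already in
    context, follow it; otherwise explore uniformly at random."""
    for note in context:
        if note[0] == task:
            return list(note[1])                        # exploit internalized skill
    return [rng.choice(ACTIONS) for _ in range(3)]      # explore

def run_episode(task, context, rng):
    actions = policy(task, context, rng)
    reward = 1.0 if actions == TASKS[task] else 0.0
    return actions, reward

def train(episodes=500, seed=0):
    rng = random.Random(seed)       # single seeded RNG -> deterministic eval
    context = []                    # in-context skill library
    for _ in range(episodes):
        task = rng.choice(list(TASKS))
        actions, reward = run_episode(task, context, rng)
        if reward > 0 and all(n[0] != task for n in context):
            context.append((task, tuple(actions)))      # internalize the skill
    return context

if __name__ == "__main__":
    skills = train()
    assert skills == train()        # same seed reproduces the same library
    print("internalized skills:", skills)
```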
What does RLCF focus on?
RLCF incorporates taste-style preference signals into the RL setup, which aids robust learning.
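One plausible reading of "taste preferences" is a Bradley-Terry reward model fit on pairwise taste judgments and then used as the RL reward, as in standard preference-based RL; that reading, and all names and data below, are assumptions for illustration, not RLCF's published method.

```python
"""Hedged sketch: fit scalar 'taste' rewards from pairwise preference data via
Bradley-Terry maximum likelihood. Illustrative, not the RLCF recipe."""
import math

# Toy pairwise judgments: (preferred_item, rejected_item).
PAIRS = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]
scores = {"a": 0.0, "b": 0.0, "c": 0.0}   # learned scalar taste rewards

def fit(pairs, scores, lr=0.1, steps=200):
    for _ in range(steps):
        for win, lose in pairs:
            # Bradley-Terry: P(win beats lose) = sigmoid(r_win - r_lose)
            p = 1.0 / (1.0 + math.exp(scores[lose] - scores[win]))
            g = 1.0 - p                   # gradient of the log-likelihood
            scores[win] += lr * g
            scores[lose] -= lr * g
    return scores

reward = fit(PAIRS, scores)
print(sorted(reward.items(), key=lambda kv: -kv[1]))  # 'a' should rank first
# Downstream, a policy would be optimized against reward[...] as in RLHF.
```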
How does InCoder-32B-Thinking perform?
InCoder-32B-Thinking serves as an industrial-scale code world model for reasoning, boosting coding-model performance.
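To make the "code world model" idea concrete, here is a minimal sketch of the interface such a model could expose: given code and a starting state, predict the post-execution state. The `world_model_predict` function and the sandboxed `exec` stand-in are assumptions; a learned model would approximate this prediction rather than actually running the code.

```python
"""Hedged sketch of a code-world-model interface (assumed, not the model's
actual API): an agent queries it for the predicted outcome of running code
before committing an edit. A sandboxed exec() stands in for the learned model."""

def world_model_predict(code: str, env: dict) -> dict:
    """Predict the post-execution state of `code` given starting state `env`.
    The sketch simply executes; a real world model would predict this."""
    state = dict(env)
    try:
        exec(code, {}, state)
        return {"ok": True, "state": state}
    except Exception as e:
        return {"ok": False, "error": repr(e)}

# Agent proposes two candidate fixes; keep the one the world model accepts.
candidates = ["total = sum(xs) / len(xs)", "total = sum(xs) / 0"]
for cand in candidates:
    print(cand, "->", world_model_predict(cand, {"xs": [1, 2, 3]}))
```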
What are the findings of the MIT scaling study?
The MIT scaling study reports roughly 50% success rates, finding that cloned AI 'workers' are often only minimally sufficient.
What is SIEVE and its benefits?
SIEVE offers sample-efficient language-model learning and is claimed to deliver major parameter-efficiency gains.
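A common route to sample efficiency is cheap-scorer data selection: a lightweight filter ranks candidate samples so the expensive model trains only on the highest-value slice. The sketch below illustrates that pattern; the `cheap_score` heuristic is a toy stand-in and not SIEVE's actual filter.

```python
"""Hedged sketch of sample-efficient data selection in the spirit of SIEVE;
the scoring heuristic is illustrative, not the paper's method."""

def cheap_score(text: str) -> float:
    """Toy proxy for a lightweight quality filter: favor longer, more
    lexically diverse samples. A real system would use a small trained model."""
    words = text.split()
    if not words:
        return 0.0
    return len(set(words)) / len(words) * min(len(words), 50)

corpus = [
    "the the the the the",
    "gradient descent minimizes a differentiable loss function",
    "ok",
    "reinforcement learning optimizes expected cumulative reward",
]
keep_fraction = 0.5
ranked = sorted(corpus, key=cheap_score, reverse=True)
selected = ranked[: max(1, int(len(ranked) * keep_fraction))]
print("training on:", selected)  # only the informative half reaches training
```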
What does Cog-DRIFT address in RLVR?
Cog-DRIFT breaks exploration stalls in RLVR by extracting learning signal from zero-reward examples, enabling more robust reasoning.
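The stall itself is easy to see in group-relative RLVR: when every rollout in a group scores zero, the normalized advantages are all zero and no gradient flows. The sketch below shows that failure mode and one assumed fallback, substituting a dense auxiliary signal for all-zero groups; the fallback is an illustrative guess at "learning from zero-reward examples", not Cog-DRIFT's published mechanism.

```python
"""Hedged sketch of the RLVR exploration stall and an assumed workaround.
GRPO-style group-normalized advantages vanish when all rewards are zero; the
dense-signal fallback here is illustrative, not Cog-DRIFT's actual method."""
from statistics import mean, pstdev

def advantages(rewards, aux_scores):
    """Group-normalized advantages with a zero-reward-group fallback."""
    if any(r > 0 for r in rewards):
        mu, sd = mean(rewards), pstdev(rewards) or 1.0
        return [(r - mu) / sd for r in rewards]
    # Stall case: every verifiable reward is 0 -> substitute a dense auxiliary
    # signal (e.g., partial progress or format score) so learning continues.
    mu, sd = mean(aux_scores), pstdev(aux_scores) or 1.0
    return [(a - mu) / sd for a in aux_scores]

print(advantages([0, 1, 0, 0], [0.1, 0.9, 0.2, 0.3]))  # normal RLVR update
print(advantages([0, 0, 0, 0], [0.1, 0.9, 0.2, 0.3]))  # fallback breaks stall
```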
End-to-end agents automate NeurIPS-level research (Sakana/Hyperagents, Karpathy's LLM Wiki); SKILL0 in-context agentic RL; FIPO doubles AIME; RLCF taste preferences; InCoder-32B-Thinking code world model; O-Series CoT; MIT scaling study at ~50% success; SIEVE sample-efficient learning; noisy-supervision robustness; new test-time learnable adaptation (ex-1c1ae0fd) plus a self-execution simulation coding boost (ex-e66f2164); Stanford finds single agents beat multi-agent setups under token budgets (ex-586625cc); Cog-DRIFT breaks RLVR exploration stalls (ex-339c3776); Paper Espresso for paper-curation overload; Chi on hidden data signals.