AI Research Pulse

Autoresearch & deterministic evals

Key Questions

What is AutoResearchClaw?

AutoResearchClaw is a self-reinforcing autonomous research system built around human-AI collaboration. It reports a 54.7% performance gain on research tasks and improves iteratively through agentic workflows.

What advances are shown in self-evolving multi-agents?

This issue's highlight covers self-evolving agents such as GenEvolve, which uses tool-orchestrated visual experience distillation for image generation, together with DeepSeek-V4 and RubricEM for deterministic evaluations.

What is ERA in the context of scientific research?

ERA (Empirical Research Assistance) is an AI system that automates writing high-performance scientific software, enabling expert-level empirical code generation for researchers.

How do AIRA-Compose and AIRA-Design work?

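AIRA-Compose and AIRA-Design are agentic neural architecture search (NAS) methods: AI agents, rather than a fixed search procedure, design the neural networks, with the aim of finding better architectures. The related videos and papers on agentic NAS explain them in more detail.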
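
To make the general idea of an agent-driven architecture search concrete, here is a minimal sketch of a propose-and-evaluate loop. It is not the AIRA-Compose or AIRA-Design algorithm, which this digest does not describe. The names Arch, proxy_score, propose, and search are hypothetical, the "agent" is a seeded random mutation standing in for a model-generated proposal, and the score is a toy formula standing in for actually training and validating a network.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Arch:
    """Hypothetical architecture description used only for this sketch."""
    depth: int  # number of layers
    width: int  # units per layer


def proxy_score(arch: Arch) -> float:
    """Toy stand-in for validation accuracy (assumption: peaks at depth 6, width 256)."""
    return -abs(arch.depth - 6) - abs(arch.width - 256) / 64


def propose(best: Arch, rng: random.Random) -> Arch:
    """Stand-in for an agent's proposal: a small mutation of the best design so far."""
    depth = max(1, best.depth + rng.choice([-1, 0, 1]))
    width = max(64, best.width + rng.choice([-64, 0, 64]))
    return Arch(depth, width)


def search(start: Arch, steps: int, seed: int = 0) -> Arch:
    """Keep a candidate only if it scores strictly higher than the current best."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    best, best_score = start, proxy_score(start)
    for _ in range(steps):
        candidate = propose(best, rng)
        score = proxy_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


start = Arch(depth=2, width=64)
best = search(start, steps=200)
print(best, proxy_score(best))
```

Because the loop only accepts strict improvements and the random generator is seeded, the result never scores below the starting point and repeated runs return the same architecture. That reproducibility is what makes a search like this easy to evaluate deterministically.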

What is Video2GUI used for?

Video2GUI synthesizes large-scale interaction trajectories from videos to pretrain generalized GUI agents, improving their ability to handle real-world interface tasks.

What benchmarks evaluate proactive LLM agents?

Benchmarks like π-Bench assess proactive personal assistant agents by incorporating hidden intents, task dependencies, and cross-session continuity in evaluations.
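
As an illustration of how hidden intents, task dependencies, and cross-session continuity could be checked deterministically, here is a minimal sketch. It is not π-Bench's actual schema or scoring. The Task fields, the score_episode function, and the three metric names are assumptions made for this example, and the travel tasks are invented sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical benchmark task record."""
    task_id: str
    session: int                                  # session in which the task arises
    depends_on: list[str] = field(default_factory=list)
    hidden_intent: str = ""                       # goal the user never states outright


def score_episode(tasks: list[Task], completed: list[str],
                  addressed_intents: set[str]) -> dict[str, float]:
    """Score one episode; `completed` is the ordered list of task ids the agent finished."""
    by_id = {t.task_id: t for t in tasks}
    done: set[str] = set()
    for tid in completed:
        task = by_id.get(tid)
        if task is None or tid in done:
            continue
        # A task only counts if every dependency was completed before it.
        if all(dep in done for dep in task.depends_on):
            done.add(tid)

    intents = {t.hidden_intent for t in tasks if t.hidden_intent}
    # Tasks that rely on work from a different session test continuity.
    cross = [t for t in tasks
             if any(by_id[d].session != t.session for d in t.depends_on)]
    return {
        "task_completion": len(done) / len(tasks),
        "intent_recall": len(intents & addressed_intents) / len(intents) if intents else 1.0,
        "cross_session": sum(t.task_id in done for t in cross) / len(cross) if cross else 1.0,
    }


tasks = [
    Task("book_flight", session=1, hidden_intent="prefers aisle seat"),
    Task("book_hotel", session=2, depends_on=["book_flight"]),
    Task("send_itinerary", session=2, depends_on=["book_flight", "book_hotel"]),
]
# The agent sends the itinerary before booking the hotel, so that step does not count.
scores = score_episode(tasks, ["book_flight", "send_itinerary", "book_hotel"],
                       {"prefers aisle seat"})
print(scores)
```

In this sample run two of three tasks count as completed, the single hidden intent is recovered, and one of the two cross-session tasks succeeds. The same inputs always produce the same scores, with no model-based judge involved.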

What is Spreadsheet-RL?

Spreadsheet-RL advances LLM agents on spreadsheet tasks through reinforcement learning, showing substantial performance gains in both generation and reasoning over tabular data.
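
Reinforcement learning on spreadsheet tasks needs a reward signal, and one simple deterministic option is to compare the cells an agent produced against a reference sheet. The sketch below shows that idea only. It is not Spreadsheet-RL's reward function, which this digest does not specify. The names cells_equal and cell_match_reward, the tolerance value, and the sample cells are all assumptions.

```python
NUMERIC = (int, float)


def cells_equal(got: object, want: object, tol: float = 1e-6) -> bool:
    """Compare two cell values: numbers within a tolerance, everything else exactly."""
    # bool is a subclass of int in Python, so handle it before the numeric branch.
    if isinstance(got, bool) or isinstance(want, bool):
        return got is want
    if isinstance(got, NUMERIC) and isinstance(want, NUMERIC):
        return abs(got - want) <= tol
    return got == want


def cell_match_reward(predicted: dict[str, object],
                      expected: dict[str, object]) -> float:
    """Fraction of reference cells that the predicted sheet reproduces."""
    if not expected:
        return 0.0
    hits = sum(cells_equal(predicted.get(ref), want) for ref, want in expected.items())
    return hits / len(expected)


expected = {"A1": 10, "B1": 2.5, "C1": "total"}
predicted = {"A1": 10.0, "B1": 2.4, "C1": "total"}  # B1 is wrong
reward = cell_match_reward(predicted, expected)
print(reward)
```

A partial-credit reward like this gives a learning signal even when the sheet is not fully correct. A stricter variant would return 1.0 only when every reference cell matches.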

How does GenEvolve enable self-evolving agents?

GenEvolve allows image generation agents to self-evolve by distilling visual experiences through tool orchestration, leading to continuous improvement without human intervention.

In this issue: self-evolving multi-agents, DeepSeek-V4, and RubricEM. New: BEAM, SDAR, AIRA-Compose/AIRA-Design NAS; ERA expert software generation; Video2GUI trajectory synthesis; AutoResearchClaw +54.7% gains; GenEvolve self-evolving image agents via visual distillation.

Sources (34)
Updated May 23, 2026