AI Research Pulse

Autoresearch & deterministic evaluation (Sakana/Hyperagents + FIPO + SKILL0 + RLCF + InCoder + O-Series + SIEVE + noisy supervision)

Key Questions

How do end-to-end agents contribute to research automation?

End-to-end agents automate NeurIPS-level research tasks; examples include Sakana/Hyperagents and Karpathy's LLM Wiki for PhD-level research.

What is FIPO and its performance on AIME?

FIPO doubles performance on the AIME benchmark, advancing autoresearch capabilities.

What is SKILL0 in reinforcement learning?

SKILL0 enables in-context agentic RL for skill internalization. It supports deterministic evaluation.
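Deterministic evaluation in this sense can be illustrated with a minimal harness: with fixed seeds, a fixed task order, and a deterministic (e.g. greedy) model, repeated runs reproduce the same score. This is a generic sketch with a hypothetical model interface, not SKILL0's actual evaluation code.

```python
import random

def evaluate(model_fn, tasks, seed=0):
    """Score a model on (prompt, answer) pairs deterministically."""
    random.seed(seed)      # fix any sampling the harness itself does
    order = sorted(tasks)  # fixed task order, independent of input order
    correct = sum(1 for prompt, answer in order if model_fn(prompt) == answer)
    return correct / len(order)

# A deterministic model yields an identical score on every run.
model = lambda prompt: prompt.upper()   # hypothetical stand-in model
tasks = [("a", "A"), ("b", "B"), ("c", "X")]
assert evaluate(model, tasks) == evaluate(model, tasks)
```

The key point is that every source of run-to-run variation (sampling temperature, task ordering, harness randomness) is pinned, so score differences reflect model changes rather than noise.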

What does RLCF focus on?

RLCF incorporates taste preferences into reinforcement learning setups, aiding robust learning.

How does InCoder-32B-Thinking perform?

InCoder-32B-Thinking serves as an industrial code world model for reasoning and boosts coding-model performance.

What are the findings of the MIT scaling study?

The MIT scaling study reports 50% success rates and finds that cloned AI 'workers' are often minimally sufficient.

What is SIEVE and its benefits?

SIEVE enables sample-efficient language-model learning and promises major gains in parameter efficiency.

What does Cog-DRIFT address in RLVR?

Cog-DRIFT breaks exploration stalls in RLVR by extracting a learning signal from zero-reward examples, enabling models to reason more robustly.
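The stall Cog-DRIFT targets can be seen in GRPO-style group-normalized advantages, which are common in RLVR: when every rollout in a group earns zero reward, the advantages vanish and the policy gets no gradient. The sketch below only illustrates that failure mode; Cog-DRIFT's actual mechanism for recovering signal from zero-reward examples is not reproduced here.

```python
def group_advantages(rewards):
    """GRPO-style group-normalized advantages for one prompt's rollouts."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    if std == 0:
        # All rollouts got the same reward (e.g. all zero on a hard task):
        # advantages are zero, so the update carries no learning signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Mixed-reward group: informative advantages.
print(group_advantages([1.0, 0.0]))        # [1.0, -1.0]
# All-zero group: the exploration stall.
print(group_advantages([0.0, 0.0, 0.0]))   # [0.0, 0.0, 0.0]
```

Because hard prompts often produce all-zero groups, a method that extracts any signal from those rollouts directly attacks this stall.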

Summary

- End-to-end agents automate NeurIPS-level research; Karpathy Wiki
- SKILL0 RL
- FIPO doubles AIME
- RLCF taste
- InCoder code
- O-Series CoT
- MIT scaling 50% success
- SIEVE sample-efficient
- noisy supervision robustness
- new test-time learnable adaptation (ex-1c1ae0fd) + self-execution sim coding boost (ex-e66f2164)
- Stanford single-agents beat multi-agents on token budgets (ex-586625cc)
- Cog-DRIFT breaks RLVR exploration stalls (ex-339c3776)
- Paper Espresso for paper curation overload
- Chi hidden data signals

Sources (59)
Updated Apr 8, 2026