Autoresearch & deterministic evaluation (Sakana/Hyperagents + FIPO + SKILL0 + RLCF + InCoder + O-Series + SIEVE + noisy supervision)
Key Questions
How do end-to-end agents contribute to research automation?
They automate NeurIPS-level research tasks. Examples include Sakana/Hyperagents and Karpathy's LLM Wiki for PhD-level research.
What is FIPO and its performance on AIME?
FIPO is reported to roughly double performance on the AIME benchmark, a notable step for autoresearch capabilities.
What is SKILL0 in reinforcement learning?
SKILL0 enables in-context agentic RL for skill internalization. It supports deterministic evaluation.
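A minimal sketch of what an in-context skill-internalization loop could look like, under loose assumptions; the toy task, the `policy` stub, and the tuple-based skill library are illustrative stand-ins, not SKILL0's actual design. Successful trajectories are distilled into context notes the agent exploits later, and seeding every RNG makes evaluation deterministic.

```python
"""Illustrative sketch of in-context agentic RL for skill internalization
(assumed mechanics, not from the SKILL0 paper). Successful trajectories become
'skill' notes kept in context; seeding makes evaluation fully deterministic."""
import random

# Toy task: emit the right 3-step action sequence for each task id.
TASKS = {"sort": ["read", "compare", "swap"], "search": ["read", "probe", "halve"]}
ACTIONS = ["read", "compare", "swap", "probe", "halve"]

def policy(task, context, rng):
    """Stand-in for an LLM policy: if a skill note for this task is already in
    context, follow it; otherwise explore uniformly at random."""
    for note in context:
        if note[0] == task:
            return list(note[1])                        # exploit internalized skill
    return [rng.choice(ACTIONS) for _ in range(3)]      # explore

def run_episode(task, context, rng):
    actions = policy(task, context, rng)
    reward = 1.0 if actions == TASKS[task] else 0.0
    return actions, reward

def train(episodes=500, seed=0):
    rng = random.Random(seed)       # single seeded RNG -> deterministic eval
    context = []                    # in-context skill library
    for _ in range(episodes):
        task = rng.choice(list(TASKS))
        actions, reward = run_episode(task, context, rng)
        if reward > 0 and all(n[0] != task for n in context):
            context.append((task, tuple(actions)))      # internalize the skill
    return context

if __name__ == "__main__":
    skills = train()
    assert skills == train()        # same seed reproduces the same library
    print("internalized skills:", skills)
```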
What does RLCF focus on?
RLCF incorporates taste-style preference signals into the RL setup, which aids robust learning.
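One plausible reading of "taste preferences" is a Bradley-Terry reward model fit on pairwise taste judgments and then used as the RL reward, as in standard preference-based RL; that reading, and all names and data below, are assumptions for illustration, not RLCF's published method.

```python
"""Hedged sketch: fit scalar 'taste' rewards from pairwise preference data via
Bradley-Terry maximum likelihood. Illustrative, not the RLCF recipe."""
import math

# Toy pairwise judgments: (preferred_item, rejected_item).
PAIRS = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]
scores = {"a": 0.0, "b": 0.0, "c": 0.0}   # learned scalar taste rewards

def fit(pairs, scores, lr=0.1, steps=200):
    for _ in range(steps):
        for win, lose in pairs:
            # Bradley-Terry: P(win beats lose) = sigmoid(r_win - r_lose)
            p = 1.0 / (1.0 + math.exp(scores[lose] - scores[win]))
            g = 1.0 - p                   # gradient of the log-likelihood
            scores[win] += lr * g
            scores[lose] -= lr * g
    return scores

reward = fit(PAIRS, scores)
print(sorted(reward.items(), key=lambda kv: -kv[1]))  # 'a' should rank first
# Downstream, a policy would be optimized against reward[...] as in RLHF.
```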
How does InCoder-32B-Thinking perform?
InCoder-32B-Thinking serves as an industrial-scale code world model for reasoning, boosting coding-model performance.
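To make the "code world model" idea concrete, here is a minimal sketch of the interface such a model could expose: given code and a starting state, predict the post-execution state. The `world_model_predict` function and the sandboxed `exec` stand-in are assumptions; a learned model would approximate this prediction rather than actually running the code.

```python
"""Hedged sketch of a code-world-model interface (assumed, not the model's
actual API): an agent queries it for the predicted outcome of running code
before committing an edit. A sandboxed exec() stands in for the learned model."""

def world_model_predict(code: str, env: dict) -> dict:
    """Predict the post-execution state of `code` given starting state `env`.
    The sketch simply executes; a real world model would predict this."""
    state = dict(env)
    try:
        exec(code, {}, state)
        return {"ok": True, "state": state}
    except Exception as e:
        return {"ok": False, "error": repr(e)}

# Agent proposes two candidate fixes; keep the one the world model accepts.
candidates = ["total = sum(xs) / len(xs)", "total = sum(xs) / 0"]
for cand in candidates:
    print(cand, "->", world_model_predict(cand, {"xs": [1, 2, 3]}))
```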
What are the findings of the MIT scaling study?
The MIT scaling study reports roughly 50% success rates, finding that cloned AI 'workers' are often only minimally sufficient.
What is SIEVE and its benefits?
SIEVE offers sample-efficient language-model learning and is claimed to deliver major parameter-efficiency gains.
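A common route to sample efficiency is cheap-scorer data selection: a lightweight filter ranks candidate samples so the expensive model trains only on the highest-value slice. The sketch below illustrates that pattern; the `cheap_score` heuristic is a toy stand-in and not SIEVE's actual filter.

```python
"""Hedged sketch of sample-efficient data selection in the spirit of SIEVE;
the scoring heuristic is illustrative, not the paper's method."""

def cheap_score(text: str) -> float:
    """Toy proxy for a lightweight quality filter: favor longer, more
    lexically diverse samples. A real system would use a small trained model."""
    words = text.split()
    if not words:
        return 0.0
    return len(set(words)) / len(words) * min(len(words), 50)

corpus = [
    "the the the the the",
    "gradient descent minimizes a differentiable loss function",
    "ok",
    "reinforcement learning optimizes expected cumulative reward",
]
keep_fraction = 0.5
ranked = sorted(corpus, key=cheap_score, reverse=True)
selected = ranked[: max(1, int(len(ranked) * keep_fraction))]
print("training on:", selected)  # only the informative half reaches training
```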
What does Cog-DRIFT address in RLVR?
Cog-DRIFT breaks exploration stalls in RLVR by extracting learning signal from zero-reward examples, enabling more robust reasoning.
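The stall itself is easy to see in group-relative RLVR: when every rollout in a group scores zero, the normalized advantages are all zero and no gradient flows. The sketch below shows that failure mode and one assumed fallback, substituting a dense auxiliary signal for all-zero groups; the fallback is an illustrative guess at "learning from zero-reward examples", not Cog-DRIFT's published mechanism.

```python
"""Hedged sketch of the RLVR exploration stall and an assumed workaround.
GRPO-style group-normalized advantages vanish when all rewards are zero; the
dense-signal fallback here is illustrative, not Cog-DRIFT's actual method."""
from statistics import mean, pstdev

def advantages(rewards, aux_scores):
    """Group-normalized advantages with a zero-reward-group fallback."""
    if any(r > 0 for r in rewards):
        mu, sd = mean(rewards), pstdev(rewards) or 1.0
        return [(r - mu) / sd for r in rewards]
    # Stall case: every verifiable reward is 0 -> substitute a dense auxiliary
    # signal (e.g., partial progress or format score) so learning continues.
    mu, sd = mean(aux_scores), pstdev(aux_scores) or 1.0
    return [(a - mu) / sd for a in aux_scores]

print(advantages([0, 1, 0, 0], [0.1, 0.9, 0.2, 0.3]))  # normal RLVR update
print(advantages([0, 0, 0, 0], [0.1, 0.9, 0.2, 0.3]))  # fallback breaks stall
```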
End-to-end agents automate NeurIPS-level research (Sakana/Hyperagents, Karpathy's LLM Wiki); SKILL0 in-context agentic RL; FIPO doubles AIME; RLCF taste preferences; InCoder-32B-Thinking code world model; O-Series CoT; MIT scaling study at ~50% success; SIEVE sample-efficient learning; noisy-supervision robustness; new test-time learnable adaptation (ex-1c1ae0fd) plus a self-execution simulation coding boost (ex-e66f2164); Stanford finds single agents beat multi-agent setups under token budgets (ex-586625cc); Cog-DRIFT breaks RLVR exploration stalls (ex-339c3776); Paper Espresso for paper-curation overload; Chi on hidden data signals.