Research acceleration: Cog-DRIFT RLVR + Stanford multi-agent myth + SkillX/FileGram + Self-Distilled RLVR + Vero/AlphaEvolve + Meta Harness + Agentic-MME + LeCun/OpenWorldLib + Karpathy + ClawArena + RL scaling/benches
Key Questions
What is Cog-DRIFT?
Cog-DRIFT breaks the RLVR exploration stall by learning from zero-reward examples. On hard problems, nearly every sampled rollout fails verification, so ordinary RLVR sees an all-zero reward signal and stops making progress; Cog-DRIFT extracts a training signal from those failed attempts, enabling continued progress on exactly those problems.
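A minimal sketch of the idea, assuming a GRPO-style group advantage plus a hypothetical partial-credit fallback. The `policy`/`verifier` interface below is invented for illustration and is not Cog-DRIFT's actual API:

```python
def rlvr_step_with_drift(policy, problem, verifier, n_samples=8):
    # Sample a group of rollouts and verify each one (reward 1 or 0).
    rollouts = [policy.sample(problem) for _ in range(n_samples)]
    rewards = [verifier(problem, r) for r in rollouts]

    if any(rewards):
        # Standard RLVR / GRPO-style advantage: reward minus the group mean.
        mean_r = sum(rewards) / len(rewards)
        advantages = [r - mean_r for r in rewards]
    else:
        # Exploration stall: all rewards are zero, so the group-mean baseline
        # makes every advantage zero and no gradient flows. Assumed fallback:
        # rank the failures by a partial-credit score (e.g. fraction of unit
        # tests passed) and reinforce the relatively best attempts.
        scores = [policy.partial_credit(problem, r) for r in rollouts]
        mean_s = sum(scores) / len(scores)
        advantages = [s - mean_s for s in scores]

    policy.update(problem, rollouts, advantages)
```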
What did the Stanford research find about multi-agent systems?
Stanford's paper finds that multi-agent setups do not deliver gains commensurate with their extra compute: adding agents multiplies cost, but under compute-matched comparisons more agents do not reliably improve results over a single agent.
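To make the compute-matched comparison concrete, here is an illustrative harness sketch. `run_single` and `run_multi` are assumed callables standing in for the two conditions, not the paper's actual code:

```python
def compute_matched_comparison(tasks, run_single, run_multi, token_budget):
    # Both conditions receive the same total token budget per task, so any
    # accuracy gap cannot be explained by one side simply spending more compute.
    single_acc = sum(run_single(t, token_budget) for t in tasks) / len(tasks)
    multi_acc = sum(run_multi(t, token_budget) for t in tasks) / len(tasks)
    return {"single_agent": single_acc, "multi_agent": multi_acc}
```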
What are SkillX and FileGram?
SkillX automatically generates reusable skills for agents, and FileGram personalizes an agent's file system. Both advance agent capabilities.
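As a rough sketch of what skill auto-generation can look like, here is a toy library that promotes successful trajectories into named, retrievable skills. The `Skill` schema and the matching logic are assumptions for illustration; SkillX's actual design is not described in this digest:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str
    steps: list[str] = field(default_factory=list)  # the recorded trajectory

class SkillLibrary:
    def __init__(self):
        self._skills: dict[str, Skill] = {}

    def auto_add(self, task: str, trajectory: list[str], succeeded: bool) -> None:
        # Only successful trajectories are promoted to reusable skills.
        if succeeded:
            name = task.lower().replace(" ", "_")
            self._skills[name] = Skill(name=name, description=task, steps=trajectory)

    def lookup(self, query: str) -> Skill | None:
        # Naive substring retrieval; a real system would embed and rank skills.
        for skill in self._skills.values():
            if query.lower() in skill.description.lower():
                return skill
        return None
```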
What is Self-Distilled RLVR?
Self-Distilled RLVR improves models via self-execution simulation: the model simulates executing its own outputs, and the resulting signal is distilled back into training. It enhances coding and reasoning.
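One plausible reading of "self-execution simulation" as a verifiable reward is sketched below: the model predicts the output of its own generated program, and the prediction is checked against real execution. `model.generate` is a hypothetical API, not the paper's:

```python
import subprocess
import sys
import tempfile

def self_execution_example(model, prompt):
    code = model.generate(f"Write a Python program that solves: {prompt}")
    predicted = model.generate(f"Simulate this program and predict its exact stdout:\n{code}")

    # Run the generated program for real to obtain ground-truth output.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    actual = subprocess.run([sys.executable, path], capture_output=True,
                            text=True, timeout=10).stdout

    # Verifiable reward: did the model's simulated execution match reality?
    reward = float(predicted.strip() == actual.strip())
    return {"code": code, "predicted": predicted, "actual": actual, "reward": reward}
```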
What is Vero?
Vero is an open RL recipe for general visual reasoning. It incorporates elements of AlphaEvolve.
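For context, AlphaEvolve-style systems run an evolutionary loop in which an LLM mutates candidate programs and an automated evaluator scores them. The sketch below shows that loop in its simplest form; it is illustrative only, and `llm_mutate`/`evaluate` are placeholder callables, not Vero's actual recipe:

```python
def evolve(llm_mutate, evaluate, seed_program, generations=10, population_size=4):
    # Population of (score, program) pairs; higher score is better.
    population = [(evaluate(seed_program), seed_program)]
    for _ in range(generations):
        _, parent = max(population)                      # current best program
        children = [llm_mutate(parent) for _ in range(population_size)]
        population += [(evaluate(child), child) for child in children]
        population = sorted(population, reverse=True)[:population_size]  # survivors
    return max(population)                               # (best_score, best_program)
```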
What is Meta-Harness?
Meta-Harness optimizes the model harness end-to-end: rather than tuning the model in isolation, it tunes the scaffolding around it (prompts, tools, control flow) against final task outcomes. It supports agentic research acceleration.
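A minimal sketch of end-to-end harness optimization as a search over scaffolding knobs, assuming a `run_eval(config) -> accuracy` callable. The knob names are hypothetical, not Meta-Harness's API:

```python
import itertools

def optimize_harness(run_eval, search_space):
    # Exhaustive search over harness configurations; run_eval scores the whole
    # agent loop (prompting, tool use, retries) on a held-out task suite.
    best_cfg, best_acc = None, float("-inf")
    keys = list(search_space)
    for values in itertools.product(*(search_space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        acc = run_eval(cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc

# Example with hypothetical knobs:
# optimize_harness(run_eval, {"system_prompt": ["terse", "verbose"],
#                             "max_tool_calls": [4, 16],
#                             "retries": [0, 2]})
```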
What is Agentic-MME?
Agentic-MME evaluates agentic capabilities in multimodal intelligence, probing learn-to-learn and self-execution behaviors.
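A toy version of what such an evaluation loop might look like; the task schema and the `agent` and `judge` callables are all assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class AgenticTask:
    instruction: str                                       # what the agent must do
    image_paths: list[str] = field(default_factory=list)   # multimodal inputs
    max_steps: int = 20                                    # action budget cap

def evaluate_agent(agent, tasks, judge):
    # agent(task) -> trajectory of actions; judge(task, trajectory) -> bool.
    results = [judge(task, agent(task)) for task in tasks]
    return sum(results) / len(results)                     # fraction of tasks solved
```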
What benchmarks are advancing RL scaling?
ClawArena, ARC-3, and H-Bench drive RL scaling research. Karpathy notes 11% improvements.
In brief: Cog-DRIFT breaks the RLVR exploration stall; Stanford finds no multi-agent gain at matched compute; SkillX auto-generates skills and FileGram personalizes file systems; Self-Distilled RLVR, Vero (visual), DeepMind MARL, and Meta-Harness; Agentic-MME, learn-to-learn, and self-execution; LeCun's JEPA and OpenWorldLib; Karpathy's 11%; ClawArena, ARC-3, and H-Bench; RL scaling.