AI Research Digest

May 29, 2026

AI Research Digest · May 29 Daily Digest

Agentic Systems and Autonomous Research

🔥 AutoScientists: Introduces decentralized AI agent teams that self-organize around hypotheses,...

Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

arxiv.org

May 27, 2026

AI Research Digest · May 27, 2026 Daily Digest

No significant updates today.

On-Policy Adversarial Flow Distillation for Autoregressive Video Generation

arxiv.org

May 26, 2026

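Full-Stack Progress in Agents: Synthetic Training, Runtime Tracing, and Skill Benchmarks

Three new papers sketch a full-stack path for agent development:

QUEST synthesizes 8K tasks using a unified rubric tree and trains 2B-35B open-source models through mid-training, SFT, and RL, approaching closed-source frontier performance on eight deep-research benchmarks.
Shepherd provides a functional runtime substrate that turns execution traces into Git-style forkable objects, supporting real-time monitoring, rollback, and parallel intervention by meta-agents, with significant performance gains.
SkillEvolBench builds a 180-task benchmark and finds that current agents mostly stop at episodic reuse and struggle to reliably form transferable procedural skills.
A full-stack closed loop is taking shape at an accelerating pace.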

QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

arxiv.org

May 26, 2026

Two Routes to Digital Twins: 360° Diffusion vs Feed-Forward Meshes

Two complementary methods advance digital-twin creation from limited inputs.

Pantheon360 employs 3D-aware 360° video diffusion plus an explicit 3D...

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

arxiv.org

May 26, 2026

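Two Major Challenges in Multimodal Modeling

Limits of model merging: directly merging monolingually pre-trained models causes performance collapse because their representations diverge too much; training on mixed data is more stable.
Native multimodal roadmap: defines architectural nativeness, divides models into three classes including Multi-to-Text, and systematically surveys the full industrial pipeline from data to deployment.
Trend takeaway: alignment needs to happen early in pre-training, and future multimodal models may need native architectures designed from scratch rather than merged after the fact.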

On the Limits of Model Merging for Multilinguality in Pre-Training

arxiv.org

May 26, 2026

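Video World Models Advance on Multiple Fronts: Parallel Tools, Interactive Benchmarks, and Distillation

Video world models are moving from generation toward interactive evaluation, and three new papers show progress in parallel.

Parallel tool calls: ParaVT pioneers a multi-agent RL framework that schedules multiple temporal-window crops in one shot, resolving the Tool Prior Paradox, with an average gain of 7.9% across six benchmarks.
Interactive benchmark: WBench introduces a multi-turn test with 289 cases and 1,058 turns, covering five dimensions and four interaction types; no model is best across the board.
Autoregressive distillation: AFD proposes on-policy adversarial flow distillation, which enables black-box distillation without teacher scores and improves motion and physical consistency.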

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

arxiv.org

May 26, 2026

DeepMind vs Axiom: Two Paths in AI Math Proofs

DeepMind's AlphaProof Nexus solves 9 Erdős problems and 44 OEIS conjectures using LLM proposals checked by Lean, at a few hundred dollars each.
-...

Google DeepMind’s AlphaProof Nexus Solves Erdős Problems as AI Math Race Moves Beyond Benchmarks

winbuzzer.com

May 26, 2026

Google Unveils Gemini for Science Tools to Accelerate Discovery

Google has launched Gemini for Science, a suite of experimental tools on Google Labs—including Hypothesis Generation with Co-Scientist, Computational...

Gemini for Science: AI experiments and tools for a new era of discovery

blog.google

May 26, 2026

Generative Search Engines Cite AI Sources

A new arXiv audit of ChatGPT, Copilot, Gemini, and Perplexity found that ~16% of the sources cited across 712 real-world queries were AI-generated, raising the risk that users treat synthetic content as authoritative.

Preprint: “Synthetic Sources?: Auditing Generative Search Engine Citations for Evidence of AI-Generated Sources”

May 26, 2026

infodocket.com

May 26, 2026

MolmoAct2 Outperforms π0.5 with 720-Hour Bimanual Dataset

Ai2's MolmoAct2 open robotics model surpasses π0.5 on real-world and simulation benchmarks, runs up to 37x faster, and ships with the largest open bimanual dataset (720 hours, 34,500 demos). All weights, code, and tokenizer are fully public.

Open Source Robotics Model MolmoAct2: Ai2 Beats π0.5, Releases 720-Hour Bimanual Dataset

techtimes.com

May 26, 2026

May 25, 2026

AI Research Digest · May 25, 2026 Daily Digest

Video and Geometry Model Advances

VGenST-Bench: A benchmark for spatio-temporal reasoning via active video synthesis.
Good Token Hunting: A...

May 25, 2026

Unified World Models Bridge Physics and Gaming

Unified world models are emerging that turn raw data into sim-ready, interactive environments.

PhysX-Omni delivers the first unified framework for...

May 25, 2026

scpFormer Unifies Fragmented Single-Cell Proteomics

scpFormer introduces a transformer foundation model that unifies single-cell proteomics across technologies via amino acid sequence tokenization and...

May 25, 2026

Trend: Decoupling and Token Selection Ease VLM Bottlenecks

Two fresh papers reveal a clear pattern: targeted decoupling and smart pruning tackle core VLM limits in perception, reasoning, and compute.

-...

From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

arxiv.org

May 25, 2026

StepAudio 2.5 Unifies ASR, TTS, and Realtime via RLHF

StepAudio 2.5 shows a single audio-language foundation model can match or exceed specialized systems across ASR, TTS, and realtime spoken interaction....