AI Research Radar

July 13, 2026

AI Research Radar · Jul 13 Daily Digest

Compositional Action Recognition

🔥 Mitigating Object-Driven Shortcuts: New paper proposes RCORE with CPR and TORC to reduce object-driven...

July 12, 2026

Object Shortcuts Undermine Zero-Shot Action Recognition

Models in zero-shot compositional action recognition rely on object-driven shortcuts instead of temporal verb evidence.

Sparse supervision and...

Why Can't I Open My Drawer? Mitigating Object-Driven Shortcuts in Zero-Shot Compositional Action Recognition

arxiv.org


July 12, 2026

AI Agents Hyper-Vulnerable to Nudges in Decisions

AI agents acting autonomously on behalf of humans overreact sharply to nudges: they accept defaults 99-100% of the time (vs. 88% for humans) and respond to misleading highlights at rates of 83-100% (vs. 57%), which makes their decisions unpredictable and easy to manipulate.

AI models are far more susceptible to misleading nudges than humans, study shows

psypost.org


July 12, 2026

SOTA VLA Results via Fine-Grained Annotations

Fine-grained subtask annotations power a vision + proprioception model to SOTA results, hitting 93.1 F1@50 on REASSEMBLE and 98.6 on Amazon Robotics blade insertion while generalizing across embodiments.

July 12, 2026

AI Research Radar · Jul 12 Daily Digest

New Agent Benchmarks

🔥 UniClawBench: Introduces a capability-driven benchmark with 400 bilingual real-world tasks evaluating proactive agents...

DrugGen 2: A disease-aware language model for enhancing drug discovery

arxiv.org


July 11, 2026

aria: Lightweight Quantized Runtime for On-Device Text-to-Music

aria runs the full Stable Audio 3 pipeline dependency-free on GPUs, CPUs, and the Raspberry Pi 5.
8-bit quantization slashes memory with no measurable drop...

A Quantized Native Runtime for On-Device Semantic Audio Generation

arxiv.org


July 11, 2026

LongE2V Applies Video Diffusion to Event Streams

LongE2V fine-tunes video diffusion priors for event-based reconstruction, prediction, and interpolation, delivering sharper textures and long-term...

LongE2V: Long-Horizon Event-based Video Reconstruction, Prediction, and Frame Interpolation with Video Diffusion Models

arxiv.org


July 11, 2026

Does Correctness Still Matter for LLMs?

In the age of LLMs, correctness faces a fundamental challenge. FSE 2026 keynote speaker Mary Shaw highlighted how tacit knowledge gaps,...

Does correctness still matter? - by Yuxi Li

yuxili.substack.com


July 11, 2026

Linear Attention vs Dynamic RoPE: Efficiency Trade-offs

Two distinct strategies tackle quadratic attention costs for long contexts.

Linear attention replaces softmax with recurrent memory mechanisms...

Linear Attention Architectures: Mechanisms, Trade-offs, and Cross-Layer Routing

arxiv.org


July 11, 2026

Flash-BoN: Draft-Based Scaling for Diffusion Inference

Flash-BoN shows that under wall-clock budgets, simple Best-of-N often matches or beats guided search methods that spend compute on intermediate...

Flash-BoN: Instant Drafts for Inference-Time Scaling in Diffusion Models

arxiv.org


July 11, 2026

Agent Ecosystem Shifts to Specialized Benchmarks

Three new releases highlight the move from generic LLM tests to targeted agent evaluation:

Tool-Star trains multi-tool web agents with reinforcement...

Tool-Star: Empowering Multi-Tool Collaborative Web Agent ...

July 11, 2026

dl.acm.org

July 11, 2026

DrugGen 2 Brings Disease Context to AI Drug Design

DrugGen 2 generates molecules by conditioning on both disease ontology and target sequences, moving past target-only approaches to boost therapeutic...

arxiv.org

DrugGen 2: A disease-aware language model for enhancing drug discovery

July 11, 2026

July 10, 2026

AI Research Radar · Jul 10, 2026 Daily Digest

Embodied Robot Benchmarks

🔥 RoboDojo: Introduces a unified sim-and-real benchmark with 42 simulation tasks and 18 real-world tasks for...

July 9, 2026

Scaling, Diagnosing, and Benchmarking Embodied World Models

Three concurrent advances mark progress toward reliable world models for robots:

MoE video pretraining: LingBot-Video scales a DiT-based MoE model...

Scaling Mixture-of-Experts Video Pretraining for Embodied Intelligence

arxiv.org


July 9, 2026

Inference-Time LLM Adaptation Gains Traction

Four recent studies showcase lightweight methods that refine LLM outputs at inference without retraining.

Dictionary augmentation normalizes...

July 9, 2026

Three Complementary Routes to Better Embodied Agents

Recent work shows three distinct levers for advancing embodied agents beyond hand-crafted designs.

Automated architecture search via AAS and KDLoop...

Automating the Design of Embodied Agent Architectures

arxiv.org


July 9, 2026

AI Research Radar · Jul 09 Daily Digest

Multimodal and Video Model Advances

Vision as Unified Multimodal Generation: Presents a framework unifying vision tasks through multimodal...

arxiv.org

Vision as Unified Multimodal Generation

July 8, 2026

Efficient Attention Advances Target Long Contexts

Two papers push sparse, theoretically grounded mechanisms for scaling attention and retrieval.

HiLS Attention learns chunk selection end-to-end,...

Hierarchical Sparse Attention Done Right: Toward Infinite Context Modeling

arxiv.org


July 8, 2026

SkillOpt-Lite: Minimal Viable Pipeline Beats Complex Agent Evolution

SkillOpt-Lite questions whether complex pipelines are needed for agent skill optimization, proposing a stripped-down alternative grounded in...

SkillOpt-Lite: Better and Faster Agent Self-evolution via One Line of Vibe

arxiv.org


July 8, 2026

LCA: Model-Agnostic Orchestration for Oncology CDS

LCA introduces a model-agnostic, post-hoc orchestration framework that decouples multimodal data ingestion from AI inference in oncology, overcoming...

The Large Cancer Assistant (LCA): A Model-Agnostic ...

July 8, 2026

arxiv.org

Agent safety & verification fragility (ClawNet, TACO, DECEPTICON, Neurocognitive)
