AI Research Radar

5h ago

Object Shortcuts Undermine Zero-Shot Action Recognition

Models in zero-shot compositional action recognition rely on object-driven shortcuts instead of temporal verb evidence.

Sparse supervision and...

Why Can't I Open My Drawer? Mitigating Object-Driven Shortcuts in Zero-Shot Compositional Action Recognition

arxiv.org

5h ago

AI Agents Hyper-Vulnerable to Nudges in Decisions

AI agents acting autonomously on behalf of humans overreact to nudges: they accept defaults 99-100% of the time (vs. 88% for humans) and follow misleading highlights 83-100% of the time (vs. 57%), which makes their outcomes unpredictable and easy to manipulate.

AI models are far more susceptible to misleading nudges than humans, study shows

psypost.org

5h ago

SOTA VLA Results via Fine-Grained Annotations

Fine-grained subtask annotations power a vision + proprioception model to SOTA results, hitting 93.1 F1@50 on REASSEMBLE and 98.6 on Amazon Robotics blade insertion while generalizing across embodiments.

13h ago

AI Research Radar · Jul 12 Daily Digest

New Agent Benchmarks

🔥 UniClawBench: Introduces a capability-driven benchmark with 400 bilingual real-world tasks evaluating proactive agents...

DrugGen 2: A disease-aware language model for enhancing drug discovery

arxiv.org

1d ago

aria: Lightweight Quantized Runtime for On-Device Text-to-Music

aria runs the full Stable Audio 3 pipeline dependency-free on GPUs, CPUs, and the Raspberry Pi 5.
8-bit quantization slashes memory with no measurable drop...

A Quantized Native Runtime for On-Device Semantic Audio Generation

arxiv.org

1d ago

LongE2V Applies Video Diffusion to Event Streams

LongE2V fine-tunes video diffusion priors for event-based reconstruction, prediction, and interpolation, delivering sharper textures and long-term...

LongE2V: Long-Horizon Event-based Video Reconstruction, Prediction, and Frame Interpolation with Video Diffusion Models

arxiv.org

1d ago

Does Correctness Still Matter for LLMs?

In the age of LLMs, correctness faces a fundamental challenge. FSE 2026 keynote speaker Mary Shaw highlighted how tacit knowledge gaps,...

Does correctness still matter? - by Yuxi Li

yuxili.substack.com

1d ago

Linear Attention vs Dynamic RoPE: Efficiency Trade-offs

Two distinct strategies tackle quadratic attention costs for long contexts.

Linear attention replaces softmax with recurrent memory mechanisms...

Linear Attention Architectures: Mechanisms, Trade-offs, and Cross-Layer Routing

arxiv.org

1d ago

Flash-BoN: Draft-Based Scaling for Diffusion Inference

Flash-BoN shows that under wall-clock budgets, simple Best-of-N often matches or beats guided search methods that spend compute on intermediate...

Flash-BoN: Instant Drafts for Inference-Time Scaling in Diffusion Models

arxiv.org

1d ago

Agent Ecosystem Shifts to Specialized Benchmarks

Three new releases highlight the move from generic LLM tests to targeted agent evaluation:

Tool-Star trains multi-tool web agents with reinforcement...

Tool-Star: Empowering Multi-Tool Collaborative Web Agent ...

dl.acm.org

1d ago

DrugGen 2 Brings Disease Context to AI Drug Design

DrugGen 2 generates molecules by conditioning on both disease ontology and target sequences, moving past target-only approaches to boost therapeutic...

arxiv.org

DrugGen 2: A disease-aware language model for enhancing drug discovery

2d ago

AI Research Radar · Jul 10, 2026 Daily Digest

Embodied Robot Benchmarks

🔥 RoboDojo: Introduces a unified sim-and-real benchmark with 42 simulation tasks and 18 real-world tasks for...

3d ago

Scaling, Diagnosing, and Benchmarking Embodied World Models

Three concurrent advances mark progress toward reliable world models for robots:

MoE video pretraining: LingBot-Video scales a DiT-based MoE model...

Scaling Mixture-of-Experts Video Pretraining for Embodied Intelligence

arxiv.org

3d ago

Inference-Time LLM Adaptation Gains Traction

Four recent studies showcase lightweight methods that refine LLM outputs at inference without retraining.

Dictionary augmentation normalizes...

3d ago

Three Complementary Routes to Better Embodied Agents

Recent work shows three distinct levers for advancing embodied agents beyond hand-crafted designs.

Automated architecture search via AAS and KDLoop...

Automating the Design of Embodied Agent Architectures

arxiv.org

3d ago

AI Research Radar · Jul 09 Daily Digest

Multimodal and Video Model Advances

Vision as Unified Multimodal Generation: Presents a framework unifying vision tasks through multimodal...

arxiv.org

Vision as Unified Multimodal Generation

4d ago

Efficient Attention Advances Target Long Contexts

Two papers push sparse, theoretically grounded mechanisms for scaling attention and retrieval.

HiLS Attention learns chunk selection end-to-end,...

Hierarchical Sparse Attention Done Right: Toward Infinite Context Modeling

arxiv.org

4d ago

SkillOpt-Lite: Minimal Viable Pipeline Beats Complex Agent Evolution

SkillOpt-Lite questions whether complex pipelines are needed for agent skill optimization, proposing a stripped-down alternative grounded in...

SkillOpt-Lite: Better and Faster Agent Self-evolution via One Line of Vibe

arxiv.org

4d ago

LCA: Model-Agnostic Orchestration for Oncology CDS

LCA introduces a model-agnostic, post-hoc orchestration framework that decouples multimodal data ingestion from AI inference in oncology, overcoming...

The Large Cancer Assistant (LCA): A Model-Agnostic ...

arxiv.org

4d ago

Four Papers Redefining Video Paradigms

Four recent works reveal a clear shift toward unified, efficient, and agentic video systems.

Parallel autoregressive decoding exploits weak event...

Parallelized Autoregressive Decoding for Omni-Modal Dense Video Captioning

arxiv.org

4d ago

Agent safety & verification fragility (ClawNet, TACO, DECEPTICON, Neurocognitive)
