AI Breakthrough Tracker

New Benchmarks and Indexes

Key Questions

What key findings does the AI Index 2026 report?

The AI Index 2026 highlights US leadership in AI models, China's dominance in robotics, and surges in compute power and emissions.

What gaps does PrIME-LLM expose?

PrIME-LLM reveals clinical gaps in AI models, with performance differentials under 20% in medical tasks.

What are AVGen-Bench and RAWDet-7?

AVGen-Bench and RAWDet-7 are new benchmarks that highlight limits in multimodal and health-related AI evaluations.

What is SPEED-Bench?

SPEED-Bench is a unified and diverse benchmark for evaluating speculative decoding in AI models.

What does CocoaBench evaluate?

CocoaBench assesses unified digital agents in real-world scenarios.

The AI Index 2026 reports a US lead in AI models, Chinese dominance in robotics, and surges in compute and emissions. PrIME-LLM exposes clinical gaps, while AVGen-Bench, RAWDet-7, and medical evaluations highlight multimodal and health-related limits. CocoaBench (agents in the wild), SPEED-Bench (speculative decoding), QuanBench+ (quantum code), and HOIVG-Bench (video) expose scaling frontiers.

Updated Apr 14, 2026