Claude Opus 4.8 Drops with 2M Context, Honesty Focus & Massive Funding
Anthropic just shipped Claude Opus 4.8 alongside a record $65B Series H at a $965B post-money valuation.
- New capabilities: 2M token context, stronger...

Created by Cheng Niu
Open‑source and flagship AI model releases, benchmarks, and safety notes across LLMs, vision, speech, and multimodal models
Explore the latest content tracked by AI Model Release Tracker
Nano Banana 2 (Gemini 3.1 Flash Image) and Nano Banana Pro (Gemini 3 Pro Image) are now generally available.
Technical system card analysis shows major upgrades in software engineering, agentic tool use, and knowledge work, plus reduced over-refusals and...
Anthropic is leaning hard into radical honesty with Opus 4.8, ditching the people-pleasing era for a model that pushes back on bad logic and flags its...
Anthropic's Claude Opus 4.8 launch plays on multiple fronts at once.
Qwen3.7-Max targets agent tasks like long-horizon coding and tool use with a 1M-token window.
No significant updates today.
EAGLE 3.1 fixes attention drift in speculative decoding by adding FC normalization and post-norm hidden-state feedback, yielding up to 2× longer...
StepAudio 2.5 Realtime swept all five April 2026 voice AI benchmarks, beating GPT Realtime 1.5 and Gemini Live with scores including 82.18 in...
SenseNova U1 delivers a native unified architecture that handles understanding, reasoning, and generation without separate vision encoders or VAEs.
GPT-5.5 edges Gemini 3.5 Flash on agentic coding benchmarks while Gemini dominates tool-use and finance tasks at far lower cost.
Anthropic’s Claude Mythos Preview scanned 1,000+ open-source projects and flagged 23,019 potential vulnerabilities, including an estimated 6,202 high-...
AlphaProof Nexus solves nine open Erdős problems and 44 OEIS conjectures via Lean-verified proofs, marking a move beyond benchmark scores toward...
WBench delivers the first unified multi-turn benchmark for interactive video world models, addressing fragmented evaluation methods.
Meta Llama 4 delivers native multimodal capabilities in an open-source model, processing vision, audio, and text together for cross-modal tasks like...
Anthropic’s Claude Mythos Preview scanned over 1,000 open-source projects and surfaced 23,000 potential vulnerabilities.
Auto Benchmark Audit (ABA) introduces an agentic pipeline that systematically scans benchmark repositories, datasets, and agent trajectories for...