NVIDIA Empire

Rubin ramp, delays & inference economics

Key Questions

What delays are affecting NVIDIA's Rubin ramp?

Delays tied to TSMC 3nm, HBM4, and helium supply have led to a 25% production cut, with a revised target of 1.5M units for H2 2026 and Rubin Ultra following in 2027.
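The size of the cut can be checked with simple arithmetic. The sketch below assumes that 1.5M is the post-cut figure and that the 25% is measured against the original plan; under those assumptions it backs out the implied pre-cut target. The function name is illustrative and does not come from the source.

```python
def implied_original_target(revised_units: float, cut_fraction: float) -> float:
    """Back out the pre-cut target from a post-cut figure and a fractional cut.

    Assumes revised_units = original * (1 - cut_fraction).
    """
    if not 0 <= cut_fraction < 1:
        raise ValueError("cut_fraction must be in [0, 1)")
    return revised_units / (1 - cut_fraction)


revised = 1_500_000   # post-cut target cited above
cut = 0.25            # 25% production cut cited above

original = implied_original_target(revised, cut)
print(f"Implied pre-cut target: {original:,.0f} units")
print(f"Units removed by the cut: {original - revised:,.0f}")
```

Under that reading, the pre-cut plan would have been about 2.0M units, so the cut removes roughly 500K units.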

What are the specs of CoWoS-L for Rubin?

CoWoS-L packaging supports a dual-die design with 288 chips, enabling racks that draw 450-500 kW.
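For a rough sense of scale, the rack power figures can be divided across the chip count. This is a sketch under a strong assumption that the source does not confirm: that all 288 chips sit in one rack and share its power budget evenly. Because cooling, networking, and power conversion also draw from the rack budget, the result is at best an upper bound per chip.

```python
def rack_power_per_chip_kw(rack_kw: float, chips_per_rack: int) -> float:
    """Evenly divide a rack power budget across its chips (upper bound per chip)."""
    if chips_per_rack <= 0:
        raise ValueError("chips_per_rack must be positive")
    return rack_kw / chips_per_rack


CHIPS = 288                    # chip count cited above; assumed to be per rack
RACK_KW_RANGE = (450.0, 500.0)  # rack power range cited above

for rack_kw in RACK_KW_RANGE:
    per_chip = rack_power_per_chip_kw(rack_kw, CHIPS)
    print(f"{rack_kw:.0f} kW rack -> about {per_chip:.2f} kW per chip")
```

If the 288 figure counts individual dies rather than packages, or spans more than one rack, the per-chip number changes accordingly.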

What configurations support NVIDIA's high-end systems?

NVL72 and NVL576 rack configurations, along with Supermicro HGX systems, are key to Rubin's scale-up architectures.

How is Marvell involved in NVIDIA's ecosystem?

Marvell is advancing photonics and NVLink Fusion for optical interconnects in AI data centers.

What inference rivalry is emerging?

Hyperscalers are developing in-house inference chips, challenging NVIDIA's dominance.

What performance does Groq LPX offer?

Groq LPX achieves 700K tokens per second per MW, a throughput-per-power measure of inference economics.
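A tokens-per-second-per-MW figure converts directly into energy per token, which is how it feeds into inference cost. The sketch below does that unit conversion for the 700K figure. The electricity price is a hypothetical placeholder, not a number from the source, and the result covers electricity only, not hardware, cooling, or facility costs.

```python
WATTS_PER_MW = 1_000_000
JOULES_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s


def tokens_per_joule(tok_per_sec_per_mw: float) -> float:
    """(tokens/s) per MW equals tokens per (MW * s); divide by 1e6 for tokens per joule."""
    return tok_per_sec_per_mw / WATTS_PER_MW


def tokens_per_kwh(tok_per_sec_per_mw: float) -> float:
    """Tokens generated per kilowatt-hour of electricity."""
    return tokens_per_joule(tok_per_sec_per_mw) * JOULES_PER_KWH


def electricity_usd_per_million_tokens(tok_per_sec_per_mw: float,
                                       usd_per_kwh: float) -> float:
    """Electricity-only cost of one million tokens at a given power price."""
    return usd_per_kwh / tokens_per_kwh(tok_per_sec_per_mw) * 1_000_000


THROUGHPUT = 700_000            # tokens/s/MW, as cited above
HYPOTHETICAL_USD_PER_KWH = 0.08  # assumed price for illustration only

print(f"Tokens per joule: {tokens_per_joule(THROUGHPUT):.2f}")
print(f"Joules per token: {1 / tokens_per_joule(THROUGHPUT):.2f}")
print(f"Tokens per kWh:   {tokens_per_kwh(THROUGHPUT):,.0f}")
cost = electricity_usd_per_million_tokens(THROUGHPUT, HYPOTHETICAL_USD_PER_KWH)
print(f"Electricity cost per 1M tokens at the assumed price: ${cost:.4f}")
```

At 700K tokens per second per MW, each token costs about 1.43 J, or roughly 2.52M tokens per kWh.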

Why are Rubin production targets reduced?

HBM4 supply bottlenecks from Micron and SK Hynix, plus packaging challenges, forced the cuts.

What interconnect shifts are expected?

All AI data center interconnects are projected to go optical within 5 years as copper limits are reached.

TSMC 3nm, HBM4, and helium delays; a 25% production cut to 1.5M units, with H2 2026 and 2027 Ultra timing; CoWoS-L dual-die, 288 chips, 450-500 kW racks; NVL72/576 and Supermicro HGX; Marvell photonics; hyperscaler in-house inference rivalry; Groq LPX at 700K tokens/sec/MW.

Sources (58)
Updated Apr 8, 2026