AI Infrastructure Digest · Apr 5
Inference Serving Deep Dives
- 🔥 vLLM and PagedAttention: The open-source vLLM library from UC Berkeley solves the KV cache memory crisis for...

Created by Rachel Brooks
Daily highlights of applied AI infrastructure research for large-scale training and serving
SSD revolutionizes LLM coding by distilling from the model's own raw outputs, with no teacher model and no RL, making it well suited to efficient production scaling.
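The summary is terse, so here is a minimal sketch of what teacher-free self-distillation plausibly looks like: sample the model's own completions, filter them with the task's unit tests, and fine-tune on the survivors. Every name here (`generate`, `finetune`, `passes_tests`, the round and sample counts) is an illustrative assumption, not SSD's actual interface.

```python
# Hypothetical sketch of teacher-free self-distillation for a code model:
# sample self-outputs, keep only those that pass execution checks, fine-tune.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Task:
    prompt: str
    passes_tests: Callable[[str], bool]  # runs the task's unit tests on a completion

def self_distill(generate, finetune, tasks: List[Task], rounds: int = 3, k: int = 8):
    """generate(prompt) samples one completion; finetune(pairs) updates the model.
    Both are illustrative stand-ins, not SSD's published interface."""
    for _ in range(rounds):
        dataset: List[Tuple[str, str]] = []
        for task in tasks:
            candidates = [generate(task.prompt) for _ in range(k)]
            # Distill only from self-outputs that pass the tests:
            # no teacher model, no reward model, no RL loop.
            dataset += [(task.prompt, c) for c in candidates if task.passes_tests(c)]
        finetune(dataset)
```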
A screening mechanism ditches softmax's probability redistribution and thresholds keys directly, scoring query-key relevance on an absolute scale instead of forcing keys into global competition.
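As a rough illustration of thresholding keys without softmax's global normalization, here is a toy single-query sketch; the sigmoid gate and the threshold `tau` are my assumptions, not the paper's exact mechanism.

```python
# Toy "screening" attention: each key is kept or dropped by an absolute
# threshold on its raw score, and kept keys are gated independently, so one
# key's weight never depends on competing keys (unlike softmax).
import numpy as np

def screened_attention(q, K, V, tau=0.0):
    scores = K @ q / np.sqrt(q.shape[-1])   # raw query-key relevance
    keep = scores > tau                      # absolute per-key screening
    gates = np.where(keep, 1.0 / (1.0 + np.exp(-scores)), 0.0)  # assumed sigmoid gate
    return gates @ V

# Toy usage: 4 keys/values of dimension 8.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(screened_attention(q, K, V).shape)  # (8,)
```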
vLLM revolutionizes LLM inference by tackling KV cache fragmentation, the silent killer wasting 60-80% of GPU memory via over-provisioning.
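A toy sketch of the paged KV cache idea behind PagedAttention, under my reading of the summary: instead of reserving one contiguous max-length buffer per request, allocate small fixed-size blocks on demand and map logical token positions to physical blocks through a per-sequence block table. The block size and class names below are illustrative, not vLLM's internals.

```python
# Toy paged KV cache: fixed-size physical blocks allocated on demand, with a
# per-sequence block table mapping logical positions to physical blocks.
BLOCK_SIZE = 16  # tokens per physical block (illustrative)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # pool of physical blocks
        self.tables = {}   # seq_id -> list of physical block ids
        self.lengths = {}  # seq_id -> tokens written so far

    def append_token(self, seq_id):
        """Reserve KV space for one new token; grab a block only when needed."""
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # current block full, or first token
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def slot(self, seq_id, pos):
        """Translate a logical token position to (physical block, offset)."""
        return self.tables[seq_id][pos // BLOCK_SIZE], pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=64)
for _ in range(20):
    cache.append_token("req-0")
print(cache.slot("req-0", 19))  # second block, offset 3
```

Memory grows in small blocks as a sequence is generated, so a request never holds VRAM for tokens it has not produced; that is the over-provisioning waste the summary alludes to.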
Hands-on guide to deploying LLMs at scale on Kubernetes, the way FAANG teams do:
Supermicro unpacks its powerful 8U AI/HPC platforms for NVIDIA Blackwell:
Sony's multi-tenant GPU cluster accelerates AI and visual computing for PlayStation consoles and game studios, training models on NVIDIA GPUs for...
Google's AI power crunch drives a bold shift:
Microsoft launches MAI foundation models to rival OpenAI on performance and price, offering cheaper proprietary options for transcription, voice,...
Key optimizations boosting GPU utilization for LLM serving: --gpu-memory-utilization to split VRAM and run... (usage sketch after the next item)
Key angles on running enterprise-scale Kubernetes for GPU AI services:
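`--gpu-memory-utilization` is vLLM's real knob for the fraction of each GPU's VRAM the engine may claim (0.9 by default); lowering it leaves headroom so another serving process can share the card. The Python equivalent below uses an illustrative model and an assumed 45% split.

```python
# Cap vLLM at 45% of the GPU's VRAM so a second serving process can share
# the card. gpu_memory_utilization is vLLM's actual parameter (default 0.9);
# the model name and 0.45 split are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.45)
out = llm.generate(["KV cache paging in one line:"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```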
New video guide for deploying NVIDIA's NemoClaw agentic AI in the cloud:
Ultra-efficient 1-bit LLM breakthrough: PrismML's Bonsai series, trained from scratch on the BitNet architecture, revives Microsoft's BitNet design for local...
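For context, a minimal sketch of the absmean ternary quantization described for BitNet b1.58-style models, which is presumably what "trained from scratch on the BitNet architecture" refers to; whether Bonsai uses exactly this scheme is an assumption.

```python
# Absmean ternary quantization as described for BitNet b1.58: scale weights
# by their mean absolute value, then round and clip to {-1, 0, +1}.
# Whether Bonsai follows this exact recipe is an assumption.
import numpy as np

def absmean_ternary(W, eps=1e-8):
    scale = np.abs(W).mean() + eps
    Wq = np.clip(np.rint(W / scale), -1, 1)  # ternary weight matrix
    return Wq, scale                          # dequantize as Wq * scale

W = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
Wq, s = absmean_ternary(W)
print(Wq)                          # entries in {-1., 0., 1.}
print(np.abs(W - Wq * s).mean())   # mean quantization error
```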
Intuit's AI agents serve 3 million customers with 80.5% retention, automating bookkeeping tasks like reconciliation and payroll.