Q1 2026 Open-Weights Surge: DeepSeek R2, Llama 4 [developing]
Key Questions
What is the Q1 2026 open-weights surge?
Q1 2026 saw a surge in open-weights model releases, led by DeepSeek R2, whose sparse mixture-of-experts (MoE) architecture tops benchmarks, and Llama 4, a 200B-parameter MoE with 22B active parameters per token. Podcast coverage frames this as a tipping point for MoE architectures and privacy-preserving local inference over closed models.
What are the key models in the open-weights surge?
DeepSeek R2 features a sparse MoE architecture that tops benchmarks, while Llama 4 is a 200B-parameter MoE with 22B active parameters per token, making it runnable on consumer hardware with 32-64GB of memory. Together, these models are driving the shift toward self-hosting.
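To make the total-versus-active distinction concrete, here is a minimal top-k MoE routing sketch in PyTorch. It is illustrative only: the layer sizes, expert count, and top-k value are arbitrary assumptions, not Llama 4's actual configuration, which the source does not specify.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts layer.

    Each token is routed to k of n_experts expert MLPs, so only a
    fraction (k / n_experts) of the expert parameters is active per
    token. Sizes here are arbitrary, not Llama 4's real config.
    """
    def __init__(self, dim=512, n_experts=16, k=2, hidden=2048):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, dim)
        # Router scores each token against every expert; keep the top k.
        weights, idx = torch.topk(F.softmax(self.router(x), dim=-1), self.k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(8, 512))
print(y.shape)  # torch.Size([8, 512]); only 2 of 16 experts ran per token
```

Because each token reaches only k of the experts, just k/n_experts of the expert weights participate in any forward pass, which is how a 200B-total model can run with roughly 22B active parameters per token.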
How do self-hosting costs compare to APIs for these models?
Self-hosting reaches cost parity with APIs at roughly 800M tokens per month, the break-even point for teams spending $5-20k per month on API calls. For efficient inference, vLLM and SGLang are the favored serving engines.
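The break-even arithmetic can be checked with a short calculation. All three rates below are illustrative assumptions; the source gives only the 800M-tokens and $5-20k figures, not per-token prices.

```python
# Illustrative break-even model for self-hosting vs. API usage.
API_PRICE_PER_M = 10.00             # $ per 1M tokens via API (assumed blended rate)
FIXED_SELF_HOST_PER_MONTH = 6500.0  # $ per month for a rented GPU node (assumed)
MARGINAL_SELF_PER_M = 2.00          # $ per 1M tokens for power/ops overhead (assumed)

def api_cost(tokens_m: float) -> float:
    """Monthly API bill for tokens_m million tokens."""
    return tokens_m * API_PRICE_PER_M

def self_host_cost(tokens_m: float) -> float:
    """Monthly self-hosting bill: fixed node cost plus marginal usage cost."""
    return FIXED_SELF_HOST_PER_MONTH + tokens_m * MARGINAL_SELF_PER_M

# Self-hosting wins once the fixed cost is amortized over enough tokens:
break_even_m = FIXED_SELF_HOST_PER_MONTH / (API_PRICE_PER_M - MARGINAL_SELF_PER_M)
print(f"break-even ~{break_even_m:.0f}M tokens/month")                  # ~812M here
print(f"API spend at crossover ~${api_cost(break_even_m):,.0f}/month")  # ~$8,125
```

With these assumed rates the crossover lands near 812M tokens per month, about $8.1k of equivalent monthly API spend, consistent with the 800M-token and $5-20k figures quoted above.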
In short, the podcast highlights an open-weights surge: DeepSeek R2's sparse MoE tops benchmarks, Llama 4's 200B MoE (22B active) runs on 32-64GB consumer hardware, self-hosting crosses cost parity with APIs at around 800M tokens per month (break-even for $5-20k/month API spend), and vLLM/SGLang are the favored serving stacks. Taken together, this marks a tipping point for MoE and local privacy over closed models.
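For reference, a minimal vLLM offline-inference sketch, assuming a hypothetical model ID (no official checkpoint name for these releases appears in the source) and a two-GPU node:

```python
# Minimal vLLM offline-inference sketch. The model ID below is a
# placeholder, not an official checkpoint, and tensor_parallel_size=2
# assumes a two-GPU node; adjust both for your deployment.
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/open-weights-moe-placeholder",  # hypothetical Hugging Face repo
    tensor_parallel_size=2,
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the Q1 2026 open-weights surge."], params)
print(outputs[0].outputs[0].text)
```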