DeepSeek-V4 Open MoE Model Release
Key Questions
What is DeepSeek-V4?
DeepSeek-V4 is a massive open Mixture-of-Experts (MoE) model. The Pro version has 1.6T total parameters with 49B active per token, while the smaller Flash variant has 284B parameters. The model has been adapted for Huawei chips, and a preview version has been released.
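To make the total-versus-active distinction concrete, here is a minimal toy sketch of top-k expert routing. It is not DeepSeek-V4's actual architecture; the expert count, top-k value, and layer widths below are made-up placeholders chosen only to show that a token only ever runs through a small fraction of the expert parameters.

```python
import numpy as np

# Hypothetical toy configuration -- not DeepSeek-V4's published setup.
NUM_EXPERTS = 16   # total experts in the layer
TOP_K = 2          # experts activated per token
D_MODEL = 64       # model width
D_FF = 256         # expert FFN width

rng = np.random.default_rng(0)
# Each expert is a tiny 2-layer FFN: d_model -> d_ff -> d_model.
experts = [
    (rng.standard_normal((D_MODEL, D_FF)) * 0.02,
     rng.standard_normal((D_FF, D_MODEL)) * 0.02)
    for _ in range(NUM_EXPERTS)
]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.02

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                            # (tokens, experts)
    top_idx = np.argsort(logits, axis=-1)[:, -TOP_K:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top_idx[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()                   # softmax over the chosen experts
        for w, e in zip(weights, top_idx[t]):
            w1, w2 = experts[e]
            out[t] += w * (np.maximum(x[t] @ w1, 0.0) @ w2)
    return out

tokens = rng.standard_normal((4, D_MODEL))
print(moe_layer(tokens).shape)                     # (4, 64)

total_params = NUM_EXPERTS * 2 * D_MODEL * D_FF
active_params = TOP_K * 2 * D_MODEL * D_FF
print(active_params / total_params)                # only TOP_K/NUM_EXPERTS of expert params run per token
```

The same ratio logic is what lets a 1.6T-parameter model compute with only 49B parameters per token, although V4's real expert layout is not described in this summary.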
What context length does DeepSeek-V4 support?
It achieves a 1M-token context window using a hybrid attention scheme, a significant increase over V3.
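The release summary does not spell out what the hybrid attention scheme is. One common pattern the term often refers to is interleaving cheap sliding-window (local) attention layers with occasional full (global) attention layers; the sketch below illustrates that idea only, under that assumption, with made-up sequence and window sizes.

```python
import numpy as np

def attention(q, k, v, mask):
    """Plain scaled dot-product attention with a boolean mask."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)          # block disallowed positions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def causal_mask(n):
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n, window):
    """Causal mask restricted to the most recent `window` tokens."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

SEQ, DIM, WINDOW = 64, 32, 8                       # hypothetical toy sizes
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((SEQ, DIM)) for _ in range(3))

local_out = attention(q, k, v, sliding_window_mask(SEQ, WINDOW))   # ~O(n * window) useful work
global_out = attention(q, k, v, causal_mask(SEQ))                  # O(n^2) work
print(local_out.shape, global_out.shape)
```

In such hybrids, most layers pay only for the local window while a few global layers keep long-range information flowing, which is one way a 1M-token window can be made affordable; whether V4 does exactly this is not confirmed here.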
How does DeepSeek-V4 perform on benchmarks?
It dominates long-context, coding, math, and reasoning benchmarks, rivaling closed frontier models, and its efficiency-driven scaling challenges proprietary labs.
What efficiency gains does DeepSeek-V4 offer?
It uses 27% fewer FLOPs and 10% less KV cache compared to V3. The paper is available on Hugging Face.
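The KV-cache figure matters most at the 1M-token context. A back-of-the-envelope sizing is sketched below; the layer count, KV head count, head dimension, and dtype are placeholders rather than V4's published configuration, and only the 10% reduction comes from the release summary.

```python
# Rough KV-cache sizing at a 1M-token context (illustrative numbers only).
TOKENS   = 1_000_000
LAYERS   = 60      # hypothetical
KV_HEADS = 8       # hypothetical (e.g. after grouped/compressed KV)
HEAD_DIM = 128     # hypothetical
BYTES    = 2       # fp16/bf16

kv_bytes = TOKENS * LAYERS * KV_HEADS * HEAD_DIM * 2 * BYTES   # K and V tensors
print(f"baseline KV cache: {kv_bytes / 2**30:.1f} GiB")
print(f"with the reported 10% reduction: {kv_bytes * 0.9 / 2**30:.1f} GiB")
```

Even a 10% saving is tens of gigabytes per sequence at this scale, which is why KV-cache efficiency is reported alongside FLOPs.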
What remains unanswered about DeepSeek-V4?
Five subjective open questions remain after its release, which positions Chinese AI to lead globally.
DeepSeek-V4 is a massive open MoE model (1.6T total parameters with 49B active in Pro; 284B in Flash) that achieves a 1M-token context window with hybrid attention, uses 27% fewer FLOPs and 10% less KV cache than V3, and rivals closed frontier models on long-context, coding, math, and reasoning benchmarks; its efficiency-driven scaling challenges proprietary labs.