DeepSeek-V4 Open MoE Model Release
Key Questions
What is DeepSeek-V4?
DeepSeek-V4 is a massive open Mixture-of-Experts (MoE) model. The Pro version has 1.6T total parameters with 49B active per token, while the smaller Flash variant has 284B parameters. The model has been adapted for Huawei chips, and a preview version has been released.
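To make the total-versus-active distinction concrete, here is a minimal toy sketch of top-k expert routing. It is not DeepSeek-V4's actual architecture; the expert count, top-k value, and layer widths below are made-up placeholders chosen only to show that a token only ever runs through a small fraction of the expert parameters.

```python
import numpy as np

# Hypothetical toy configuration -- not DeepSeek-V4's published setup.
NUM_EXPERTS = 16   # total experts in the layer
TOP_K = 2          # experts activated per token
D_MODEL = 64       # model width
D_FF = 256         # expert FFN width

rng = np.random.default_rng(0)
# Each expert is a tiny 2-layer FFN: d_model -> d_ff -> d_model.
experts = [
    (rng.standard_normal((D_MODEL, D_FF)) * 0.02,
     rng.standard_normal((D_FF, D_MODEL)) * 0.02)
    for _ in range(NUM_EXPERTS)
]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.02

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                            # (tokens, experts)
    top_idx = np.argsort(logits, axis=-1)[:, -TOP_K:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top_idx[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()                   # softmax over the chosen experts
        for w, e in zip(weights, top_idx[t]):
            w1, w2 = experts[e]
            out[t] += w * (np.maximum(x[t] @ w1, 0.0) @ w2)
    return out

tokens = rng.standard_normal((4, D_MODEL))
print(moe_layer(tokens).shape)                     # (4, 64)

total_params = NUM_EXPERTS * 2 * D_MODEL * D_FF
active_params = TOP_K * 2 * D_MODEL * D_FF
print(active_params / total_params)                # only TOP_K/NUM_EXPERTS of expert params run per token
```

The same ratio logic is what lets a 1.6T-parameter model compute with only 49B parameters per token, although V4's real expert layout is not described in this summary.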
What context length does DeepSeek-V4 support?
It achieves a 1M-token context window using a hybrid attention scheme, a significant increase over V3.
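The release summary does not spell out what the hybrid attention scheme is. One common pattern the term often refers to is interleaving cheap sliding-window (local) attention layers with occasional full (global) attention layers; the sketch below illustrates that idea only, under that assumption, with made-up sequence and window sizes.

```python
import numpy as np

def attention(q, k, v, mask):
    """Plain scaled dot-product attention with a boolean mask."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)          # block disallowed positions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def causal_mask(n):
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n, window):
    """Causal mask restricted to the most recent `window` tokens."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

SEQ, DIM, WINDOW = 64, 32, 8                       # hypothetical toy sizes
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((SEQ, DIM)) for _ in range(3))

local_out = attention(q, k, v, sliding_window_mask(SEQ, WINDOW))   # ~O(n * window) useful work
global_out = attention(q, k, v, causal_mask(SEQ))                  # O(n^2) work
print(local_out.shape, global_out.shape)
```

In such hybrids, most layers pay only for the local window while a few global layers keep long-range information flowing, which is one way a 1M-token window can be made affordable; whether V4 does exactly this is not confirmed here.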
How does DeepSeek-V4 perform on benchmarks?
It dominates long-context, coding, math, and reasoning benchmarks, rivaling closed frontier models, and its efficiency-driven scaling challenges proprietary labs.
What efficiency gains does DeepSeek-V4 offer?
It uses 27% fewer FLOPs and 10% less KV cache compared to V3. The paper is available on Hugging Face.
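The KV-cache figure matters most at the 1M-token context. A back-of-the-envelope sizing is sketched below; the layer count, KV head count, head dimension, and dtype are placeholders rather than V4's published configuration, and only the 10% reduction comes from the release summary.

```python
# Rough KV-cache sizing at a 1M-token context (illustrative numbers only).
TOKENS   = 1_000_000
LAYERS   = 60      # hypothetical
KV_HEADS = 8       # hypothetical (e.g. after grouped/compressed KV)
HEAD_DIM = 128     # hypothetical
BYTES    = 2       # fp16/bf16

kv_bytes = TOKENS * LAYERS * KV_HEADS * HEAD_DIM * 2 * BYTES   # K and V tensors
print(f"baseline KV cache: {kv_bytes / 2**30:.1f} GiB")
print(f"with the reported 10% reduction: {kv_bytes * 0.9 / 2**30:.1f} GiB")
```

Even a 10% saving is tens of gigabytes per sequence at this scale, which is why KV-cache efficiency is reported alongside FLOPs.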
What remains unanswered about DeepSeek-V4?
Five subjective open questions remain after its release, which positions Chinese AI to lead globally.
DeepSeek-V4 is a massive open MoE model (1.6T total parameters with 49B active in Pro; 284B in Flash) that achieves a 1M-token context window with hybrid attention, uses 27% fewer FLOPs and 10% less KV cache than V3, and rivals closed frontier models on long-context, coding, math, and reasoning benchmarks; its efficiency-driven scaling challenges proprietary labs.