DeepSeek-V4: Efficient Million-Context MoE [developing]
Key Questions
What are the main models and features of DeepSeek-V4?
DeepSeek-V4 reportedly comprises two Mixture-of-Experts (MoE) models: the 1.6T-Pro with 49B active parameters and the 284B-Flash with 13B active parameters, both supporting a 1M-token context length. A hybrid attention scheme is said to reduce FLOPs by 73% and KV cache size by 90% compared to V3, targeting efficiency at long context.
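To make the 90% KV-cache reduction concrete, here is a back-of-the-envelope sizing sketch. All layer dimensions below are hypothetical placeholders (DeepSeek has not published V4's architecture); the point is only the arithmetic of how a 1M-token cache scales and what a 90% cut would leave.

```python
# Back-of-the-envelope KV-cache sizing with HYPOTHETICAL dimensions.
def kv_cache_bytes(ctx_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # 2x for the separate key and value tensors; fp16/bf16 by default.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Assumed dense-attention baseline: 61 layers, 128 KV heads of dim 128.
baseline = kv_cache_bytes(ctx_len=1_000_000, n_layers=61,
                          n_kv_heads=128, head_dim=128)
# The reported 90% KV reduction would leave one tenth of that.
reduced = baseline * 0.10
print(f"baseline ~{baseline / 2**30:.0f} GiB, reduced ~{reduced / 2**30:.0f} GiB")
```

Under these assumed dimensions a dense 1M-token cache runs to several terabytes, which is why an order-of-magnitude KV reduction matters for long-context serving.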
How does DeepSeek-V4 perform on benchmarks?
DeepSeek-V4 is neck-and-neck with Claude Opus 4.7 on a 38-task benchmark covering coding, reasoning, and finance, achieving an 8.90 score at Flash speed. It tops open-source models in coding and reasoning while rivaling closed-source leaders. A detailed comparison is available in the 'DeepSeek V4 vs Claude vs GPT-5.4' benchmark article.
What is the development status and deployment info for DeepSeek-V4?
DeepSeek-V4 is still in development. Quantized versions targeting 32-64GB of VRAM are anticipated, which would position it to challenge open-source rivals such as Qwen and GLM. Tools like Daena-Coder already support running DeepSeek models locally alongside other open models.
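A rough sketch of why the anticipated 32-64GB VRAM target is plausible for an MoE: with expert offloading, only the active parameters need to stay resident on the GPU. The bit width, overhead factor, and offloading setup below are assumptions for illustration; DeepSeek has published no official V4 deployment figures.

```python
# Rough VRAM estimate for a quantized MoE under expert offloading:
# keep only the ACTIVE parameters resident on the GPU and stream
# inactive experts from CPU RAM. All figures are assumptions.
def active_vram_gib(active_params_b, bits=4, overhead=1.2):
    # params * (bits / 8) bytes, plus ~20% for activations and buffers.
    bytes_total = active_params_b * 1e9 * bits / 8 * overhead
    return bytes_total / 2**30

for name, active_b in [("284B-Flash", 13), ("1.6T-Pro", 49)]:
    print(f"{name}: ~{active_vram_gib(active_b):.1f} GiB active at 4-bit")
```

At 4-bit, 13B active parameters come to under 10 GiB and 49B to under 30 GiB, comfortably inside the anticipated 32-64GB range, though real deployments also need memory for the KV cache and resident experts.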
Summary: 1.6T-Pro (49B active) and 284B-Flash (13B active), both 1M ctx; hybrid attention cuts FLOPs 73% and KV cache 90% vs V3. Neck-and-neck with Claude Opus 4.7 on a 38-task coding/reasoning/finance bench (8.90 score, Flash speed); tops OSS in coding/reasoning, rivals closed leaders. Quants for 32-64GB VRAM deployment anticipated; challenges Qwen/GLM.