DeepSeek-V4: Efficient Million-Context MoE [developing]
Key Questions
What are the main models and features of DeepSeek-V4?
DeepSeek-V4 reportedly comprises two Mixture-of-Experts (MoE) models: the 1.6T-Pro with 49B active parameters and the 284B-Flash with 13B active parameters, both supporting a 1M-token context length. A hybrid attention scheme is said to reduce FLOPs by 73% and KV cache size by 90% compared to V3, targeting efficiency at long context.
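To make the 90% KV-cache reduction concrete, here is a back-of-the-envelope sizing sketch. All layer dimensions below are hypothetical placeholders (DeepSeek has not published V4's architecture); the point is only the arithmetic of how a 1M-token cache scales and what a 90% cut would leave.

```python
# Back-of-the-envelope KV-cache sizing with HYPOTHETICAL dimensions.
def kv_cache_bytes(ctx_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # 2x for the separate key and value tensors; fp16/bf16 by default.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Assumed dense-attention baseline: 61 layers, 128 KV heads of dim 128.
baseline = kv_cache_bytes(ctx_len=1_000_000, n_layers=61,
                          n_kv_heads=128, head_dim=128)
# The reported 90% KV reduction would leave one tenth of that.
reduced = baseline * 0.10
print(f"baseline ~{baseline / 2**30:.0f} GiB, reduced ~{reduced / 2**30:.0f} GiB")
```

Under these assumed dimensions a dense 1M-token cache runs to several terabytes, which is why an order-of-magnitude KV reduction matters for long-context serving.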
How does DeepSeek-V4 perform on benchmarks?
DeepSeek-V4 is neck-and-neck with Claude Opus 4.7 on a 38-task benchmark covering coding, reasoning, and finance, achieving an 8.90 score at Flash speed. It tops open-source models in coding and reasoning while rivaling closed-source leaders. A detailed comparison is available in the 'DeepSeek V4 vs Claude vs GPT-5.4' benchmark article.
What is the development status and deployment info for DeepSeek-V4?
DeepSeek-V4 is still in development. Quantized versions targeting 32-64GB of VRAM are anticipated, which would position it to challenge open-source rivals such as Qwen and GLM. Tools like Daena-Coder already support running DeepSeek models locally alongside other open models.
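A rough sketch of why the anticipated 32-64GB VRAM target is plausible for an MoE: with expert offloading, only the active parameters need to stay resident on the GPU. The bit width, overhead factor, and offloading setup below are assumptions for illustration; DeepSeek has published no official V4 deployment figures.

```python
# Rough VRAM estimate for a quantized MoE under expert offloading:
# keep only the ACTIVE parameters resident on the GPU and stream
# inactive experts from CPU RAM. All figures are assumptions.
def active_vram_gib(active_params_b, bits=4, overhead=1.2):
    # params * (bits / 8) bytes, plus ~20% for activations and buffers.
    bytes_total = active_params_b * 1e9 * bits / 8 * overhead
    return bytes_total / 2**30

for name, active_b in [("284B-Flash", 13), ("1.6T-Pro", 49)]:
    print(f"{name}: ~{active_vram_gib(active_b):.1f} GiB active at 4-bit")
```

At 4-bit, 13B active parameters come to under 10 GiB and 49B to under 30 GiB, comfortably inside the anticipated 32-64GB range, though real deployments also need memory for the KV cache and resident experts.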
Summary: 1.6T-Pro (49B active) and 284B-Flash (13B active), both 1M ctx; hybrid attention cuts FLOPs 73% and KV cache 90% vs V3. Neck-and-neck with Claude Opus 4.7 on a 38-task coding/reasoning/finance bench (8.90 score, Flash speed); tops OSS in coding/reasoning, rivals closed leaders. Quants for 32-64GB VRAM deployment anticipated; challenges Qwen/GLM.