DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
Technical Report
Introduction
We present a preview of the DeepSeek-V4 series, comprising two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated). Both support a context length of one million tokens.
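A minimal quick-start sketch for the Pro checkpoint, assuming it loads through the standard transformers auto classes; the dtype, the trust_remote_code flag, and the generation settings are illustrative assumptions, not taken from the report:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id from this model card; the Flash variant would swap in its own id.
model_id = "deepseek-ai/DeepSeek-V4-Pro"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; check the repo config
    device_map="auto",           # shard across available GPUs
    trust_remote_code=True,
)

prompt = "Summarize the key ideas of the DeepSeek-V4 technical report."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
))
```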
The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:
Hybrid Attention Architecture ...