DeepSeek: DeepSeek V3.2 Exp

  • 160K context
  • $0.27/M input tokens
  • $0.40/M output tokens

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. Users can control reasoning behaviour with the `reasoning` `enabled` boolean; see the docs for details.
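As a sketch of how the reasoning toggle described above might be passed to an OpenAI-compatible chat completions endpoint (the exact field name, schema, and model slug here are assumptions, not confirmed API; check the provider's docs):

```python
import json

# Hypothetical request payload for an OpenAI-compatible endpoint.
# The "reasoning" object mirrors the boolean toggle described above;
# its exact schema is an assumption for illustration only.
payload = {
    "model": "deepseek/deepseek-v3.2-exp",  # assumed model slug
    "messages": [
        {"role": "user", "content": "Summarize sparse attention in one sentence."}
    ],
    "reasoning": {"enabled": True},  # set False to skip the thinking phase
}

# The payload would be POSTed as JSON to the chat completions endpoint.
print(json.dumps(payload, indent=2))
```

Whether reasoning is on or off changes latency and output-token usage, so it is worth toggling per request rather than globally.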

The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs.
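Given the listed rates ($0.27/M input tokens, $0.40/M output tokens), per-request cost is simple arithmetic over token counts; a minimal sketch (the helper function and default rates are illustrative, taken from the pricing above):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 0.27, output_rate: float = 0.40) -> float:
    """Estimate request cost in dollars, given per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# e.g. a 100K-token long-context prompt with a 2K-token reply
print(round(estimate_cost(100_000, 2_000), 4))  # → 0.0278
```

Long-context workloads are dominated by the input term, which is where DSA's efficiency gains are aimed.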


Related Posts

DeepSeek V3, a mixture-of-experts model with 685B parameters, is the latest in the DeepSeek team's flagship chat model series. It builds on the DeepSeek V3 model and performs well across a wide range of tasks. ...

DeepSeek: DeepSeek V3 0324
DeepSeek
62.5K context $0.27/M input tokens $1.1/M output tokens
FREE

DeepSeek V3, a mixture-of-experts model with 685B parameters, is the latest in the DeepSeek team's flagship chat model series. It builds on the DeepSeek V3 model and performs well across a wide range of tasks. ...

DeepSeek: DeepSeek V3 0324 (free)
DeepSeek
62.5K context $0 input tokens $0 output tokens
FREE

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase ...

DeepSeek: DeepSeek V3.1 (free)
DeepSeek
159.96K context $0 input tokens $0 output tokens

1. Introduction: We present DeepSeek-V3, a strong mixture-of-experts (MoE) language model with 671B total parameters, of which 37B are activated per token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, both thoroughly validated in DeepSeek-V2. In addition, DeepSeek-V3 pioneers an auxiliary-loss-free ...

DeepSeek V3
DeepSeek
62.5K context $0.14/M input tokens $0.28/M output tokens

DeepSeek-V3 is the DeepSeek team's latest model, building on the instruction-following and coding capabilities of previous versions. The model was pre-trained on nearly 15 trillion tokens, and reported evaluations show it outperforming other open-source models and rivaling leading closed-source models. For details, visit the DeepSeek-V3 repository. DeepSeek-V2 Chat is the conversational fine-tuned version of DeepSeek-V2, a mixture-of-experts (MoE) language model. ...

DeepSeek V3
DeepSeek
62.5K context $0.14/M input tokens $0.28/M output tokens
FREE

DeepSeek-R1 1. Introduction: We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capability. Through RL, DeepSeek-R1-Zero naturally exhibits many powerful and interesting reasoning behaviours. However, DeepSeek-R ...

DeepSeek: R1 0528 (free)
DeepSeek
160K context $0 input tokens $0 output tokens