DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase ...

DeepSeek 159.96K context $0 input tokens $0 output tokens

DeepSeek V3

Text 2 text

# New # Hot

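1. Introduction: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, both of which were thoroughly validated in DeepSeek-V2. In addition, DeepSeek-V3 pioneers an auxiliary-loss-free ...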

DeepSeek 62.5K context $0.14/M input tokens $0.28/M output tokens

DeepSeek V3

Text 2 text

# New # Hot

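DeepSeek-V3 is the latest model from the DeepSeek team, building on the instruction-following and coding abilities of previous versions. The model was pre-trained on nearly 15 trillion tokens, and reported evaluations show that it outperforms other open-source models and rivals leading closed-source models. For more details, see the DeepSeek-V3 repository. DeepSeek-V2 Chat is the conversational fine-tune of DeepSeek-V2, a Mixture-of-Experts (MoE) language model. ...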

DeepSeek 62.5K context $0.14/M input tokens $0.28/M output tokens

DeepSeek: R1 0528 (free)

Text 2 text

# Free

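DeepSeek-R1 1. Introduction: We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally exhibits many powerful and interesting reasoning behaviors. However, DeepSeek-R ...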

DeepSeek 160K context $0 input tokens $0 output tokens

DeepSeek: R1 Distill Qwen 14B
