DeepSeek V3

62.5K context
$0.14/M input tokens
$0.28/M output tokens
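
As a rough illustration of how the listed per-token prices translate into the cost of a single request, here is a minimal sketch. It assumes the prices are in US dollars per million tokens, and the function name `estimate_cost_usd` is hypothetical; it is not part of any DeepSeek or provider API.

```python
# Listed prices for DeepSeek V3, assumed to be USD per million tokens.
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.28


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 50,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost_usd(50_000, 2_000):.6f}")
```

Actual billing may differ, for example through rounding, caching discounts, or minimum charges, none of which are described on this page.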

DeepSeek
Text-to-text
27 Dec, 2024

This model is no longer available.

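DeepSeek-V3 is the latest model from the DeepSeek team, building on the instruction-following and coding abilities of earlier versions. It was pre-trained on nearly 15 trillion tokens, and reported evaluations show that it outperforms other open-source models and rivals leading closed-source models. For details, see the DeepSeek-V3 repository.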

DeepSeek-V2 Chat is the conversational fine-tune of DeepSeek-V2, a Mixture-of-Experts (MoE) language model. It has 236B total parameters, of which 21B are activated for each token.

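Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while cutting training costs by 42.5%, reducing the KV cache by 93.3%, and raising maximum generation throughput to 5.76 times that of the earlier model.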
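
To make these figures concrete, the short sketch below works through the arithmetic. The baseline values of 1.0 are placeholders for illustration only: the description gives relative changes against DeepSeek 67B, not absolute costs or cache sizes. The helper name `remaining_after_reduction` is hypothetical.

```python
# Figures taken from the DeepSeek-V2 description.
TOTAL_PARAMS_B = 236   # total parameters, in billions
ACTIVE_PARAMS_B = 21   # parameters activated per token, in billions

# Share of the model's parameters used for any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def remaining_after_reduction(baseline: float, reduction_pct: float) -> float:
    """Return what is left of `baseline` after a percentage reduction."""
    return baseline * (1 - reduction_pct / 100)


# Relative to a DeepSeek 67B baseline normalized to 1.0 (an assumption).
kv_cache_relative = remaining_after_reduction(1.0, 93.3)
training_cost_relative = remaining_after_reduction(1.0, 42.5)
throughput_relative = 1.0 * 5.76

if __name__ == "__main__":
    print(f"active parameters per token: {active_fraction:.1%}")
    print(f"KV cache vs. baseline:       {kv_cache_relative:.3f}")
    print(f"training cost vs. baseline:  {training_cost_relative:.3f}")
    print(f"throughput vs. baseline:     {throughput_relative:.2f}x")
```

In other words, roughly 9% of the parameters are active per token, and the KV cache shrinks to under 7% of the baseline size.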

DeepSeek-V2 performs strongly on standard benchmarks and open-ended generation evaluations.

Related Posts

DeepSeek V3

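1. Introduction. We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, both of which were thoroughly validated in DeepSeek-V2. In addition, DeepSeek-V3 pioneers an auxiliary-loss-free ...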

DeepSeek 62.5K context $0.14/M input tokens $0.28/M output tokens

DeepSeek: DeepSeek R1 Distill Llama 70B

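DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct and trained on outputs from DeepSeek R1. It uses advanced distillation techniques to achieve high performance across multiple benchmarks, including: AIME 2024 p...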

DeepSeek 128K context $0.23/M input tokens $0.69/M output tokens

FREE

DeepSeek: R1 Distill Llama 70B (free)

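DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct and trained on outputs from DeepSeek R1. It uses advanced distillation techniques to achieve high performance across multiple benchmarks, including: AIME 2024 p...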

DeepSeek 128K context $0 input tokens $0 output tokens

DeepSeek: R1 Distill Llama 8B

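DeepSeek R1 Distill Llama 8B is a distilled large language model based on Llama-3.1-8B-Instruct and trained on outputs from DeepSeek R1. It uses advanced distillation techniques to achieve high performance across multiple benchmarks, including: AIME 2024 pas...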

DeepSeek 31.25K context $0.04/M input tokens $0.04/M output tokens

DeepSeek: R1 Distill Qwen 1.5B

DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on Qwen 2.5 Math 1.5B and trained on outputs from DeepSeek R1. It is a very small, efficient model that outperforms [GPT 4o 0513] on math benchmarks ...

DeepSeek 128K context $0.18/M input tokens $0.18/M output tokens

DeepSeek: R1 Distill Qwen 14B

DeepSeek R1 Distill Qwen 14B is a distilled large language model based on Qwen 2.5 14B and trained on outputs from DeepSeek R1. It outperforms OpenAI's o1-min ... across a variety of benchmarks

DeepSeek 62.5K context $0.15/M input tokens $0.15/M output tokens