DeepSeek V3

  • 62.5K Context
  • $0.14/M Input Tokens
  • $0.28/M Output Tokens
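At these rates, a request's cost is linear in its token counts. A quick sketch of the arithmetic, using the prices listed above and made-up example token counts:

```python
# Estimate request cost from per-million-token prices.
# Prices come from the listing above; the token counts below are
# arbitrary example values, not measurements.

INPUT_PRICE_PER_M = 0.14   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.28  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    return (
        (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    )

# Example: a 4,000-token prompt with a 1,000-token completion.
print(f"${request_cost(4_000, 1_000):.6f}")  # $0.000840
```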

DeepSeek-V3 is the latest model from the DeepSeek team, building on the instruction-following and coding abilities of its predecessors. Pre-trained on nearly 15 trillion tokens, it outperforms other open-source models in reported evaluations and rivals leading closed-source models. For model details, see the DeepSeek-V3 repo.
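For completeness, here is a minimal sketch of querying the model through an OpenAI-compatible chat-completions endpoint. The base URL, API key variable, and model slug below are assumptions for illustration, not details taken from this listing.

```python
# Minimal sketch of calling the model via an OpenAI-compatible API.
# The endpoint URL and model slug are hypothetical placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # hypothetical endpoint
    api_key=os.environ["API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-v3",  # hypothetical model slug
    messages=[{"role": "user", "content": "Summarize DeepSeek-V3 in one sentence."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```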

DeepSeek-V2 Chat is a conversational fine-tune of DeepSeek-V2, a Mixture-of-Experts (MoE) language model comprising 236B total parameters, of which 21B are activated for each token.
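The practical point of the MoE design is that only the routed experts run for a given token, so per-token compute tracks the 21B activated parameters rather than the 236B total. Below is a toy sketch of top-k expert routing; the expert count, dimensions, and gating are made up for illustration and are far simpler than DeepSeek-V2's actual fine-grained MoE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: 8 experts, each token routed to its top-2.
# All dimensions here are illustrative, not DeepSeek-V2's.
d_model, n_experts, top_k = 16, 8, 2
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token through its top-k experts, weighted by gate scores."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                        # top-k expert indices
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over top-k
    # Only top_k of the n_experts weight matrices are touched per token:
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

x = rng.standard_normal(d_model)
print(moe_forward(x).shape)  # (16,)
# Fraction of expert parameters activated per token: top_k / n_experts = 25%
```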

Compared with DeepSeek 67B, DeepSeek-V2 delivers stronger performance while cutting training costs by 42.5%, shrinking the KV cache by 93.3%, and raising maximum generation throughput to 5.76× that of its predecessor.
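For a sense of scale, a back-of-the-envelope KV-cache calculation: the cache holds keys and values for every layer and head at every position, so it grows linearly with context length. The shape parameters below are illustrative placeholders, not DeepSeek-V2's actual configuration; only the 93.3% reduction figure comes from the claim above.

```python
# Back-of-the-envelope KV-cache sizing. Shape parameters are
# illustrative placeholders; only the 93.3% reduction comes
# from the text above.

layers, heads, head_dim = 60, 32, 128
bytes_per_value = 2          # fp16/bf16
seq_len = 65_536             # example 64K-token context

# Baseline: keys and values for every head at every layer.
baseline = 2 * layers * heads * head_dim * bytes_per_value * seq_len
reduced = baseline * (1 - 0.933)

print(f"baseline: {baseline / 2**30:.1f} GiB per sequence")          # 60.0 GiB
print(f"after 93.3% reduction: {reduced / 2**30:.2f} GiB per sequence")  # 4.02 GiB
```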

DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluations.

Related Posts

1. Introduction We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-eff ...

DeepSeek V3
DeepSeek
62.5K context · $0.14/M input tokens · $0.28/M output tokens

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The m ...

DeepSeek: R1 Distill Llama 70B
DeepSeek
128K context · $0.23/M input tokens · $0.69/M output tokens

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The m ...

DeepSeek: R1 Distill Llama 70B (free)
DeepSeek
128K context · $0 input tokens · $0 output tokens

DeepSeek R1 Distill Llama 8B is a distilled large language model based on Llama-3.1-8B-Instruct, using outputs from DeepSeek R1. The mode ...

DeepSeek: R1 Distill Llama 8B
DeepSeek
31.25K context · $0.04/M input tokens · $0.04/M output tokens

DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on Qwen 2.5 Math 1.5B, using outputs from [DeepSeek R1](/deepseek/deepseek-r1 ...

DeepSeek: R1 Distill Qwen 1.5B
DeepSeek
128K context · $0.18/M input tokens · $0.18/M output tokens

DeepSeek R1 Distill Qwen 14B is a distilled large language model based on Qwen 2.5 14B, using outputs from [DeepSeek R1](/deepseek/d ...

DeepSeek: R1 Distill Qwen 14B
DeepSeek
62.5K context · $0.15/M input tokens · $0.15/M output tokens