DeepSeek: R1 Distill Qwen 14B

  • 62.5K context
  • $0.15/M input tokens
  • $0.15/M output tokens
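As a rough illustration of what these rates mean per request, here is a minimal sketch. It assumes the listed rates are US dollars per one million tokens, as on the related listings, and that billing is linear in token count. The function name `estimate_cost_usd` and the constants are hypothetical and not part of any provider API.

```python
# Assumed rates in USD per one million tokens, read off the listing above.
# These constants are illustrative; check the provider's current pricing.
INPUT_USD_PER_MILLION = 0.15
OUTPUT_USD_PER_MILLION = 0.15


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated request cost in USD under the assumed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token completion.
    print(f"{estimate_cost_usd(10_000, 2_000):.6f} USD")
```

Because reasoning-style models tend to produce long outputs, the output-token term is usually the one worth watching when budgeting.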

DeepSeek R1 Distill Qwen 14B is a distilled large language model based on Qwen 2.5 14B, using outputs from DeepSeek R1. It outperforms OpenAI’s o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

Other benchmark results include:

  • AIME 2024 pass@1: 69.7
  • MATH-500 pass@1: 93.9
  • CodeForces Rating: 1481

The model is fine-tuned on DeepSeek R1’s outputs, which gives it performance competitive with larger frontier models.

Related Posts

1. Introduction We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-eff ...

DeepSeek V3
DeepSeek
62.5K context $0.14/M input tokens $0.28/M output tokens

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported ...

DeepSeek V3
DeepSeek
62.5K context $0.14/M input tokens $0.28/M output tokens

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The m ...

DeepSeek: DeepSeek R1 Distill Llama 70B
DeepSeek
128K context $0.23/M input tokens $0.69/M output tokens

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The m ...

DeepSeek: R1 Distill Llama 70B (free)
DeepSeek
128K context $0 input tokens $0 output tokens

DeepSeek R1 Distill Llama 8B is a distilled large language model based on Llama-3.1-8B-Instruct, using outputs from DeepSeek R1. The mode ...

DeepSeek: R1 Distill Llama 8B
DeepSeek
31.25K context $0.04/M input tokens $0.04/M output tokens

DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on Qwen 2.5 Math 1.5B, using outputs from DeepSeek R1 ...

DeepSeek: R1 Distill Qwen 1.5B
DeepSeek
128K context $0.18/M input tokens $0.18/M output tokens