DeepSeek: R1 Distill Qwen 32B

128K Context
$0.12/M Input Tokens
$0.18/M Output Tokens

DeepSeek
Text 2 text
07 Feb, 2025

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI’s o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

Other benchmark results include:

AIME 2024 pass@1: 72.6
MATH-500 pass@1: 94.3
CodeForces Rating: 1691
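The page does not say how the pass@1 figures were computed. As a rough illustration, assuming the common definition (for each problem, the fraction of sampled answers that are correct, averaged over all problems), the metric can be sketched as follows. The function name and the toy data are hypothetical and are not taken from the benchmark runs.

```python
def pass_at_1(correct_flags_per_problem: list[list[bool]]) -> float:
    """Return pass@1 as a percentage.

    Each inner list holds one flag per sampled answer for a single problem
    (True means the answer was judged correct). The score is the mean, over
    problems, of the fraction of correct samples.
    """
    per_problem = [sum(flags) / len(flags) for flags in correct_flags_per_problem]
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data: four problems, two sampled answers each (illustrative only).
toy_results = [
    [True, True],
    [True, False],
    [False, False],
    [True, True],
]
score = pass_at_1(toy_results)
```

With a single sample per problem this reduces to plain accuracy. Drawing several samples per problem mainly reduces the variance of the estimate.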

The model is fine-tuned on DeepSeek R1’s outputs, which gives it performance comparable to larger frontier models.
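The listed prices of 0.12 per million input tokens and 0.18 per million output tokens translate into a per-request cost by simple proportion. The sketch below assumes the prices are in US dollars, as in the other cards on this page, and that billing is strictly linear in token count. The helper name `estimate_cost` is hypothetical.

```python
# Prices copied from the listing, per one million tokens (assumed to be USD).
INPUT_PRICE_PER_M = 0.12
OUTPUT_PRICE_PER_M = 0.18


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming linear per-token billing."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: an 8,000-token prompt with a 2,000-token completion.
    print(f"Estimated cost: ${estimate_cost(8_000, 2_000):.6f}")
```

Reasoning models tend to emit long completions, so the output price often dominates the bill even though the two rates are close.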

DeepSeek: DeepSeek V3 0324

Text 2 text

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 m ...

DeepSeek 62.5K context $0.27/M input tokens $1.1/M output tokens

DeepSeek: DeepSeek V3 0324 (free)

Text 2 text

# Free

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 m ...

DeepSeek 62.5K context $0 input tokens $0 output tokens

DeepSeek: DeepSeek V3.1 (free)

Text 2 text

# Free

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase ...

DeepSeek 159.96K context $0 input tokens $0 output tokens

DeepSeek V3

Text 2 text

# New # Hot

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-eff ...

DeepSeek 62.5K context $0.14/M input tokens $0.28/M output tokens

DeepSeek V3

Text 2 text

# New # Hot

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported ...

DeepSeek 62.5K context $0.14/M input tokens $0.28/M output tokens

DeepSeek: R1 0528 (free)

Text 2 text

# Free

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (R ...

DeepSeek 160K context $0 input tokens $0 output tokens
