DeepSeek V2.5
- 125K Context
- 0.14/M Input Tokens
- 0.28/M Output Tokens
- DeepSeek
- Text-to-text
- 14 May, 2024
Model Unavailable
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions.
DeepSeek-V2 Chat is a conversational fine-tune of DeepSeek-V2, a Mixture-of-Experts (MoE) language model. It comprises 236B total parameters, of which 21B are activated for each token.
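The "236B total, 21B activated" split comes from MoE routing: a router scores all experts for each token, but only a few top-scoring experts actually run. A minimal toy sketch of top-k routing (illustrative only; the expert count, top-k value, and expert function here are made up and are not DeepSeek-V2's actual architecture):

```python
import math
import random

NUM_EXPERTS = 8   # hypothetical expert count for this sketch
TOP_K = 2         # experts activated per token

def expert(i, x):
    # Stand-in "expert network": a fixed per-expert scaling.
    return x * (1.0 + i * 0.1)

def route(x, scores):
    # Select the top-k experts by router score, then mix their
    # outputs weighted by a softmax over the selected scores.
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    weights = [math.exp(scores[i]) for i in top]
    total = sum(weights)
    return sum(w / total * expert(i, x) for i, w in zip(top, weights)), top

random.seed(0)
scores = [random.random() for _ in range(NUM_EXPERTS)]
y, active = route(1.0, scores)
print(len(active))  # only TOP_K of NUM_EXPERTS experts ran for this token
```

Because only the routed experts execute, per-token compute scales with the activated parameter count rather than the full model size.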
Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% in training costs, reducing the KV cache by 93.3%, and boosting maximum generation throughput to 5.76 times.
DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluations.