Low latency

Cohere: Command R+

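command-r-plus-08-2024 is an update of Command R+ with roughly 50% higher throughput and 25% lower latency than the previous Command R+ version, while keeping the hardware footprint the same. Read the launch post here. ...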

Cohere 125K context $2.85/M input tokens $14.25/M output tokens

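Gemini 1.5 Flash-8B is optimized for speed and efficiency, offering enhanced performance on small-prompt tasks such as chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. The model focuses on cost-effective solutions while maintaining high-quality results. [Click here to learn more about this model](https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-gener ...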

Google 976.56K context $0.037/M input tokens $0.15/M output tokens
