Type something to search...

Multimodal understanding

Gemini 2.0 Flash offers a significantly faster time to first token (TTFT) compared to Gemini 1.5 Flash, while maintaining quality on par with larger models like [Gemini 1.5 ...

Google: Gemini 2.0 Flash Experimental
Google
976.56K context $0.2/M input tokens $0.6/M output tokens

Pixtral Large is a 124B open-weights multimodal model built on top of Mistral Large 2. The model is able to understand documents, charts and natural images. The mode ...

Mistral: Pixtral Large 2411
MistralAI
125K context $2/M input tokens $6/M output tokens $0.003/M image tokens

Qwen2 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance o...

Qwen2-VL 7B Instruct
Qwen
32K context $0.1/M input tokens $0.1/M output tokens $0.144/K image tokens