Type something to search...

Text image 2 text

MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can ha ...

MiniMax: MiniMax-01
Rifx.Online
976.75K context $0.2/M input tokens $1.1/M output tokens

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using ...

OpenAI: o1
OpenAI
195.31K context $15/M input tokens $60/M output tokens $0.022/M image tokens
FREE

Gemini 2.0 Flash Thinking Mode is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Thinking Mode is capable of stro ...

Google: Gemini 2.0 Flash Thinking Experimental (free)
Google
39.06K context $0 input tokens $0 output tokens

Grok 2 Vision 1212 advances image-based AI with stronger visual comprehension, refined instruction-following, and multilingual support. From object recognition to style analysis, it empowers develope ...

xAI: Grok 2 Vision 1212
X AI
32K context $2/M input tokens $10/M output tokens $0.004/M image tokens
70% OFF

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time c ...

nova-lite
Amazon
292.97K context $0.06/M input tokens $0.24/M output tokens
70% OFF

Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December 2024, it achieves state-of-the ...

nova-pro
Amazon
292.97K context $0.8/M input tokens $3.2/M output tokens $0.001/M image tokens

Experimental release (December 6, 2024) of Gemini. ...

gemini-exp-1206
Google
8K context $4/M input tokens $16/M output tokens
40% OFF

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including:Code generation Text generation Text editing Problem solving...

Gemini 1.5 Pro
Google
1.91M context $2.5/M input tokens $10/M output tokens $0.003/M image tokens

Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December 2024, it achieves state-of-the- ...

Amazon: Nova Pro 1.0
Amazon
292.97K context $0.8/M input tokens $3.2/M output tokens $0.001/M image tokens

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time cu ...

Amazon: Nova Lite 1.0
Amazon
292.97K context $0.06/M input tokens $0.24/M output tokens
40% OFF

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https ...

Claude-3-Haiku-20240307
Anthropic
195.31K context $0.5/M input tokens $2.5/M output tokens $0.4/K image tokens
40% OFF

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and vid ...

Gemini Flash 1.5
Google
976.56K context $0.15/M input tokens $0.6/M output tokens $0.04/K image tokens

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more afford ...

GPT-4o mini
OpenAI
125K context $0.15/M input tokens $0.6/M output tokens $0.007/M image tokens
40% OFF

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twi ...

gpt-4o
OpenAI
125K context $2.5/M input tokens $10/M output tokens $0.004/M image tokens
40% OFF

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more afford ...

GPT-4o mini
OpenAI
125K context $0.15/M input tokens $0.6/M output tokens $0.007/M image tokens
40% OFF

Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:Coding: Autonomously writes, edits, and runs code w...

Claude 3.5 Sonnet-20240620
Anthropic
195.31K context $3/M input tokens $15/M output tokens $0.005/M image tokens

Pixtral Large is a 124B open-weights multimodal model built on top of Mistral Large 2. The model is able to understand documents, charts and natural images. The mode ...

Mistral: Pixtral Large 2411
MistralAI
125K context $2/M input tokens $6/M output tokens $0.003/M image tokens

Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response. See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.googl ...

Google: Gemini Pro Vision 1.0
Google
16K context $0.5/M input tokens $1.5/M output tokens $0.003/M image tokens

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including:Code generation Text generation Text editing Problem solving...

Google: Gemini Pro 1.5
Google
1.91M context $1.25/M input tokens $5/M output tokens $0.003/M image tokens

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and vide ...

Google: Gemini Flash 1.5
Google
976.56K context $0.075/M input tokens $0.3/M output tokens $0.04/K image tokens

The first image to text model from Mistral AI. Its weight was launched via torrent per their tradition: https://x.com/mistralai/status/1833758285167722836 ...

Mistral: Pixtral 12B
MistralAI
4K context $0.1/M input tokens $0.1/M output tokens $0.144/K image tokens

Dynamic model continuously updated to the current version of GPT-4o in ChatGPT. Intended for research and evaluation. Note: This model is currently experimental and not suitable fo ...

OpenAI: ChatGPT-4o
OpenAI
125K context $5/M input tokens $15/M output tokens $0.007/M image tokens

Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:Coding: Autonomously writes, edits, and runs code wi...

Anthropic: Claude 3.5 Sonnet (2024-06-20)
Anthropic
195.31K context $3/M input tokens $15/M output tokens $0.005/M image tokens
FREE

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including:Code generation Text generation Text editing Problem solving...

Google: Gemini Pro 1.5 Experimental
Google
1.91M context $0 input tokens $0 output tokens $0.003/M image tokens

Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark result ...

Anthropic: Claude 3 Opus
Anthropic
195.31K context $15/M input tokens $75/M output tokens $0.024/M image tokens

Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. See the launch announcement and ...

Anthropic: Claude 3 Sonnet
Anthropic
195.31K context $3/M input tokens $15/M output tokens $0.005/M image tokens

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https: ...

Anthropic: Claude 3 Haiku
Anthropic
195.31K context $0.25/M input tokens $1.25/M output tokens $0.4/K image tokens

Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:Coding: Autonomously writes, edits, and runs code wi...

Anthropic: Claude 3.5 Sonnet
Anthropic
195.31K context $3/M input tokens $15/M output tokens $0.005/M image tokens

Qwen2 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance o...

Qwen2-VL 7B Instruct
Qwen
32K context $0.1/M input tokens $0.1/M output tokens $0.144/K image tokens
FREE

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answ ...

Meta: Llama 3.2 11B Vision Instruct (free)
Meta Llama
128K context $0 input tokens $0 output tokens $0.079/K image tokens

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answ ...

Meta: Llama 3.2 11B Vision Instruct
Meta Llama
128K context $0.055/M input tokens $0.055/M output tokens $0.079/K image tokens

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twi ...

GPT-4o
OpenAI
125K context $2.5/M input tokens $10/M output tokens $0.004/M image tokens

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of GPT-4 Turbo while being twi ...

OpenAI: GPT-4o
OpenAI
125K context $2.5/M input tokens $10/M output tokens $0.004/M image tokens

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more afford ...

OpenAI: GPT-4o-mini
OpenAI
125K context $0.15/M input tokens $0.6/M output tokens $0.007/M image tokens

Gemini 1.5 Flash-8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective ...

Google: Gemini 1.5 Flash-8B
Google
976.56K context $0.037/M input tokens $0.15/M output tokens

Qwen2 VL 72B is a multimodal LLM from the Qwen Team with the following key enhancements:SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance...

Qwen2-VL 72B Instruct
Qwen
32K context $0.4/M input tokens $0.4/M output tokens $0.578/K image tokens

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image caption ...

Meta: Llama 3.2 90B Vision Instruct
Meta Llama
128K context $0.35/M input tokens $0.4/M output tokens $0.506/K image tokens

Experimental release (November 21st, 2024) of Gemini. ...

Google: Gemini Experimental 1121 (free)
Rifx.Online
8K context $0 input tokens $0 output tokens

An experimental version of Gemini 1.5 Pro from Google. ...

Google: LearnLM 1.5 Pro Experimental (free)
Rifx.Online
8K context $0 input tokens $0 output tokens

Gemini 1.5 Flash-8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is hig ...

Google: Gemini 1.5 Flash-8B
Google
976.56K context $0.037/M input tokens $0.15/M output tokens

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual ...

Meta: Llama 3.2 11B Vision Instruct
Meta llama
128K context $0.055/M input tokens $0.055/M output tokens $0.079/K image tokens

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in ...

Meta: Llama 3.2 90B Vision Instruct
Meta llama
128K context $0.35/M input tokens $0.4/M output tokens $0.506/K image tokens

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in ...

Meta: Llama 3.2 90B Vision Instruct (free)
Rifx.Online
4K context $0 input tokens $0 output tokens

Qwen2 VL 72B is a multimodal LLM from the Qwen Team with the following key enhancements:SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-ar...

Qwen2-VL 72B Instruct
Qwen
32K context $0.4/M input tokens $0.4/M output tokens $0.578/K image tokens

The first image to text model from Mistral AI. Its weight was launched via torrent per their tradition: https://x.com/mistralai/status/1833758285167722836 ...

Mistral: Pixtral 12B
Mistralai
4K context $0.1/M input tokens $0.1/M output tokens $0.144/K image tokens

Gemini 1.5 Flash 8B Experimental is an experimental, 8B parameter version of the Gemini 1.5 Flash model. Usage of Gemini is subject to Google's [Gemini Term ...

Google: Gemini Flash 8B 1.5 Experimental
Google
976.56K context $0 input tokens $0 output tokens

Qwen2 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art...

Qwen2-VL 7B Instruct
Qwen
32K context $0.1/M input tokens $0.1/M output tokens $0.144/K image tokens

Dynamic model continuously updated to the current version of GPT-4o in ChatGPT. Intended for research and evaluation. Note: This model is currently experimental and n ...

OpenAI: ChatGPT-4o
Openai
125K context $5/M input tokens $15/M output tokens $0.007/M image tokens

Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:Coding: Autonomously writes, edits, an...

Anthropic: Claude 3.5 Sonnet (2024-06-20)
Anthropic
195.31K context $3/M input tokens $15/M output tokens $0.005/M image tokens

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, ...

Google: Gemini Flash 1.5
Google
976.56K context $0.075/M input tokens $0.3/M output tokens $0.04/K image tokens

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including:Code generation Text generation Text editing Prob...

Google: Gemini Pro 1.5
Google
1.91M context $1.25/M input tokens $5/M output tokens $0.003/M image tokens

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results ...

Anthropic: Claude 3 Haiku
Anthropic
195.31K context $0.25/M input tokens $1.25/M output tokens $0.4/K image tokens

Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and be ...

Anthropic: Claude 3 Opus
Anthropic
195.31K context $15/M input tokens $75/M output tokens $0.024/M image tokens

Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. See the launch an ...

Anthropic: Claude 3 Sonnet
Anthropic
195.31K context $3/M input tokens $15/M output tokens $0.005/M image tokens

Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response. See the benchmarks and prompting guidelines from [Deepmind](https:// ...

Google: Gemini Pro Vision 1.0
Google
16K context $0.5/M input tokens $1.5/M output tokens $0.003/M image tokens