Multimodal processing

Amazon: Nova Lite 1.0

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon, focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time cu ...

Amazon | 292.97K context | $0.06/M input tokens | $0.24/M output tokens

Google: Gemini Flash 1.5

Text + image to text

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and vide ...

Google | 976.56K context | $0.075/M input tokens | $0.30/M output tokens | $0.04/K image tokens
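
To make the per-token rates above easier to compare, here is a minimal sketch of the cost arithmetic for a single request. The `Pricing` class and `estimate_cost` function are hypothetical names introduced for illustration and are not part of any provider's SDK. The sketch assumes the listed rates apply linearly and that image tokens are billed separately from text input tokens; actual billing may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Listed rates in USD (hypothetical helper, not a provider API)."""
    input_per_m: float         # USD per 1M input tokens
    output_per_m: float        # USD per 1M output tokens
    image_per_k: float = 0.0   # USD per 1K image tokens (0 if not listed)


# Rates copied from the listing above.
NOVA_LITE = Pricing(input_per_m=0.06, output_per_m=0.24)
GEMINI_FLASH_15 = Pricing(input_per_m=0.075, output_per_m=0.30, image_per_k=0.04)


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                  image_tokens: int = 0) -> float:
    """Estimate the USD cost of one request, assuming linear per-token billing."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_m
        + output_tokens / 1_000_000 * pricing.output_per_m
        + image_tokens / 1_000 * pricing.image_per_k
    )


if __name__ == "__main__":
    # Example workload: 200K input tokens and 10K output tokens.
    for name, pricing in [("Nova Lite 1.0", NOVA_LITE),
                          ("Gemini Flash 1.5", GEMINI_FLASH_15)]:
        print(f"{name}: ${estimate_cost(pricing, 200_000, 10_000):.4f}")
```

Under these assumptions, the same workload can be priced against each model by swapping the `Pricing` instance, which makes the effect of the separate image-token rate easy to isolate.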