Multimodal

MiniCPM-o 2.6: The 8B Parameter Multimodal LLM Beating GPT-4o

In a groundbreaking development, MiniCPM-o 2.6 has taken the world of multimodal large language models (LLMs) by storm. With its 8-billion-parameter architecture, it not only outperforms GPT-4o on …

A Multimodal AI Assistant: Combining Local and Cloud Models

Use LangGraph, mlx and Florence2 to build an agent that answers complex image questions, with the option to run everything locally. In this article we’ll use LangGraph in conjunction with …

Image Inference through Multi-Modal LLM Models

MULTIMODAL AI | LLM | OPENAI | GEMINI | VISION. This blog explores the capabilities of multi-modal models in image inference, highlighting their ability to integrate visual and text …

Qwen QVQ-72B: Best open-sourced Image Reasoning LLM

Visual Reasoning LLM by Alibaba. So, before 2024 ends, Qwen (by Alibaba) is back with a bang and has released another open-source LLM, Qwen QVQ-72B, which is a visual reasoning LLM, i.e. …

The Rise and Evolution of RAG in 2024: A Year in Review

As 2024 comes to a close, the development of Retrieval-Augmented Generation (RAG) has been nothing short of turbulent. Let’s take a comprehensive look back at the year’s progress from various perspectives …

Use Gemini 2.0 to Build a Realtime Chat App with Multimodal Live API

Gemini Development Tutorial. Google launched Gemini 2.0 with the preview model Gemini 2.0 Flash Experimental, and you must have learned about it from videos and articles. This model has …

Gemini 2.0 Flash + Local Multimodal RAG + Context-aware Python Project: Easy AI/Chat for your Docs

In this video, I have a super quick tutorial showing you how to create a local multimodal RAG with Gemini 2.0 Flash and context-aware responses, to make a powerful agent chatbot for your business or …

Multilingual Vision Captioning: A Multi-Model Multimodal Approach to Image and Video Captioning and…

Using a combination of Meta’s Llama 3.2 11B Vision Instruct, Facebook’s 600M NLLB-200, and LLaVA-Next-Video 7B models to produce multilingual image and video captions, descriptive tags, and …

Developer’s guide to getting started with Gemini 2.0 Flash on Vertex AI

Gemini 2.0 has arrived, bringing next-level capabilities built for this new agentic era. Gemini 2.0 Flash …

OpenAI’s O1 Model: A Detailed Exploration into the Future of AI

Introduction: Artificial intelligence has rapidly evolved over the last decade, leading to breakthroughs in natural language processing (NLP), machine learning, and multimodal applications. OpenAI …

Smarter and Faster: OpenAI o1 and o1 pro mode

Just 12 hours ago, OpenAI rolled out the new o1 model and o1 with pro mode. As you may already know, o1 models are the first series of models designed to think before answering, providing more detailed …

OpenAI o1 Model Fully Released: Enhanced Multimodal AI for Science, Coding, and Writing

Discover OpenAI’s new o1 model: faster, smarter, and multimodal. With advanced reasoning, coding precision, and image analysis, o1 sets a new AI standard. OpenAI’s o1 Model Now Fully Released …

OpenAI’s O1 and O1 Pro Models: A New Era of Reasoning-Focused AI

Artificial intelligence has made remarkable strides in recent years, with large language models evolving from simple text generators to powerful systems capable of tackling advanced reasoning tasks …

Claude 3.5 vs GPT-4o: Key Differences You Need to Know

Anthropic’s latest release, Claude 3.5 Sonnet, enters a market where OpenAI’s GPT-4o has set a high benchmark, with [92% of Fortune 500](https://www.techbusinessnews.com.au/news/92-of-fortune- …

Multimodal AI for Conversational Human Motion

Written by Christian Safka and Keyu Chen. In this exploration we’ll look at how multimodal …

Introduction to LLaVA: A Multimodal AI Model

LLaVA is an end-to-end trained large multimodal model designed to understand and generate content based on both visual inputs (images) and textual instructions. It combines the capabilities …

Claude 3.5 Sonnet vs GPT-4o: Which One Is Better?

In November 2022, OpenAI launched ChatGPT, a model that has revolutionized how we search for and interact with information. The next year, in March, an American startup, Anthropic, founded by ex-OpenAI …

Alibaba’s Open-Source Qwen: How It’s Revolutionizing AI and How You Can Use It

Alibaba has recently made waves in the AI world by open-sourcing its Qwen 2.5 models during the 2024 Apsara Conference. With over 100 models, Qwen spans multiple modalities including language, vision …

A New Rising Red Star: Qwen2.5 Is Here

Let’s test the newborn Alibaba Cloud generative AI Qwen2.5 together, with Python and llama-cpp. In silence, with few claims and anticipated announcements, Alibaba Cloud released on …

RBYF: Qwen2.5–3B-instruct is damn good.

Revised Benchmark with You as Feedback: the brand-new 3B model from Alibaba Qwen is an amazing model, and I can prove it! The illusion of emergent properties is largely a product of the metric …