Vision

How to Build Your Own OCR Assistant with Streamlit and Llama 3.2-Vision

Learn with example. OCR (Optical Character Recognition) is a tool that helps automate the process of converting images into text. You have probably used it on your phone, as it is very common no

Multilingual Vision Captioning: A Multi-Model Multimodal Approach to Image and Video Captioning and…

Using a combination of Meta’s Llama 3.2 11B Vision Instruct, Facebook’s 600M NLLB-200, and LLaVA-Next-Video 7B models to produce multilingual image and video captions, descriptive tags, a

Qwen2-VL: A Vision Language Model That Runs Locally

This is an introduction to "Qwen2-VL", a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using [ailia SDK](h

Generating structured data from an image with GPT vision and Langchain

In today’s world, where visual data is abundant, the ability to extract meaningful information from images is becoming increasingly valuable. Langchain, a powerful framework for building applica
