
OCR

MiniCPM-o 2.6: The 8B-Parameter Multimodal LLM Beating GPT-4o

In a groundbreaking development, MiniCPM-o has taken the world of multimodal large language models (LLMs) by storm. With its 8-billion-parameter architecture, it not only outperforms GPT-4o on…

Microsoft Open Sources MarkItDown: A Game-Changing Library for File-to-Text Conversion 🌐📊📚

A powerful, open-source tool that simplifies file processing and automates content extraction across PDFs, Word docs, images, audio, and more. 📏🎓📦 Professionals often face challenges…

Extract any Document with Gemini 2.0 | Document Intelligence with ExtractThinker

In this article, we’ll explore how Google’s Gemini 2.0 models supercharge Intelligent Document Processing…

How to Build Your Own OCR Assistant with Streamlit and Llama 3.2-Vision

Learn with an example. OCR (Optical Character Recognition) is a tool that automates the process of converting images into text. You have probably used it on your phone, as it is very common…

From Prescription to Voice: A Python Solution to Help Service Elderly and Visually Impaired…

Learn how to build a FastAPI backend that combines OCR, computer vision, and Google’s Text-to-Speech to read a prescription label. Under normal circumstances, reading the label on your…

Reading Document and Coding Functions with Multi-Agent AI Systems using Magentic-One

Magentic-One is designed to streamline complex tasks by leveraging multiple AI agents, each with specialized capabilities. [One of my previous posts](https://readmedium.com/exploring-multi-agent…

AI-Powered OCR with Phi-3-Vision-128K: The Future of Document Processing

In the fast-evolving world of artificial intelligence, multimodal models are setting new standards for integrating visual and textual data. One of the latest breakthroughs is the **Phi-3-Vision-128K**…
