ocr

MiniCPM-o 2.6: The 8B Parameter Multimodal LLM Beating GPT-4o

Rifx.Online
Natural Language Processing, Machine Learning, Technology/Web
20 Jan, 2025

In a groundbreaking development, MiniCPM-o has taken the world of multimodal large language models (LLMs) by storm. With its 8-billion-parameter architecture, it not only outperforms GPT-4o on

Microsoft Open Sources MarkItDown: A Game-Changing Library for File-to-Text Conversion 🌐📊📚

Rifx.Online
Technology, Programming, Machine Learning
30 Dec, 2024

A powerful, open-source tool that simplifies file processing and automates content extraction across PDFs, Word docs, images, audio and more. 📏🎓📦 Professionals often face challenges

Extract any Document with Gemini 2.0 | Document Intelligence with ExtractThinker

In this article, we’ll explore how Google’s Gemini 2.0 models supercharge Intelligent Document Proce

How to Build Your Own OCR Assistant with Streamlit and Llama 3.2-Vision

Rifx.Online
Programming, Technology, Computer Vision
27 Dec, 2024

Learn with example. OCR (Optical Character Recognition) is a tool that helps automate the process of converting images into text. You must have used it on your phone as it is very common no

From Prescription to Voice: A Python Solution to Help Service Elderly and Visually Impaired…

Learn how to build a FastAPI backend solution that combines OCR, Computer Vision and Google’s Text-To-Speech to read a prescription label. Under normal circumstances, reading the label on yo

Reading Document and Coding Functions with Multi-Agent AI Systems using Magentic-One

Rifx.Online
Programming, Technology, Machine Learning
26 Nov, 2024

Magentic-One is designed to streamline complex tasks by leveraging multiple AI agents, each with specialized capabilities. [One of my previous post](https://readmedium.com/exploring-multi-agent

AI-Powered OCR with Phi-3-Vision-128K: The Future of Document Processing

Rifx.Online
Natural Language Processing, Computer Vision, Data Science
08 Nov, 2024

In the fast-evolving world of artificial intelligence, multimodal models are setting new standards for integrating visual and textual data. One of the latest breakthroughs is the **Phi-3-Visi
