How to Build Your Own OCR Assistant with Streamlit and Llama 3.2-Vision

Learn with example

OCR (Optical Character Recognition) automates the conversion of images into text. You have probably used it on your phone, where it is now very common. From digitizing documents to automating business workflows, OCR is at the heart of many modern solutions. In this guide, we’ll walk you through creating a simple but powerful OCR assistant using Streamlit, Llama 3.2-Vision, and Ollama, because why not join the race of machine learning models? The fun part is that you don’t just get text out of an image: you can also summarize it, or modify the prompt to get whatever else you want from the model.

By the end, you’ll have a functional OCR tool that you can use to analyze images for visible text — plus, you’ll gain an understanding of cutting-edge technologies that are reshaping machine learning.

What is OCR and Why Use Llama 3.2-Vision?

What is OCR?

OCR is the technology that converts different types of documents — scanned paper documents, photos of documents, or images containing text — into editable and searchable data. Here’s why it matters:

  • Automate Data Entry: Extract text from scanned forms or invoices.
  • Digitize Records: Convert old books or papers into digital files.
  • Searchable Documents: Make image-based PDFs searchable and easily navigable.

Why Choose Llama 3.2-Vision for OCR?

Llama 3.2-Vision is a sophisticated vision model that offers:

  • High Accuracy: Especially with complex images or documents.
  • Advanced Formatting: It can maintain text structure and formatting better than traditional OCR models.
  • Adaptability: Integrates seamlessly with a local server setup for efficient image processing.

Step-by-Step Guide to Building Your OCR Assistant

First, ensure you clone the repository: https://github.com/MinimalDevops/llama-ocr.git

git clone https://github.com/MinimalDevops/llama-ocr.git
cd llama-ocr

1. Install Ollama and Llama 3.2-Vision

To use Llama 3.2-Vision, we need Ollama, a local service for running machine learning models.

Install Ollama (Linux; on macOS and Windows, download the installer from https://ollama.com/download):

curl -fsSL https://ollama.com/install.sh | sh

Install Llama 3.2-Vision:

ollama pull llama3.2-vision

This command pulls the Llama 3.2-Vision model, making it accessible to your server.

Note: These models need plenty of memory and CPU. A GPU, if you have one, is icing on the cake.

2. Set Up Your Development Environment

Using a virtual environment helps avoid conflicts between Python packages.

Create a Virtual Environment:

python -m venv venv

Activate the Environment:

  • Windows: venv\Scripts\activate
  • macOS/Linux: source venv/bin/activate
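
If you are unsure whether the activation worked, a small check like the one below can help. It is only a sketch and not part of the repository: it relies on the fact that inside a venv `sys.prefix` points at the environment while `sys.base_prefix` points at the base Python installation. The function name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still refers to the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Run it with the same `python` you will use for the app. If it reports that no environment is active, activate the environment again before installing anything.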

3. Install Dependencies

To keep things simple, use a requirements.txt file to install all necessary packages:

Install Dependencies:

pip install -r requirements.txt

The requirements include:

  • streamlit for the web interface
  • requests for making HTTP requests
  • Pillow for image handling
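
To confirm that the install worked, you could check that the three packages are importable. The snippet below is a sketch under the assumption that the list above is complete; the helper `missing_modules` is a hypothetical name, not something from the repository. Note that Pillow is imported as `PIL`.

```python
from importlib.util import find_spec

# Import names for the packages listed above. Pillow is imported as "PIL".
REQUIRED_MODULES = ["streamlit", "requests", "PIL"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

An empty result means the active interpreter can see every package. Anything listed as missing usually points to an inactive virtual environment.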

4. Run the Ollama Server

To use Llama 3.2-Vision for OCR, you need to start the Ollama server:

ollama serve

In a second terminal, check whether the model is loaded:

ollama ps

If it is not listed, load it:

ollama run llama3.2-vision

While ollama serve is running, the server is available locally for processing requests at http://localhost:11434.
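
You can also ask the server from Python which models it has downloaded. The sketch below uses Ollama’s `/api/tags` endpoint, which lists local models. It uses only the standard library so that it runs without extra packages, and the helper names (`model_is_installed`, `fetch_tags`) are assumptions made for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address mentioned above
MODEL = "llama3.2-vision"


def model_is_installed(tags: dict, model: str) -> bool:
    """Check a parsed /api/tags response for the given model name."""
    # Names are reported with a tag suffix, e.g. "llama3.2-vision:latest".
    names = [entry.get("name", "") for entry in tags.get("models", [])]
    return any(name == model or name.startswith(model + ":") for name in names)


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models it has downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Example (requires a running Ollama server):
# print(MODEL, "installed:", model_is_installed(fetch_tags(), MODEL))
```

If the model is reported as not installed, pull it again with ollama pull llama3.2-vision.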

5. Run the Streamlit OCR Application

Now that everything is set up, it’s time to run the Streamlit app that will serve as your OCR interface:

Launch the App:

streamlit run ocr_app.py

Using the Interface:

  • Upload an image (JPG, JPEG, or PNG).

  • Click the “Run OCR” button to extract the text.

Note: I am running the 11B-parameter model.
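
Behind the “Run OCR” button, the app has to send the uploaded image to the Ollama server. The repository’s actual code is explained in its README; what follows is only a minimal sketch of how such a request could look, assuming Ollama’s `/api/generate` endpoint with a base64-encoded image. The prompt text and the function names are hypothetical, and the sketch uses the standard library instead of requests so that it has no extra dependencies.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"
MODEL = "llama3.2-vision"
# Hypothetical prompt; the repository's actual wording may differ.
OCR_PROMPT = "Extract all visible text from this image. Return only the text."


def build_payload(image_bytes: bytes, prompt: str = OCR_PROMPT) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": MODEL,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a stream
    }


def run_ocr(image_bytes: bytes, prompt: str = OCR_PROMPT) -> str:
    """Send the image to the local server and return the model's text."""
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(image_bytes, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Vision models can be slow on CPU, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.load(resp)["response"]


# Example (requires a running server; "receipt.png" is a placeholder path):
# with open("receipt.png", "rb") as f:
#     print(run_ocr(f.read()))
```

Keeping the payload construction in its own function makes it easy to swap the prompt without touching the networking code.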

Real-World Applications

  • Digitizing Old Records: Scan handwritten notes or books.
  • Automating Data Collection: Extract data from receipts or documents to streamline workflows.

Troubleshooting Common Issues

1. Server Connection Issues

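  • 404 Error: A 404 means something answered the request but could not find the endpoint or the model. Make sure the Ollama server is the service running on that port, and that the model has been pulled with ollama pull llama3.2-vision, before you attempt to use the OCR functionality.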
  • Cannot Connect: Check if the endpoint http://localhost:11434 is reachable. Ensure there are no firewall or network issues.
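
To tell these two cases apart, a short probe can help. This is a sketch, not a tool from the repository: `probe` and `explain_status` are hypothetical names, and the hints simply restate the advice above. A refused connection means nothing is listening on the port, whereas an HTTP status means a server did answer.

```python
import urllib.error
import urllib.request


def explain_status(status):
    """Map an HTTP status (or None for no connection) to a hint."""
    if status is None:
        return "Cannot connect: is `ollama serve` running on this host and port?"
    if status == 404:
        return "404: server reachable, but the path or model was not found."
    if 200 <= status < 300:
        return "OK: the server responded."
    return f"Unexpected HTTP status {status}."


def probe(url="http://localhost:11434"):
    """Return the HTTP status for url, or None if no connection was made."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # the server answered with an error
        return err.code
    except (urllib.error.URLError, OSError):  # refused, timed out, DNS failure
        return None


# Example:
# print(explain_status(probe()))
```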

2. Dependency Problems

  • Missing Packages: Always activate your virtual environment and use pip install -r requirements.txt to install dependencies.
  • Version Conflicts: Ensure that the Python version is 3.8 or higher to avoid compatibility issues.
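
A quick way to check the interpreter version from inside the environment is sketched below. The minimum of 3.8 is taken from the note above; the function name `version_ok` is made up for this example.

```python
import sys

MIN_VERSION = (3, 8)  # minimum stated in the note above


def version_ok(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True when the interpreter meets the minimum version."""
    # Compare only the (major, minor) pair.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", version_ok())
```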

Congratulations! You’ve built your own OCR assistant using Streamlit and Llama 3.2-Vision. Here’s what you achieved:

  • Installed and set up Ollama and Llama 3.2-Vision.
  • Created a virtual environment and installed all necessary packages.
  • Built a functional OCR tool to analyze text in images.

This is just the beginning! You can further improve the app by:

  • Adding More Models: Experiment with other OCR models.
  • Deploying It on the Cloud: Make it accessible over the internet for broader usage.
  • Modifying the Prompt: Change the prompt to get a summary tailored to your needs, more detail about the text in the image, and more.
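
As an illustration of the last point, the prompt can be treated as a setting that you switch per task. The templates below are hypothetical and none of them come from the repository; they only sketch the idea of selecting a prompt by task name.

```python
# Hypothetical prompt templates; none of these strings come from the repository.
PROMPTS = {
    "ocr": "Extract all visible text from this image. Return only the text.",
    "summary": "Read the text in this image and summarize it in three sentences.",
    "details": "Read the text in this image and explain any unfamiliar terms.",
}


def pick_prompt(task: str) -> str:
    """Return the prompt for a task, falling back to plain OCR."""
    return PROMPTS.get(task, PROMPTS["ocr"])


if __name__ == "__main__":
    for task in ("ocr", "summary", "details"):
        print(f"{task}: {pick_prompt(task)}")
```

The selected string would then be sent to the model together with the image, in place of the default OCR prompt.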

A line-by-line explanation of the code is given in the README.

Alternatively, if you don’t like coding and just want to play around, you can use LM Studio.

  • Load a model such as “Llava Phi 3 mini”

  • Upload an image in the chat and use a chat prompt to get the same information

Also, if you like to code, we can use the LM Studio API to get the same results from Llava Phi in the next blog! That is a Must Read!

We’d love to hear about your experience and any customizations you make — don’t hesitate to share!
