Google Video Analyzer: Gemini 2.0 | by Samar Singh | Dec, 2024 | Medium

AI tools are advancing at breakneck speed, and Google AI Studio’s Video Analyzer is a testament to this innovation. If you’re curious about video analysis, this tool and its underlying framework are excellent ways to explore what AI can do when processing and understanding video content. I covered the Gemini 2.0 model and Google AI Studio in depth in my previous article.

In this article, we’ll explore the Video Analyzer app on AI Studio, walk through its key features, and demonstrate how to replicate its functionality using Python code in Google Colab. Whether you’re a developer or an AI enthusiast, this comprehensive guide will help you leverage this groundbreaking technology.

What is Google AI Studio’s Video Analyzer?

Google AI Studio’s Video Analyzer is a robust application designed to analyze video content efficiently. By utilizing advanced AI techniques, it provides:

  1. Scene-based Captions: Automatically generates captions for each scene, including visual descriptions and spoken text.
  2. Key Moment Extraction: Identifies pivotal moments in the video and summarizes them concisely.
  3. Object and Count Analysis: Detects objects, people, or other numerical entities across scenes.
  4. Creative Outputs: Produces creative outputs like haikus based on video content.

This app combines powerful prompting with function calls to process and analyze videos dynamically.

Demo Walkthrough: Exploring Video Analyzer on AI Studio

Here’s how you can utilize the app step by step:

1. Upload a Video

  • Start by uploading your video to AI Studio.

2. Generate A/V Captions & Paragraph

3. Summarize Key Moments

  • The app highlights important scenes, creating a concise timeline. For example:
  • 00:18: Introduction of Gemini.
  • 02:00: Summary of Gemini’s features.
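If you want to work with these timecodes in your own code, for example to jump a video player to a key moment, converting them to seconds helps. Here is a minimal sketch, assuming the timecodes arrive as MM:SS or HH:MM:SS strings; the helper name timecode_to_seconds is my own and is not part of the app.

```python
def timecode_to_seconds(timecode: str) -> int:
    """Convert an 'MM:SS' or 'HH:MM:SS' timecode into total seconds."""
    parts = [int(part) for part in timecode.strip().split(":")]
    if len(parts) not in (2, 3):
        raise ValueError(f"Unsupported timecode: {timecode!r}")
    seconds = 0
    for part in parts:
        # Each step shifts the running total up by one base-60 place.
        seconds = seconds * 60 + part
    return seconds


# The two key moments from the example timeline above.
key_moments = [
    ("00:18", "Introduction of Gemini."),
    ("02:00", "Summary of Gemini's features."),
]
offsets = [(timecode_to_seconds(tc), text) for tc, text in key_moments]
print(offsets)
```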

4. Create Tabular Data

Tabular outputs allow you to visualize:

  • Timings.
  • Scene descriptions.
  • Additional objects or emojis tied to scenes.
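To get a feel for this kind of output outside the app, you can lay the same fields out as a plain-text table. This is only a rough sketch: render_table is a hypothetical helper, and the values in the Objects column are made-up placeholders rather than real model output.

```python
def render_table(rows, headers):
    """Render rows as a fixed-width plain-text table."""
    # Transpose headers + rows into columns to find each column's width.
    columns = list(zip(headers, *rows))
    widths = [max(len(str(cell)) for cell in column) for column in columns]

    def fmt(cells):
        return " | ".join(str(cell).ljust(width) for cell, width in zip(cells, widths))

    lines = [fmt(headers), "-+-".join("-" * width for width in widths)]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


headers = ("Time", "Scene", "Objects")
rows = [
    ("00:18", "Introduction of Gemini.", "person, phone"),   # placeholder objects
    ("02:00", "Summary of Gemini's features.", "person"),    # placeholder objects
]
print(render_table(rows, headers))
```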

5. Chart & Custom

  • Count the number of objects, such as people, phones, or trees, in each scene.
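The counting idea is easy to prototype once you have per-scene labels. The sketch below assumes you already hold a list of detected labels for each timecode; the labels here are invented for illustration, and count_per_scene and text_bars are hypothetical helpers, not functions from the app.

```python
from collections import Counter

# Invented labels per scene, keyed by timecode (illustration only).
scene_objects = {
    "00:18": ["person", "person", "phone"],
    "02:00": ["person", "tree", "tree", "tree"],
}


def count_per_scene(scenes):
    """Tally how often each label appears in each scene."""
    return {timecode: Counter(labels) for timecode, labels in scenes.items()}


def text_bars(counts, label):
    """Build one text bar per scene for a single label."""
    return [
        f"{timecode} {'#' * counter[label]} ({counter[label]})"
        for timecode, counter in counts.items()
    ]


counts = count_per_scene(scene_objects)
for line in text_bars(counts, "person"):
    print(line)
```

A Counter returns 0 for labels it has never seen, so scenes without the requested object simply produce an empty bar.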

Video Analyzer with Python in Google Colab

This step-by-step guide demonstrates how to use Python to interact with the API, upload a video, and generate accurate scene captions with timecodes.

Prerequisites

Before getting started, ensure the following:

  • Google API Key: Obtain a Gemini API key from Google AI Studio.
  • Google Gen AI SDK: Install the google-genai library using pip.
  • Video File: Prepare the video file you want to process.

Step 1: Install the Required Library

Install the Google Gemini SDK by running the following command in your environment:

!pip install -U -q google-genai

Step 2: Authenticate with Google API

The Google API key is necessary to authenticate your requests. In this example, we’re using Google Colab’s userdata for secure storage.

from google.colab import userdata
from google import genai
from google.genai import types
# Fetch the API key securely from Colab's secrets
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
# Initialize the client
client = genai.Client(api_key=GOOGLE_API_KEY)
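If you run the same notebook code outside Colab, google.colab will not be importable. One possible workaround, sketched below, is to check an environment variable first and fall back to Colab’s userdata only when it is available. The helper name load_api_key and the choice to check the environment first are my own assumptions, not something the SDK requires.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, else from Colab secrets."""
    key = os.environ.get(name)
    if key:
        return key
    try:
        # Only importable inside a Google Colab runtime.
        from google.colab import userdata
    except ImportError:
        raise RuntimeError(
            f"Set the {name} environment variable or run inside Colab."
        ) from None
    return userdata.get(name)
```

You could then pass load_api_key() as the api_key argument when creating the client.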

Step 3: Define the Model and Upload the Video

We’re using the gemini-2.0-flash-exp model for content generation. First, prepare and upload your video file.

import pathlib
import time
# Path to your video file (replace with your own file name)
video_path = pathlib.Path('/content/Introducing Gemini 2.0 烈 Our most capable AI model yet.mp4')
# Upload the video file
# Recent google-genai releases take file=; early pre-1.0 releases used path=
file_upload = client.files.upload(file=video_path)
# Poll until the service has finished processing the video
while file_upload.state == "PROCESSING":
    print('Waiting for video to be processed...')
    time.sleep(10)
    file_upload = client.files.get(name=file_upload.name)
if file_upload.state == "FAILED":
    raise ValueError("Video processing failed")
print(f'Video processing complete: {file_upload.uri}')
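The loop above waits forever if a file never leaves the PROCESSING state. As a sketch of one way to guard against that, the helper below adds a timeout. It is written against a plain callable that returns the state as a string, so it can be tried without the API; wait_until_processed and its parameters are hypothetical names of my own.

```python
import time


def wait_until_processed(get_state, poll_seconds=10.0, timeout_seconds=600.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it stops returning 'PROCESSING' or time runs out."""
    deadline = clock() + timeout_seconds
    while True:
        state = get_state()
        if state != "PROCESSING":
            break
        if clock() >= deadline:
            raise TimeoutError("Video was still processing when the timeout expired")
        sleep(poll_seconds)
    if state == "FAILED":
        raise ValueError("Video processing failed")
    return state


# Simulated states stand in for real API responses here.
fake_states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
final_state = wait_until_processed(lambda: next(fake_states), sleep=lambda s: None)
print(final_state)
```

With the real client, get_state could be something like lambda: client.files.get(name=file_upload.name).state, assuming the state compares equal to these strings as it does in the loop above.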

Step 4: Define the Prompts

Define the system prompt and user prompt to instruct the model to generate captions.

SYSTEM_PROMPT = "When given a video and a query, call the relevant function only once with the appropriate timecodes and text for the video"

USER_PROMPT = """For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. 
    Place each caption into an object sent to set_timecodes with the timecode of the caption in the video."""
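Both prompts refer to a set_timecodes function, but this walkthrough never declares one, so the model simply answers in text. If you want to experiment with real function calling, a declaration along these lines is one plausible shape. Treat it as an assumption: the description and the field names time and text are my guess at a sensible schema, not something confirmed by the article, and wiring it into the request through the SDK’s tools configuration is not shown here.

```python
# Hypothetical declaration for the set_timecodes function named in the prompts.
SET_TIMECODES_DECLARATION = {
    "name": "set_timecodes",
    "description": "Set the timecodes for the video with associated text",
    "parameters": {
        "type": "object",
        "properties": {
            "timecodes": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "time": {"type": "string"},  # assumed field name
                        "text": {"type": "string"},  # assumed field name
                    },
                    "required": ["time", "text"],
                },
            },
        },
        "required": ["timecodes"],
    },
}


def validate_timecodes(args: dict) -> list:
    """Check that function-call arguments match the assumed schema."""
    entries = args.get("timecodes")
    if not isinstance(entries, list):
        raise ValueError("'timecodes' must be a list")
    for entry in entries:
        if not isinstance(entry, dict):
            raise ValueError("each timecode entry must be an object")
        for field in ("time", "text"):
            if not isinstance(entry.get(field), str):
                raise ValueError(f"entry is missing string field {field!r}")
    return entries


sample_args = {"timecodes": [{"time": "00:18", "text": "Introduction of Gemini."}]}
print(validate_timecodes(sample_args))
```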

Step 5: Generate Content Using the Model

Send the uploaded video and the prompts to the Gemini 2.0 model for processing.

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[
        types.Content(
            role="user",
            parts=[
                types.Part.from_uri(
                    file_uri=file_upload.uri,
                    mime_type=file_upload.mime_type
                )
            ]
        ),
        USER_PROMPT,
    ],
    config=types.GenerateContentConfig(
        system_instruction=SYSTEM_PROMPT,
        temperature=0.0,
    ),
)

Step 6: Display the Results

The API’s response contains the captions with timecodes. Use IPython’s Markdown display class to render the results neatly in the notebook.

from IPython.display import Markdown
# Render the captions as markdown
Markdown(response.text)

Applications of Video Analyzer

  1. Content Creation: Automate video summarization for blogs or reports.
  2. Accessibility: Generate captions for improved accessibility.
  3. Event Analysis: Highlight key moments in sports or presentations.
  4. Creative Outputs: Leverage creative interpretations like poetry for marketing.

Conclusion

Google AI Studio’s Video Analyzer is an excellent tool for video analysis, offering insights through captions, summaries, and object detection. By understanding its underlying principles and recreating it with Python, you can harness the power of AI to analyze and interpret video content effectively. Whether you’re building accessibility features, summarizing content, or exploring creative possibilities, the Video Analyzer provides a strong foundation to innovate.
