Google Video Analyzer: Gemini 2.0 | by Samar Singh | Dec, 2024 | Medium
- Rifx.Online
- Programming , Technology , Computer Vision
- 05 Jan, 2025
The advancements in AI tools are moving at breakneck speed, and Google AI Studio’s Video Analyzer is a testament to this innovation. If you’re curious about video analysis, this tool and its underlying framework are excellent ways to explore the capabilities of AI in processing and understanding video content. I have covered indepth about gemini 2.0 model as well as Google AI Studio in my previous article.
In this article, we’ll explore the Video Analyzer app on AI Studio, walk through its key features, and demonstrate how to replicate its functionality using Python code in Google Colab. Whether you’re a developer or an AI enthusiast, this comprehensive guide will help you leverage this groundbreaking technology.
What is Google AI Studio’s Video Analyzer?
Google AI Studio’s Video Analyzer is a robust application designed to analyze video content efficiently. By utilizing advanced AI techniques, it provides:
- Scene-based Captions: Automatically generates captions for each scene, including visual descriptions and spoken text.
- Key Moment Extraction: Identifies pivotal moments in the video and summarizes them concisely.
- Object and Count Analysis: Detects objects, people, or other numerical entities across scenes.
- Creative Outputs: Produces creative outputs like haikus based on video content.
This app combines powerful prompting with function calls to process and analyze videos dynamically.
Demo Walkthrough: Exploring Video Analyzer on AI Studio
Here’s how you can utilize the app step by step:
1. Upload a Video
- Start by uploading your video to AI Studio.
2. Generate A/V Captions & Paragraph
3. Summarize Key Moments
- The app highlights important scenes, creating a concise timeline. For example:
00:18
: Introduction of Gemini.02:00
: Summary of Gemini’s features.
4. Create Tabular Data
Tabular outputs allow you to visualize:
- Timings.
- Scene descriptions.
- Additional objects or emojis tied to scenes.
5. Chart & Custom
- Count the number of object like people, phones or trees in each scene.
Video Analyzer with Python in google colab
This step-by-step guide demonstrates how to use Python to interact with the API, upload a video, and generate accurate scene captions with timecodes.
Prerequisites
Before getting started, ensure the following:
- Google API Key: Obtain an API key from the Google Developer Console.
- Google Gemini 2.0 SDK: Install the library using pip.
- Video File: Prepare the video file you want to process.
Step 1: Install the Required Library
Install the Google Gemini SDK by running the following command in your environment:
!pip install -U -q google-genai
Step 2: Authenticate with Google API
The Google API key is necessary to authenticate your requests. In this example, we’re using Google Colab’s userdata
for secure storage.
import os
from google.colab import userdata
from google import genai
from google.genai import types
## Fetch the API key securely
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
## Initialize the client
client = genai.Client(api_key=GOOGLE_API_KEY)
Step 3: Define the Model and Upload the Video
We’re using the gemini-2.0-flash-exp
model for content generation. First, prepare and upload your video file.
import pathlib
## Path to your video file
img_path = pathlib.Path('/content/Introducing Gemini 2.0 烈 Our most capable AI model yet.mp4')
## Upload the video file
file_upload = client.files.upload(path=img_path)
## Monitor upload state
import time
while file_upload.state == "PROCESSING":
print('Waiting for video to be processed...')
time.sleep(10)
file_upload = client.files.get(name=file_upload.name)
if file_upload.state == "FAILED":
raise ValueError("Video processing failed")
print(f'Video processing complete: {file_upload.uri}')
Step 4: Define the Prompts
Define the system prompt and user prompt to instruct the model to generate captions.
SYSTEM_PROMPT = "When given a video and a query, call the relevant function only once with the appropriate timecodes and text for the video"
USER_PROMPT = """For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks.
Place each caption into an object sent to set_timecodes with the timecode of the caption in the video."""
Step 5: Generate Content Using the Model
Send the uploaded video and the prompts to the Gemini 2.0 model for processing.
response = client.models.generate_content(
model="gemini-2.0-flash-exp",
contents=[
types.Content(
role="user",
parts=[
types.Part.from_uri(
file_uri=file_upload.uri,
mime_type=file_upload.mime_type
)
]
),
USER_PROMPT,
],
config=types.GenerateContentConfig(
system_instruction=SYSTEM_PROMPT,
temperature=0.0,
),
)
Step 6: Display the Results
The API’s response contains the captions with timecodes. Use the Markdown
library to display the results neatly.
from IPython.display import Markdown
## Render the captions as markdown
Markdown(response.text)
Applications of Video Analyzer
- Content Creation: Automate video summarization for blogs or reports.
- Accessibility: Generate captions for improved accessibility.
- Event Analysis: Highlight key moments in sports or presentations.
- Creative Outputs: Leverage creative interpretations like poetry for marketing.
Conclusion
Google AI Studio’s Video Analyzer is an excellent tool for video analysis, offering insights through captions, summaries, and object detection. By understanding its underlying principles and recreating it with Python, you can harness the power of AI to analyze and interpret video content effectively. Whether you’re building accessibility features, summarizing content, or exploring creative possibilities, the Video Analyzer provides a strong foundation to innovate.