How I built an agent with Pydantic AI and Google Gemini
- Programming, Technology, Data Science
- 11 Jan, 2025
Gathering and synthesizing information quickly has become essential in today's fast-paced world. I built an AI agent that delivers strategic insights by leveraging a modern technology stack. This blog post walks you through the process, highlighting the technologies and design choices, and demonstrating how you can build it on Google Cloud.
An AI agent is a system capable of perceiving its environment, reasoning, and taking actions to achieve specific goals. My agent analyzes web pages, understands community sentiment, and synthesizes this information into a coherent SWOT analysis.
Given a URL, our agent performs a SWOT analysis on the subject. SWOT (Strengths, Weaknesses, Opportunities, Threats) is a widely used strategic planning technique that evaluates the internal and external factors influencing an organization or project. It helps identify areas of advantage and areas needing improvement, ultimately driving informed decision-making.
The foundation of the AI agent stack
My goal was to build a powerful yet maintainable AI agent. This required using frameworks that ensured clear, concise code, avoiding “magic” that would obscure the underlying logic. I also prioritized type safety, utilizing Python’s type hinting to catch errors early. Finally, I wanted the agent to handle fluctuating workloads without extensive infrastructure management. These principles guided my technology choices:
- Pydantic AI: This framework enables structured and predictable outputs from our AI agent, using Pydantic models to define the expected data format, providing strong type safety.
- FastAPI: This modern Python web framework is the backbone of our application. Its speed, ease of use, and automatic data validation (thanks to its reliance on Pydantic) make it ideal for building high-performance APIs. We use it to create the endpoints for interacting with the agent and displaying results.
- HTMX: For a dynamic user interface, we use HTMX. It allows us to update specific parts of the web page without requiring full page reloads. HTMX achieves this through a declarative approach, minimizing the need for complex JavaScript.
- Tailwind CSS: This utility-first CSS framework enables rapid styling and ensures a consistent design language across the application. Its pre-defined classes allow for quick prototyping and easy customization, making the development of the user interface more efficient.
- Cloud Run: For deployment, we leverage Google Cloud Run, a fully managed serverless platform. This allows us to deploy our containerized application without worrying about infrastructure management. Cloud Run automatically scales based on demand, ensuring our agent remains responsive even under heavy load, and we only pay for the compute time consumed.
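If you want to assemble a similar stack locally, the Python dependencies look roughly like this (these are the public PyPI package names; pin versions to whatever the repository's requirements file specifies):

# Core agent, web framework, templating, and form handling
pip install pydantic-ai fastapi uvicorn jinja2 python-multipart

# Tools: HTTP fetching, HTML parsing, Reddit API
pip install httpx beautifulsoup4 praw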
Understanding the agent code
The agent’s code is available in the Google Cloud generative-ai GitHub repository. Let’s explore its core components. The Agent is defined using the Vertex AI model. The system prompt sets the overall direction:
swot_agent = Agent(
    model=VertexAIModel(
        model_name=MODEL,
        service_account_file=SERVICE_ACCOUNT_FILE,
        project_id=PROJECT_ID,
        region=LOCATION,
    ),
    deps_type=SwotAgentDeps,
    result_type=SwotAnalysis,
    system_prompt="""
    You are a strategic business analyst tasked with performing SWOT analysis.
    Analyze the given URL, identify internal strengths and weaknesses,
    and evaluate external opportunities and threats based on market conditions
    and competitive landscape. Use community insights to validate findings.

    For each category:
    - Strengths: Focus on internal advantages and unique selling points
    - Weaknesses: Identify internal limitations and areas for improvement
    - Opportunities: Analyze external factors that could be advantageous
    - Threats: Evaluate external challenges and competitive pressures

    Provide a detailed analysis that synthesizes these findings into actionable insights.
    """,
    retries=RETRIES,
)
The agent’s output is a SwotAnalysis class, inheriting from Pydantic’s BaseModel. This class defines fields as annotated attributes, enabling data validation and providing hints to the agent:
class SwotAnalysis(BaseModel):
    """Represents a SWOT analysis with strengths, weaknesses, opportunities, threats, and an overall analysis."""

    strengths: List[str] = Field(
        description="Internal strengths of the product/service"
    )
    weaknesses: List[str] = Field(
        description="Internal weaknesses of the product/service"
    )
    opportunities: List[str] = Field(
        description="External opportunities in the market"
    )
    threats: List[str] = Field(
        description="External threats in the market"
    )
    analysis: str = Field(
        description="A comprehensive analysis explaining the SWOT findings and their implications"
    )
Equipping the agent with tools
To perform its analysis, our agent needs to gather information from various sources. This is achieved through a set of tools integrated within the Pydantic AI framework:
- Content Extraction: Using httpx and BeautifulSoup4, the agent can fetch and parse the HTML content of a given URL, extracting relevant text for analysis. This allows the agent to understand the core offerings and messaging presented on a website.
- Community Insights via the Reddit API: Leveraging praw, the Python Reddit API Wrapper, the agent can query specific subreddits on Reddit to gauge community sentiment and discussions related to a particular topic or product.
- Competitive Analysis powered by Gemini: The agent uses the Gemini API to perform a competitive analysis. By providing the product name and description, the agent can leverage Gemini’s powerful language understanding capabilities to identify key competitors, analyze their market positions, and assess competitive advantages and disadvantages.
Here’s a snippet of the Reddit community insights tool (simplified for readability):
@swot_agent.tool(prepare=report_tool_usage)
async def get_reddit_insights(
    ctx: RunContext[SwotAgentDeps],
    query: str,
    subreddit_name: str = "googlecloud",
) -> str:
    """Gathers insights from a specific subreddit related to a query using PRAW."""
    subreddit = ctx.deps.reddit.subreddit(subreddit_name)
    search_results = subreddit.search(query)

    insights = []
    for post in search_results:
        insights.append(
            f"Title: {post.title}\n"
            f"URL: {post.url}\n"
            f"Content: {post.selftext}\n"
        )
    return "\n".join(insights)
Building the application
With the core AI agent logic defined, the next step was developing the UI and a server to handle the requests.
FastAPI serves as the backend, providing the API endpoints that the frontend interacts with. Its use of Python type hints, backed by Pydantic validation, catches malformed requests and automatically generates interactive API documentation. The /analyze endpoint, for example, accepts a URL via a POST request (defined in main.py), triggers the agent’s analysis, and manages the asynchronous task, updating the status and result stores to track progress. The /status and /result endpoints are then used by the frontend to poll for updates and display the final SWOT analysis.
@app.post("/analyze", response_class=HTMLResponse)
async def analyze_url(request: Request, url: str = Form(...)) -> HTMLResponse:
"""Analyzes the given URL using the SWOT analysis agent."""
task = asyncio.create_task(run_agent_with_progress(session_id, url))
running_tasks.add(task)
task.add_done_callback(running_tasks.discard)
return templates.TemplateResponse(
"status.html",
{"request": request, "messages": [ANALYZING_MESSAGE], "result": False},
)
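The polling endpoints are thin wrappers around that shared state. As a hedged sketch of what /status might look like (the status_store and result_store names and the session lookup are illustrative simplifications; the real implementation is in main.py):

@app.get("/status", response_class=HTMLResponse)
async def get_status(request: Request) -> HTMLResponse:
    """Returns the latest progress messages for the active analysis session."""
    # Illustrative session lookup; the repository handles sessions differently
    session_id = request.cookies.get("session_id", "default")
    messages = status_store.get(session_id, [])

    return templates.TemplateResponse(
        "status.html",
        {"request": request, "messages": messages, "result": session_id in result_store},
    )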
On the frontend, HTMX enhances the user experience by enabling dynamic updates without full page reloads. In templates/index.html, the hx-post, hx-get, hx-trigger, and hx-target attributes define how elements interact with the FastAPI backend.

For instance, submitting the form triggers a POST to /analyze, and the response is swapped into the #status div. Similarly, the #status and #result divs periodically poll their endpoints using hx-get and hx-trigger="every 1s", updating only the necessary sections of the page with the latest status messages or the completed SWOT analysis.
<div
id="status"
hx-get="/status"
hx-trigger="load, every 1s"
hx-swap="innerHTML transition:false"
style="display: none"
class="bg-white rounded-2xl shadow-lg p-6 mb-8"
>
Tailwind CSS enables a consistent design language throughout the application. Instead of writing custom CSS, Tailwind’s pre-defined classes are used directly in the HTML, streamlining the styling process and ensuring a cohesive look and feel. The HTML templates leverage Tailwind’s utility classes to create appealing and responsive layouts.
Deploying and Scaling with Google Cloud Run
Deploying the agent on Google Cloud Run is straightforward, thanks to its containerized architecture. Cloud Run, a fully managed serverless platform, automatically handles scaling, networking, and infrastructure, allowing you to focus on your code. You can deploy directly from the source code using the command provided in the README:
gcloud run deploy swot-agent --source . --region us-central1 --allow-unauthenticated
This command instructs Cloud Run to deploy a service named swot-agent, build the image from the current directory, and deploy it to the us-central1 region. The --allow-unauthenticated flag makes the service publicly accessible for initial testing and demonstration purposes.
Cloud Run then provisions the necessary resources, runs the uvicorn ASGI server within the container, and scales the number of instances based on incoming traffic. Remember to configure secrets for sensitive information like API keys.
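For example, if your Reddit credentials are stored in Secret Manager, they can be exposed to the service as environment variables (the secret and variable names below are illustrative; adjust them to match your own deployment):

gcloud run services update swot-agent \
  --region us-central1 \
  --set-secrets "REDDIT_CLIENT_ID=reddit-client-id:latest,REDDIT_CLIENT_SECRET=reddit-client-secret:latest"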
Trying it out yourself
This project demonstrates how to build an AI agent using a modern development stack. It showcases how an agent can assemble information from multiple sources such as Reddit to achieve a higher-level task like SWOT analysis.
I encourage you to explore the code on GitHub to find out more about the agent, the tools, and the FastAPI application. Deploy it to Cloud Run in your own project, and experiment with different inputs and configurations.
Let’s continue the discussion! Feel free to connect with me on LinkedIn, X, Bluesky, or Threads to share your thoughts and ideas.