How I built an agent with Pydantic AI and Google Gemini

Gathering and synthesizing information quickly has become essential in today’s fast-paced world. I built an AI agent that delivers strategic insights using a modern technology stack. This blog post walks you through the process, highlights the technologies and design choices, and shows how you can build it on Google Cloud.

An AI agent is a system capable of perceiving its environment, reasoning, and taking actions to achieve specific goals. My agent analyzes web pages, understands community sentiment, and synthesizes this information into a coherent SWOT analysis.

Given a URL, our agent performs a SWOT analysis on the subject. SWOT (Strengths, Weaknesses, Opportunities, Threats) is a widely used strategic planning technique that evaluates the internal and external factors influencing an organization or project. It helps identify areas of advantage and areas needing improvement, ultimately driving informed decision-making.

The foundation of the AI agent stack

My goal was to build a powerful yet maintainable AI agent. This required using frameworks that ensured clear, concise code, avoiding “magic” that would obscure the underlying logic. I also prioritized type safety, utilizing Python’s type hinting to catch errors early. Finally, I wanted the agent to handle fluctuating workloads without extensive infrastructure management. These principles guided my technology choices:

  • Pydantic AI: This framework enables structured and predictable outputs from our AI agent, using Pydantic models to define the expected data format, providing strong type safety.
  • FastAPI: This modern Python web framework is the backbone of our application. Its speed, ease of use, and automatic data validation (thanks to its reliance on Pydantic) make it ideal for building high-performance APIs. We use it to create the endpoints for interacting with the agent and displaying results.
  • HTMX: For a dynamic user interface, we use HTMX. It allows us to update specific parts of the web page without requiring full page reloads. HTMX achieves this through a declarative approach, minimizing the need for complex JavaScript.
  • Tailwind CSS: This utility-first CSS framework enables rapid styling and ensures a consistent design language across the application. Its pre-defined classes allow for quick prototyping and easy customization, making the development of the user interface more efficient.
  • Cloud Run: For deployment, we leverage Google Cloud Run, a fully managed serverless platform. This allows us to deploy our containerized application without worrying about infrastructure management. Cloud Run automatically scales based on demand, ensuring our agent remains responsive even under heavy load, and we only pay for the compute time consumed.

Understanding the agent code

The agent’s code is available in the Google Cloud generative-ai GitHub repository. Let’s explore its core components. The Agent is defined using the Vertex AI model. The system prompt sets the overall direction:

swot_agent = Agent(
    model=VertexAIModel(
        model_name=MODEL,
        service_account_file=SERVICE_ACCOUNT_FILE,
        project_id=PROJECT_ID,
        region=LOCATION,
    ),
    deps_type=SwotAgentDeps,
    result_type=SwotAnalysis,
    system_prompt="""
        You are a strategic business analyst tasked with performing SWOT analysis.
        Analyze the given URL, identify internal strengths and weaknesses,
        and evaluate external opportunities and threats based on market conditions
        and competitive landscape. Use community insights to validate findings.

        For each category:
        - Strengths: Focus on internal advantages and unique selling points
        - Weaknesses: Identify internal limitations and areas for improvement
        - Opportunities: Analyze external factors that could be advantageous
        - Threats: Evaluate external challenges and competitive pressures

        Provide a detailed analysis that synthesizes these findings into actionable insights.
    """,
    retries=RETRIES,
)

The agent’s output is an instance of the SwotAnalysis class, which inherits from Pydantic’s BaseModel. The class defines its fields as annotated attributes, which enables data validation and gives the agent hints about what each field should contain:

class SwotAnalysis(BaseModel):
    """Represents a SWOT analysis with strengths, weaknesses, opportunities, threats, and an overall analysis."""

    strengths: List[str] = Field(
        description="Internal strengths of the product/service"
    )
    weaknesses: List[str] = Field(
        description="Internal weaknesses of the product/service"
    )
    opportunities: List[str] = Field(
        description="External opportunities in the market"
    )
    threats: List[str] = Field(
        description="External threats in the market"
    )
    analysis: str = Field(
        description="A comprehensive analysis explaining the SWOT findings and their implications"
    )
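To see what the validation buys you, here is a minimal sketch that repeats the model so it runs on its own and feeds it two hand-written payloads. The payload contents are made up for illustration and are not output from the agent; the only assumption is that pydantic is installed.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class SwotAnalysis(BaseModel):
    """Same fields as the model above, repeated so this sketch is self-contained."""

    strengths: List[str] = Field(description="Internal strengths of the product/service")
    weaknesses: List[str] = Field(description="Internal weaknesses of the product/service")
    opportunities: List[str] = Field(description="External opportunities in the market")
    threats: List[str] = Field(description="External threats in the market")
    analysis: str = Field(description="A comprehensive analysis explaining the SWOT findings")


# Hypothetical payload standing in for a model response.
good_payload = {
    "strengths": ["Fully managed scaling"],
    "weaknesses": ["Cold starts"],
    "opportunities": ["Growing demand for serverless"],
    "threats": ["Competing platforms"],
    "analysis": "A short illustrative summary.",
}
result = SwotAnalysis(**good_payload)

# A payload with missing fields is rejected instead of passing through silently.
bad_payload = {"strengths": ["Only one field"]}
try:
    SwotAnalysis(**bad_payload)
    rejected = False
except ValidationError as exc:
    rejected = True
    # Each error records the location of the offending field.
    missing_fields = sorted(str(err["loc"][0]) for err in exc.errors())
```

An incomplete response surfaces as a ValidationError that names the missing fields, so it never reaches the UI as a half-filled analysis.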

Equipping the agent with tools

To perform its analysis, our agent needs to gather information from various sources. This is achieved through a set of tools integrated within the Pydantic AI framework:

  • Content Extraction: Using httpx and BeautifulSoup4, the agent can fetch and parse the HTML content of a given URL, extracting relevant text for analysis. This allows the agent to understand the core offerings and messaging presented on a website.
  • Community Insights via the Reddit API: Leveraging praw, the Python Reddit API Wrapper, the agent can query specific subreddits on Reddit to gauge community sentiment and discussions related to a particular topic or product.
  • Competitive Analysis powered by Gemini: The agent uses the Gemini API to perform a competitive analysis. By providing the product name and description, the agent can leverage Gemini’s powerful language understanding capabilities to identify key competitors, analyze their market positions, and assess competitive advantages and disadvantages.
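The content extraction step can be sketched without any third-party packages. The example below is not the project's code: it uses the standard library's html.parser as a stand-in for BeautifulSoup4 and works on a hard-coded HTML string instead of fetching a page with httpx. The class name, function name, and sample HTML are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside skipped elements.
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Returns the readable text of an HTML document as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical page used only to exercise the extractor.
sample_html = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><h1>Cloud Run</h1><p>Run containers without servers.</p>
<script>console.log("ignored");</script></body></html>
"""
page_text = extract_text(sample_html)
```

Stripping scripts and styles before handing text to the model keeps the prompt focused on what a visitor would actually read.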

Here’s a snippet of the Reddit community insights tool (simplified for readability):

@swot_agent.tool(prepare=report_tool_usage)
async def get_reddit_insights(
    ctx: RunContext[SwotAgentDeps],
    query: str,
    subreddit_name: str = "googlecloud",
) -> str:
    """Gathers insights from a specific subreddit related to a query using PRAW."""
    subreddit = ctx.deps.reddit.subreddit(subreddit_name)
    search_results = subreddit.search(query)

    insights = []
    for post in search_results:
        insights.append(
            f"Title: {post.title}\n"
            f"URL: {post.url}\n"
            f"Content: {post.selftext}\n"
        )
    return "\n".join(insights)
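One thing to keep in mind: PRAW's client is synchronous, so a search made inside an async tool holds up the event loop while it waits on the network. One possible way around that is to push the blocking call onto a worker thread. The sketch below shows the idea with stand-ins only; FakePost and fake_search are hypothetical and make no Reddit calls, and the result cap is an illustrative choice rather than something the project does.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import List


@dataclass
class FakePost:
    """Hypothetical stand-in for a Reddit submission."""

    title: str
    url: str
    selftext: str


def fake_search(query: str, limit: int) -> List[FakePost]:
    """Simulates a blocking search; a real tool would call PRAW here."""
    time.sleep(0.05)  # pretend to wait on the network
    posts = [
        FakePost(f"{query} post {i}", f"https://example.com/{i}", f"Body {i}")
        for i in range(10)
    ]
    return posts[:limit]


def format_insights(posts: List[FakePost]) -> str:
    """Formats posts in the same Title/URL/Content layout as the tool above."""
    return "\n".join(
        f"Title: {p.title}\nURL: {p.url}\nContent: {p.selftext}\n" for p in posts
    )


async def get_insights(query: str, limit: int = 3) -> str:
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so other coroutines keep running while it waits.
    posts = await asyncio.to_thread(fake_search, query, limit)
    return format_insights(posts)


insights = asyncio.run(get_insights("cloud run"))
```

Capping the number of posts also keeps the tool's return value, and therefore the prompt sent back to the model, at a predictable size.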

Building the application

With the core AI agent logic defined, the next step was developing the UI and a server to handle the requests.

FastAPI serves as the backend, providing the API endpoints that the frontend interacts with. It combines Python type hints with Pydantic to validate incoming data and to generate interactive API documentation automatically. The /analyze endpoint (defined in main.py), for example, accepts a URL via a POST request, starts the agent’s analysis as an asynchronous task, and returns right away; the task updates the status and result stores as it progresses. The frontend then polls the /status and /result endpoints for updates and displays the final SWOT analysis.

@app.post("/analyze", response_class=HTMLResponse)
async def analyze_url(request: Request, url: str = Form(...)) -> HTMLResponse:
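    """Analyzes the given URL using the SWOT analysis agent."""
    # Session handling is omitted from this snippet; session_id identifies the caller's session.

    # Run the agent in the background and keep a reference until the task finishes.
    task = asyncio.create_task(run_agent_with_progress(session_id, url))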
    running_tasks.add(task)
    task.add_done_callback(running_tasks.discard)

    return templates.TemplateResponse(
        "status.html",
        {"request": request, "messages": [ANALYZING_MESSAGE], "result": False},
    )
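The fire-and-forget pattern in that endpoint can be tried outside FastAPI. Here is a minimal sketch using only asyncio: the store names, run_analysis, and start_analysis are hypothetical stand-ins for the project's status and result stores and its agent run, not the actual implementation.

```python
import asyncio
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory stores keyed by session ID.
status_store: Dict[str, List[str]] = {}
result_store: Dict[str, str] = {}
running_tasks: Set[asyncio.Task] = set()


async def run_analysis(session_id: str, url: str) -> None:
    """Stand-in for the agent run: records progress, then stores a result."""
    status_store[session_id].append(f"Fetching {url}")
    await asyncio.sleep(0.01)  # placeholder for the real work
    status_store[session_id].append("Analysis complete")
    result_store[session_id] = f"SWOT analysis for {url}"


def start_analysis(session_id: str, url: str) -> asyncio.Task:
    """Schedules the work and returns immediately, as the endpoint does."""
    status_store[session_id] = ["Analyzing..."]
    task = asyncio.create_task(run_analysis(session_id, url))
    # The event loop keeps only a weak reference to tasks, so hold a strong
    # one until the task finishes, then drop it.
    running_tasks.add(task)
    task.add_done_callback(running_tasks.discard)
    return task


async def main() -> Tuple[List[str], str, int]:
    task = start_analysis("demo-session", "https://example.com")
    tracked_while_running = len(running_tasks)
    await task
    await asyncio.sleep(0)  # give the done callback a chance to run
    return (
        status_store["demo-session"],
        result_store["demo-session"],
        tracked_while_running,
    )


messages, result, tracked_while_running = asyncio.run(main())
```

Without the running_tasks set, nothing would hold a strong reference to the task, and it could be garbage-collected before it finishes.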

On the frontend, HTMX enhances the user experience by enabling dynamic updates without full page reloads. In templates/index.html, the hx-post, hx-get, hx-trigger, and hx-target attributes define how elements interact with the FastAPI backend.

For instance, submitting the form triggers a POST to /analyze, and the response is swapped into the #status div. Similarly, the #status and #result divs periodically poll their endpoints using hx-get and hx-trigger="every 1s", updating only the necessary sections of the page with the latest status messages or the completed SWOT analysis.

<div
    id="status"
    hx-get="/status"
    hx-trigger="load, every 1s"
    hx-swap="innerHTML transition:false"
    style="display: none"
    class="bg-white rounded-2xl shadow-lg p-6 mb-8"
>
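Because HTMX swaps the response body straight into the target element, the polled endpoints answer with HTML fragments rather than JSON. The project renders these from templates; the sketch below only illustrates the shape of such a fragment with a plain function. The function name and the list markup are assumptions.

```python
from html import escape
from typing import List


def render_status_fragment(messages: List[str]) -> str:
    """Builds inner HTML that could be swapped into the #status div."""
    # Escape each message so tool names or URLs cannot inject markup.
    items = "".join(f"<li>{escape(message)}</li>" for message in messages)
    return f"<ul>{items}</ul>"


fragment = render_status_fragment(
    ["Analyzing...", "Using tool <get_reddit_insights>"]
)
```

Escaping matters here because status messages may contain text that originates outside the application, such as page titles or tool arguments.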

Tailwind CSS enables a consistent design language throughout the application. Instead of writing custom CSS, Tailwind’s pre-defined classes are used directly in the HTML, streamlining the styling process and ensuring a cohesive look and feel. The HTML templates leverage Tailwind’s utility classes to create appealing and responsive layouts.

Deploying and Scaling with Google Cloud Run

Deploying the agent on Google Cloud Run is straightforward, thanks to its containerized architecture. Cloud Run, a fully managed serverless platform, automatically handles scaling, networking, and infrastructure, allowing you to focus on your code. You can deploy directly from the source code using the command provided in the README:

gcloud run deploy swot-agent --source . --region us-central1 --allow-unauthenticated

This command instructs Cloud Run to deploy a service named swot-agent, build the image from the source in the current directory, and deploy it to the us-central1 region. The --allow-unauthenticated flag makes the service publicly accessible for initial testing and demonstration purposes.

Cloud Run then provisions the necessary resources, runs the uvicorn ASGI server within the container, and scales the number of instances based on incoming traffic. Remember to configure secrets for sensitive information like API keys.
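Cloud Run can expose secrets to the container as environment variables, so one simple approach is to read them at startup and fail fast if any are missing. The sketch below assumes that setup; the variable names REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET and the RedditSettings class are hypothetical, so adapt them to whatever the application actually expects.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RedditSettings:
    """Hypothetical container for the Reddit API credentials."""

    client_id: str
    client_secret: str


def load_reddit_settings(env: Mapping[str, str] = os.environ) -> RedditSettings:
    """Reads credentials from the environment; raises if any are missing."""
    required = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")  # assumed names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return RedditSettings(env["REDDIT_CLIENT_ID"], env["REDDIT_CLIENT_SECRET"])


# Passing a plain dict keeps the example independent of the real environment.
settings = load_reddit_settings(
    {"REDDIT_CLIENT_ID": "demo-id", "REDDIT_CLIENT_SECRET": "demo-secret"}
)
```

Failing at startup makes a misconfigured deployment obvious in the logs, rather than surfacing later as a failed tool call in the middle of an analysis.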

Trying it out yourself

This project demonstrates how to build an AI agent using a modern development stack. It showcases how an agent can assemble information from multiple sources such as Reddit to achieve a higher-level task like SWOT analysis.

I encourage you to explore the code on GitHub to find out more about the agent, the tools, and the FastAPI application. Deploy it to Cloud Run in your own project, and experiment with different inputs and configurations.

Let’s continue the discussion! Feel free to connect with me on LinkedIn, X, Bluesky, or Threads to share your thoughts and ideas.
