Simplifying AI Agent Construction: 5 Steps Using Pydantic AI Framework

Rifx.Online
Machine Learning, Natural Language Processing, AI Applications
05 Mar, 2025

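PydanticAI: A "Hello World" Introduction

PydanticAI is a relatively new entrant in the AI agent framework space. From a Python programmer's perspective, though, being able to combine Pydantic's robust data validation with a modern AI framework makes for a compelling combination.

Pydantic, developed by Samuel Colvin, is a powerful Python library for data validation and settings management. Launched in 2018, it was designed to address common challenges in managing and validating data in Python applications, particularly in API development. Its core goals include data validation, data parsing, and type coercion.

The latest addition, PydanticAI, extends Pydantic's capabilities into the AI domain. In its current version (v0.0.19, dated 2025-01-15), it is described as follows:

"PydanticAI is a Python agent framework designed to make it less painful to build production grade applications with Generative AI."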

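Getting Started with PydanticAI

This article serves as a "Hello World" introduction to the PydanticAI framework, with a simple use case to help you get started.

Use case:

Since I already had Python functions that perform basic geolocation with the Nominatim API, the goal of this example is to identify place names in a text message and geolocate them. Specifically, I want to extract the address, latitude, and longitude of each identified location.

Getting started: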

from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openai import OpenAIModel
from pydantic import BaseModel

from geopy.geocoders import Nominatim

Create a Pydantic class, Location_Result, that defines the schema of the result. The same schema can also be used to render a JSON representation.

class Location_Result(BaseModel):
    location: str
    latitude: float
    longitude: float
    displayName: str
    address: str
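
To make the schema's role concrete, here is a minimal sketch of how Pydantic validates and coerces data against this class. The input dictionary is made up for illustration (coordinates supplied as strings, shortened address text), and the calls assume Pydantic v2 (`model_validate`, `model_json_schema`).

```python
from pydantic import BaseModel, ValidationError


class Location_Result(BaseModel):
    location: str
    latitude: float
    longitude: float
    displayName: str
    address: str


# Illustrative input: coordinates arrive as strings and are coerced to float.
raw = {
    "location": "Starbucks, Cobourg",
    "latitude": "43.9711446",
    "longitude": "-78.1985192",
    "displayName": "Starbucks, Elgin Street West, Cobourg",
    "address": "Elgin Street West, Cobourg",
}
result = Location_Result.model_validate(raw)
print(type(result.latitude).__name__, result.latitude)

# The generated JSON schema lists every field without a default as required.
schema = Location_Result.model_json_schema()
print(sorted(schema["required"]))

# Invalid data is rejected instead of silently passing through.
try:
    Location_Result.model_validate({**raw, "latitude": "north"})
except ValidationError as exc:
    print("validation failed with", exc.error_count(), "error(s)")
```

The same coercion and rejection rules are what give a structured agent response its predictable shape.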

Identify the LLM and model

Create a model object. In this case I specify OpenAI's GPT-4o; all I need to supply is the model name and an API key.

model = OpenAIModel('gpt-4o', api_key='Your OpenAI API key goes here')
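
A key hard-coded in source is easy to leak. Below is a minimal sketch of reading it from the environment instead, using only the standard library. The helper name `load_api_key` is hypothetical, and `OPENAI_API_KEY` is simply the variable name assumed here.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper: raises a clear error instead of passing an
    empty key on to the model client.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

The returned value could then be supplied as the `api_key` argument in the `OpenAIModel(...)` call shown above.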

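Define the Agent

To create the AI agent, we define it with a system prompt and set the model object and result type to the ones specified earlier.

By using the Location_Result Pydantic class defined earlier as the result type, we ensure that the agent returns a structured response that conforms to the class definition. This keeps the output consistent and aligned with the expected data structure.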

# Define the geo agent
geo_agent = Agent(
    model,
    system_prompt=(
        'You are an expert in determining the geographic coordinates of a location'
    ),
    result_type=Location_Result,
)

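Set up tools for the AI agent

The geo_agent will need some tools. We will use the standard geopy library, specifically its Nominatim geocoder, to get the latitude, longitude, and display name of a given location.

I had previously set up a Nominatim client with the following code.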

# Instantiate a new Nominatim client
app = Nominatim(user_agent="myGeocoder")
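
geopy's `geocode` returns `None` when nothing matches, and repeated lookups of the same string cost extra requests, so it can help to wrap lookups in one cached, None-safe helper. A minimal sketch follows. `make_lookup`, `FakeLocation`, and `fake_geocode` are hypothetical names, and the fake geocoder with made-up coordinates is there only so the example runs without network access; in practice `app.geocode` would be passed in.

```python
from functools import lru_cache
from typing import Callable, Optional


def make_lookup(geocode: Callable[[str], Optional[object]]) -> Callable[[str], dict]:
    """Wrap a geocode callable with caching and a clear no-match error."""

    @lru_cache(maxsize=256)
    def lookup(raw_location: str) -> dict:
        location = geocode(raw_location)
        if location is None:  # no match found for this query
            raise ValueError(f"No geocoding match for {raw_location!r}")
        return location.raw

    return lookup


class FakeLocation:
    """Stand-in for geopy's Location; only the .raw attribute is mimicked."""

    def __init__(self, raw: dict):
        self.raw = raw


calls = []


def fake_geocode(query: str):
    """Offline substitute for app.geocode with made-up sample values."""
    calls.append(query)
    if query == "Cobourg":
        return FakeLocation({"lat": "43.9593", "lon": "-78.1677", "display_name": "Cobourg"})
    return None


lookup = make_lookup(fake_geocode)
print(lookup("Cobourg")["lat"])
print(lookup("Cobourg")["lon"])
print(len(calls))  # the second lookup was served from the cache
```

Failed lookups raise an exception rather than being cached, so a later retry with the same string still reaches the geocoder.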

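Configuring tools for an AI agent or assistant is often a complex task. In this example, however, we simplify the process by taking three previously defined functions (get_lat, get_long, and get_display_name) and registering them with the geo_agent defined earlier.

To register these tools with a specific agent, we use the Python decorator @geo_agent.tool_plain. Other decorators are available, but this one is sufficient for registering tools in our simple example.

The elegance of this approach lies in its simplicity: it significantly streamlines registering tools with an AI agent. The functions become available to the agent without any change to their core Python code. If the decorator is removed, they continue to work as regular callable Python functions, so they remain usable outside the AI framework.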

@geo_agent.tool_plain
def get_lat(raw_location: str) -> float:
    """Determine latitude from location"""
    # geocode() is a blocking call, so a plain (non-async) function is used
    location = app.geocode(raw_location).raw
    # Nominatim returns coordinates as strings; convert to float
    return float(location['lat'])


@geo_agent.tool_plain
def get_long(raw_location: str) -> float:
    """Determine longitude of location"""
    location = app.geocode(raw_location).raw
    return float(location['lon'])


@geo_agent.tool_plain
def get_display_name(raw_location: str) -> str:
    """Determine display name from location"""
    location = app.geocode(raw_location).raw
    return location['display_name']
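
Why a decorated function stays usable on its own can be seen in a toy registry. This is a simplified illustration of the general register-and-return pattern, not PydanticAI's actual implementation; `ToolRegistry`, `describe`, and `shout` are hypothetical names.

```python
from typing import Callable


class ToolRegistry:
    """Toy registry: records functions by name and returns them unchanged."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable] = {}

    def tool_plain(self, func: Callable) -> Callable:
        self.tools[func.__name__] = func  # remember the tool for later dispatch
        return func  # hand back the original function, not a wrapper

    def describe(self) -> dict[str, str]:
        """Map each tool name to its docstring."""
        return {name: (f.__doc__ or "").strip() for name, f in self.tools.items()}


registry = ToolRegistry()


@registry.tool_plain
def shout(text: str) -> str:
    """Return the text in upper case"""
    return text.upper()


print(shout("cobourg"))     # still an ordinary function call
print(registry.describe())  # the registry knows the name and docstring
```

Because the decorator returns the function it was given, the function behaves the same with or without it; the registry merely keeps a reference along with the name and docstring.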

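Model settings

Optionally, we can define per-run settings for the LLM by passing a dictionary like the one below as model_settings. This is where model parameters such as temperature and top_p can be set if needed.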

model_settings = {'temperature': 0.0,
                  'max_tokens': 1000,
                  'top_p': 1.0,
                  'frequency_penalty': 0.0,
                  'presence_penalty': 0.0}
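
If several runs share most settings, one option is to keep a dictionary of defaults and override individual keys per run. The sketch below uses plain dictionaries only; the `with_overrides` helper and the `DEFAULT_SETTINGS` name are hypothetical and not part of PydanticAI.

```python
DEFAULT_SETTINGS = {
    "temperature": 0.0,
    "max_tokens": 1000,
    "top_p": 1.0,
}


def with_overrides(defaults: dict, **overrides) -> dict:
    """Return a new settings dict; the defaults are left untouched."""
    return {**defaults, **overrides}


creative = with_overrides(DEFAULT_SETTINGS, temperature=0.7)
print(creative)
print(DEFAULT_SETTINGS["temperature"])
```

Building a fresh dictionary each time avoids one run's tweak leaking into the next.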

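In the example below, we build the prompt. Here I want to pass in a text message, have the model identify any locations mentioned in it, and then use each location to determine its address, latitude, and longitude.

Run the Agent

The statements below run geo_agent.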

msg = " Hi how are you doing? Can we meet at Starbucks in Cobourg at 3pm today?"
prompt = 'Determine the location and address of any locations mentioned in the following text message:'

result = geo_agent.run_sync(prompt + msg, model_settings=model_settings)
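
The concatenation above reads cleanly only because `msg` happens to start with a space. A slightly more robust sketch is to join the two parts with an explicit separator; `build_prompt` is a hypothetical helper, not something PydanticAI provides.

```python
def build_prompt(instruction: str, message: str) -> str:
    """Join an instruction and a user message with a blank line between them."""
    return f"{instruction.strip()}\n\n{message.strip()}"


instruction = (
    "Determine the location and address of any locations "
    "mentioned in the following text message:"
)
msg = " Hi how are you doing? Can we meet at Starbucks in Cobourg at 3pm today?"
print(build_prompt(instruction, msg))
```

The resulting string could be passed to `run_sync` in place of `prompt + msg`.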

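Finally, I want the agent to return a structured response.

Recall the Location_Result class defined at the start of this program. It specifies the structured response format we want. Defining structured responses like this is good practice, because it lets the agent interoperate with other agents and systems later on.

When the agent runs, the returned result object contains a lot of information. For now, we are mainly interested in the final structured response, which is available as result.data.

The following section shows the code and its corresponding output.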

geo_result = result.data

print('\n',geo_result.location)
print('\n',geo_result.latitude)
print('\n',geo_result.longitude)
print('\n',geo_result.address)

 Starbucks, Cobourg

43.9711446

-78.1985192

Starbucks, Elgin Street West, Cobourg, Northumberland County, Central Ontario, Ontario, K9A 5H7, Canada
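
Because the structured response is an ordinary Pydantic model instance, it can be serialized for other systems. Here is a minimal sketch, assuming Pydantic v2, that rebuilds the object by hand from the values printed above rather than from a live agent run. The displayName value is an assumption, set equal to the printed address.

```python
from pydantic import BaseModel


class Location_Result(BaseModel):
    location: str
    latitude: float
    longitude: float
    displayName: str
    address: str


full_address = (
    "Starbucks, Elgin Street West, Cobourg, Northumberland County, "
    "Central Ontario, Ontario, K9A 5H7, Canada"
)
# Hand-built stand-in for result.data; displayName is assumed, not observed.
geo_result = Location_Result(
    location="Starbucks, Cobourg",
    latitude=43.9711446,
    longitude=-78.1985192,
    displayName=full_address,
    address=full_address,
)

as_json = geo_result.model_dump_json()  # JSON string for an API or a queue
restored = Location_Result.model_validate_json(as_json)
print(restored == geo_result)
print(geo_result.model_dump()["latitude"])
```

A consumer that shares the class definition can validate the JSON on arrival, which is what makes the structured format convenient for handing results to other agents or services.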

The result object also contains information about model usage, which can be accessed as follows.

# Usage returns the total number of tokens used and other values
usage = result.usage()
print('\n Usage:', usage.total_tokens)

Perhaps more importantly, the result object contains the complete message history of the run. This history can be retrieved as a list with result.all_messages(), or as JSON bytes with result.all_messages_json().

# All Messages returns the full conversation as a list of messages
all_messages = result.all_messages()
for message in all_messages:
    print('\n',message)

# All Messages JSON returns the full conversation in JSON bytes format
# You can decode it back to string if needed

json_bytes = result.all_messages_json()
json_string = json_bytes.decode('utf-8')  # Decode bytes back to string
print('\n JSON String:',json_string)
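
Once available as bytes, the history can be parsed with the standard `json` module for inspection or storage. The sketch below uses a made-up, heavily simplified payload in place of real `all_messages_json()` output, whose actual fields are richer and may differ between versions.

```python
import json

# Placeholder bytes standing in for result.all_messages_json(); not real output.
json_bytes = b'[{"kind": "request", "parts": []}, {"kind": "response", "parts": []}]'

messages = json.loads(json_bytes)  # json.loads accepts bytes directly
print(len(messages))
print(json.dumps(messages, indent=2))  # pretty-printed for reading or logging
```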

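Going further…

Handling multiple locations in a text message

Now let's take it a step further. What if the text message contains multiple locations? How hard is it to modify the code to determine the coordinates of several locations in a single message?

In fact, it is very simple. The key change is to the result type, as shown below. A new class, Locations, wraps a list of the Location_Result objects defined earlier in this article.

Create a new result type

Structuring the response this way lets the agent handle multiple locations in a text message while keeping a clear, consistent format.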

class Locations(BaseModel):
    places: list[Location_Result]
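
To see how the nested schema behaves, here is a minimal sketch, assuming Pydantic v2, that validates a hand-written payload with two entries. The coordinates are the ones reported in this article's output; the address strings are abbreviated for brevity.

```python
from pydantic import BaseModel


class Location_Result(BaseModel):
    location: str
    latitude: float
    longitude: float
    displayName: str
    address: str


class Locations(BaseModel):
    places: list[Location_Result]


# Hand-written payload standing in for a model response.
payload = {
    "places": [
        {
            "location": "Starbucks, Cobourg",
            "latitude": 43.9711446,
            "longitude": -78.1985192,
            "displayName": "Starbucks, Elgin Street West, Cobourg",
            "address": "Elgin Street West, Cobourg",
        },
        {
            "location": "Victoria Hall, Cobourg",
            "latitude": 43.9593457,
            "longitude": -78.16777396077927,
            "displayName": "Victoria Hall, 55, King Street West, Cobourg",
            "address": "55, King Street West, Cobourg",
        },
    ]
}

locations = Locations.model_validate(payload)
print(len(locations.places))
print([place.location for place in locations.places])
print(type(locations.places[1]).__name__)  # each entry is a Location_Result
```

Each list element is validated against Location_Result, so a malformed entry anywhere in the list fails validation for the whole response.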

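After that, all I had to do was change the result_type parameter on the agent to use the new schema, Locations. Note that this creates a new Agent object, so the tool functions need to be registered on it again, for example by re-running the decorated definitions.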

geo_agent = Agent(
    model,
    system_prompt=(
        'You are an expert in determining the geographic coordinates of a location'
    ),
    result_type=Locations,
)

The prompt is the same, but the text message now contains two locations, Starbucks and Victoria Hall, both in Cobourg.

msg= """Hi how are you doing? Can we meet at Starbucks in Cobourg at 3pm today?
 Afterwards we can go Victoria Hall, also in Cobourg"""

The code that prints the response has to change, because the result now contains a list. The code and response are shown below.

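# Re-run the agent with the new message, then read the structured result
result = geo_agent.run_sync(prompt + ' ' + msg, model_settings=model_settings)
geo_result = result.data

# Iterate through Locations and print each element of Location_Result
for place in geo_result.places:
    print('\nLocation:', place.location)
    print('Latitude:', place.latitude)
    print('Longitude:', place.longitude)
    print('Display Name:', place.displayName)
    print('Address:', place.address)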

Location: Starbucks, Cobourg

Latitude: 43.9711446
Longitude: -78.1985192
Display Name: Starbucks, Elgin Street West, Cobourg, Northumberland County, Central Ontario, Ontario, K9A 5H7, Canada
Address: Elgin Street West, Cobourg, Northumberland County, Central Ontario, Ontario, K9A 5H7, Canada

Location: Victoria Hall, Cobourg
Latitude: 43.9593457
Longitude: -78.16777396077927
Display Name: Victoria Hall, 55, King Street West, Downtown Cobourg, Cobourg, Northumberland County, Central Ontario, Ontario, K9A 2M2, Canada
Address: 55, King Street West, Downtown Cobourg, Cobourg, Northumberland County, Central Ontario, Ontario, K9A 2M2, Canada

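Summary

Although PydanticAI is a newcomer to the AI agent framework space and is still in beta, it is a framework worth watching closely. Its focus on Python is both a strength and a weakness. On the one hand, Python-based frameworks are generally easier to work with, provided you are already familiar with Python. On the other hand, this exclusivity may limit access for developers who work in other programming languages.

One of PydanticAI's main advantages is that it is LLM-agnostic. In the example presented in this article I used an OpenAI model, but I could just as easily have chosen another large language model (LLM), including a local open-source model. Adapting to a different model requires only a small change (updating the import statement and the model definition), which makes the framework flexible and adaptable to a variety of use cases.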