Create Your Own OpenAI Operator Agent: A Step-by-Step Guide Using the Latest Features of the Microsoft AutoGen Framework

Rifx.Online
Large Language Models , AI Applications , AI Research
05 Mar, 2025

Create your own OpenAI Operator-like agent in a few lines of code!

Microsoft AutoGen is an open-source framework for building LLM-powered AI applications. I previously wrote a blog post about AutoGen 0.2 and AutoGen Studio. In this post, I will cover the latest version of AutoGen.

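Agents are at the core of AutoGen 0.4. It ships with several prebuilt agents that you can use without digging into the details. One of the best agent systems available out of the box is Magentic-One.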

[Image 1] AutoGen framework building blocks.

In addition, AutoGen 0.4 updates the no-code agent platform, AutoGen Studio, with a fresh look and a React Flow based interface for a seamless agent-building experience.

[Image 2] AutoGen Studio screenshot.

That is enough of an introduction; let's get started with the step-by-step guide.

From the Magentic-One agent system (see Image 3), we will use two components:

Web Surfer agent
Orchestrator (MagenticOneGroupChat)

[Image 3] Magentic-One architecture.

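Step 0: Prepare a virtual environment with conda. AutoGen 0.4 requires Python 3.10 or later.

conda create -n autogen python=3.12
conda activate autogen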

Tip: A virtual environment is recommended because we will be accessing the web to fetch data.

Step 1: Install the multimodal Web Surfer agent

pip install "autogen-ext[web-surfer]"

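Tip: I ran into missing packages on the 0.4.5 stable release. If you hit the same problem, just run pip install -U "autogen-agentchat". More installation information is available in the AutoGen installation docs.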
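Since missing packages were the problem I hit, a quick check of the environment can save some time. The following is a minimal sketch using only the standard library; the helper names (python_is_supported, missing_packages) are my own for illustration and are not part of AutoGen. It assumes the distribution names autogen-agentchat, autogen-ext, and playwright.

```python
import sys
from importlib import metadata


def python_is_supported(version_info=sys.version_info, minimum=(3, 10)) -> bool:
    """Return True if the interpreter is at least the assumed minimum version."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = ["autogen-agentchat", "autogen-ext", "playwright"]
    print("Python version OK:", python_is_supported())
    print("Missing packages:", missing_packages(required))
```

If the second line prints a non-empty list, installing or upgrading those packages with pip should resolve it.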

Step 2: Create the Azure OpenAI client. I will use DefaultAzureCredential() as a secure way to connect to Azure OpenAI instead of key-based authentication.

from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Create the token provider
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"  # when running locally, this uses the signed-in user's credentials
)

client = AzureOpenAIChatCompletionClient(
    azure_deployment="GPT4ov1", # deployment name from Azure OpenAI Deployments tab
    model="gpt-4o",
    api_version="2024-08-01-preview",
    azure_endpoint="https://testmediumazureopenai.openai.azure.com/", # your Azure OpenAI endpoint
    azure_ad_token_provider=token_provider,
)

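You can also create the client with OpenAI directly:

from autogen_ext.models.openai import OpenAIChatCompletionClient

# Note: the API key is read from the OPENAI_API_KEY environment variable
model_client = OpenAIChatCompletionClient(model="gpt-4o-2024-08-06")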

Tip: If you use Azure AD-based authentication, make sure you have the Azure CLI installed and are signed in: https://learn.microsoft.com/en-gb/azure/developer/azure-developer-cli
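The Azure client above hard-codes the deployment name and endpoint. As a minimal sketch, the same settings could be read from environment variables instead. The variable names (AZURE_OPENAI_ENDPOINT and so on) and the helper azure_client_kwargs are assumptions of mine, not something AutoGen defines; the defaults are the values used in this post.

```python
import os


def azure_client_kwargs(env=os.environ) -> dict:
    """Build keyword arguments for the Azure client from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    endpoint = env.get("AZURE_OPENAI_ENDPOINT")
    if not endpoint:
        raise ValueError("AZURE_OPENAI_ENDPOINT is not set")
    return {
        "azure_deployment": env.get("AZURE_OPENAI_DEPLOYMENT", "GPT4ov1"),
        "model": env.get("AZURE_OPENAI_MODEL", "gpt-4o"),
        "api_version": env.get("AZURE_OPENAI_API_VERSION", "2024-08-01-preview"),
        "azure_endpoint": endpoint,
    }
```

The resulting dictionary could then be unpacked into the client, for example AzureOpenAIChatCompletionClient(**azure_client_kwargs(), azure_ad_token_provider=token_provider), which keeps the endpoint out of the source file.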

Step 3: Define the Web Surfer agent

from autogen_ext.agents.web_surfer import MultimodalWebSurfer

# Define an agent
web_surfer_agent = MultimodalWebSurfer(
    name="MultimodalWebSurfer",
    model_client=client,
    headless=False,  # open the Chromium browser in GUI mode
    animate_actions=True,  # animate click actions
)

Step 4: Create the agent team with the Magentic-One orchestrator

from autogen_agentchat.teams import MagenticOneGroupChat

agent_team = MagenticOneGroupChat([web_surfer_agent], max_turns=13, model_client=client)

Tip: The Web Surfer agent is built on Playwright, a browser automation and testing framework. You need to set it up by running playwright install.
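To get a feel for what the max_turns argument does, here is a toy sketch of a turn-capped team loop. It is not how MagenticOneGroupChat is implemented (the real orchestrator plans with an LLM); the EchoAgent class and run_team function are hypothetical stand-ins that only illustrate the idea of stopping when the task is done or when the turn budget runs out.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class EchoAgent:
    """Hypothetical stand-in for an agent such as the Web Surfer."""
    name: str
    steps_needed: int  # number of turns before this toy agent reports success
    calls: int = 0

    async def act(self, task: str) -> str:
        self.calls += 1
        if self.calls >= self.steps_needed:
            return "DONE"
        return f"{self.name} worked on '{task}' (step {self.calls})"


async def run_team(agents, task: str, max_turns: int) -> list:
    """Take turns over the agents until one replies DONE or max_turns is hit."""
    transcript = []
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]
        reply = await agent.act(task)
        transcript.append(reply)
        if reply == "DONE":
            break
    return transcript
```

With max_turns=13, an agent that needs 3 steps finishes early, while one that never finishes is cut off after 13 turns. That cap is what keeps a browsing agent from looping indefinitely on a task it cannot complete.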

Step 5: Put everything together

import asyncio

from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from azure.identity import DefaultAzureCredential, get_bearer_token_provider


async def main() -> None:
    # Create the token provider; when running locally, this uses the signed-in user's credentials
    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )

    client = AzureOpenAIChatCompletionClient(
        azure_deployment="GPT4ov1",  # deployment name from the Azure OpenAI Deployments tab
        model="gpt-4o",
        api_version="2024-08-01-preview",
        azure_endpoint="https://testmediumazureopenai.openai.azure.com/",  # your Azure OpenAI endpoint
        azure_ad_token_provider=token_provider,
    )

    # Define an agent
    web_surfer_agent = MultimodalWebSurfer(
        name="MultimodalWebSurfer",
        model_client=client,
        headless=False,  # open the Chromium browser in GUI mode
        animate_actions=True,  # animate click actions
    )

    # Define a team
    agent_team = MagenticOneGroupChat([web_surfer_agent], max_turns=13, model_client=client)

    task = input("Enter the task for the agent team: ")

    # Run the team and stream messages to the console
    stream = agent_team.run_stream(task=task)
    await Console(stream)

    # Close the browser controlled by the agent
    await web_surfer_agent.close()


asyncio.run(main())
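One thing to note about the script above: the browser is closed only if the run finishes without an exception. A minimal sketch of a try/finally wrapper is shown below. FakeSurfer and run_with_cleanup are hypothetical names used for illustration; FakeSurfer stands in for the Web Surfer agent so the pattern can be tried without a browser.

```python
import asyncio


class FakeSurfer:
    """Hypothetical stand-in for the Web Surfer agent; only close() matters here."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def run_with_cleanup(agent, work) -> None:
    """Await work() and always close the agent afterwards, even on errors."""
    try:
        await work()
    finally:
        await agent.close()
```

In the real script, work would be a small coroutine function that runs await Console(agent_team.run_stream(task=task)), and agent would be web_surfer_agent.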

For example, here is a task for the agent team: "Go to OpenTable and book a restaurant in Atlanta. Use my phone number 123-456-7890 for the reservation."

Final Thoughts

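It is great to see how easy it is to create a vision-based Web Surfer agent in AutoGen. Keep in mind, however, that AutoGen is intended for research and proof-of-concept work only and should not be used in production. Microsoft recommends Semantic Kernel for enterprise support. Although Semantic Kernel still lags behind AutoGen, creating this kind of agent in Semantic Kernel is also easy. See my previous blog post for more details.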