Build a Real-Time Audio Conversational AI App: A Step-by-Step Guide Using Next.js and Agora

Conversational AI is getting a lot of attention. It lets you have a real-time conversation with an AI agent and actually get things done, without wasting time typing out your thoughts and trying to format them into a clever prompt. It's a major shift in how people interact with AI.

But given the investment developers and businesses have made in building their own text-based agents running on custom LLM workflows, they're reluctant to adopt this new paradigm. Especially if it means abandoning all that investment, or worse, watering it down by merely wiring it in as a tool/function call.

That's why we built the Agora Conversational AI Engine. It lets you connect your existing LLM workflow to an Agora channel and have a real-time conversation with an AI agent.

In this guide, we'll build a real-time audio conversation app that connects users with an AI agent powered by the Agora Conversational AI Engine. The app is built with NextJS, React, and TypeScript. We'll take an incremental approach, starting with the core real-time communication components and then adding Agora's Convo AI Engine.

By the end of this guide, you'll have a working real-time audio conversation app that connects users with an AI agent powered by Agora's Conversational AI Engine.

Prerequisites

Before we get started, you'll need the following:

  • An Agora account with an App ID and App Certificate, with Conversational AI enabled (Customer ID and Customer Secret)
  • Node.js and pnpm installed
  • An API key for your LLM provider (for example, OpenAI)
  • Credentials for a TTS provider (Microsoft Azure TTS or ElevenLabs)
  • Basic familiarity with React and TypeScript

Project Setup

Let's start by creating a new NextJS project with TypeScript support.

pnpm create next-app@latest ai-conversation-app
cd ai-conversation-app

When prompted, choose the following options:

  • TypeScript: Yes
  • ESLint: Yes
  • Tailwind CSS: Yes
  • src/ directory: No
  • App Router: Yes
  • Turbopack: No
  • Custom import alias: Yes (use the default @/*)

Next, install the required Agora dependencies:

pnpm add agora-rtc-react agora-token

For UI components, we'll use shadcn/ui in this guide, but you can use any UI library of your choice or create custom components:

pnpm dlx shadcn@latest init

We'll also use Lucide icons in this guide, so install that too:

pnpm add lucide-react

As we work through this guide, you'll need to create new files in specific directories, so let's create those directories before we start.

In your project root, create the app/api/, components/, and types/ directories, and add a .env.local file:

mkdir app/api components types
touch .env.local

Your project directory should now be structured like this:

├── app/
│   ├── api/
│   ├── globals.css
│   ├── layout.tsx
│   └── page.tsx
├── components/
├── types/
├── .env.local
└── (... existing files and directories)

Landing Page Component

Let's start with the landing page, which initializes the Agora client and sets up the AgoraProvider.

Create the LandingPage component file at components/LandingPage.tsx:

touch components/LandingPage.tsx

For now, we'll keep this component simple and add more functionality to it as we work through the guide. I've included comments throughout the code to help you understand what's happening. At a high level, we import the Agora React SDK and create the AgoraRTC client, then pass it to the AgoraProvider so all child components use the same client instance.

Add the following code to the LandingPage.tsx file:

'use client';

import { useState, useMemo } from 'react';
import dynamic from 'next/dynamic';

// Agora needs access to the browser's WebRTC APIs,
// - it throws errors if loaded via SSR.
// Create a component with SSR disabled,
// - and use it to load the AgoraRTC components on the client.
const AgoraProvider = dynamic(
  async () => {
    // Dynamically import Agora's components
    const { AgoraRTCProvider, default: AgoraRTC } = await import(
      'agora-rtc-react'
    );

    return {
      default: ({ children }: { children: React.ReactNode }) => {
        // Use useMemo to create the Agora RTC client only once
        const client = useMemo(
          () => AgoraRTC.createClient({ mode: 'rtc', codec: 'vp8' }),
          []
        );

        // The provider makes the client available to all child components
        return <AgoraRTCProvider client={client}>{children}</AgoraRTCProvider>;
      },
    };
  },
  { ssr: false } // Important: disable SSR for this component
);

export default function LandingPage() {
  // Basic setup; we'll add more functionality as we work through the guide.
  return (
    <div className="min-h-screen bg-gray-900 text-white p-4">
      <h1 className="text-4xl font-bold mb-6 text-center">
        Agora AI Conversation
      </h1>

      <div className="max-w-4xl mx-auto">
        <p className="text-lg mb-6 text-center">
          When was the last time you had an intelligent conversation?
        </p>

        {/* Placeholder for our start-conversation button */}
        <div className="flex justify-center mb-8">
          <button className="px-6 py-3 bg-blue-600 text-white rounded-lg">
            Start Conversation
          </button>
        </div>

        <AgoraProvider>
          <div>Placeholder: we'll add the conversation component here</div>
        </AgoraProvider>
      </div>
    </div>
  );
}

Now, update your app/page.tsx file to use this landing page:

import LandingPage from '@/components/LandingPage';

export default function Home() {
  return <LandingPage />;
}
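At this point you can already verify the scaffold renders. Start the dev server and open http://localhost:3000 in your browser; you should see the heading and the placeholder button:

pnpm run dev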

Basic Agora React JS Implementation

With the landing page set up, we can focus on implementing Agora's React JS SDK to handle the core RTC functionality, such as joining a channel, publishing audio, receiving audio, and handling Agora SDK events.

Create a file at components/ConversationComponent.tsx:

touch components/ConversationComponent.tsx

Add the following code:

'use client';

import { useState, useEffect, useCallback } from 'react';
import {
  useRTCClient,
  useLocalMicrophoneTrack,
  useRemoteUsers,
  useClientEvent,
  useIsConnected,
  useJoin,
  usePublish,
  RemoteUser,
  UID,
} from 'agora-rtc-react';

export default function ConversationComponent() {
  // Access the client from the provider context
  const client = useRTCClient();

  // Track connection status
  const isConnected = useIsConnected();

  // Manage microphone state
  const [isEnabled, setIsEnabled] = useState(true);
  const { localMicrophoneTrack } = useLocalMicrophoneTrack(isEnabled);

  // Track remote users (like our AI agent)
  const remoteUsers = useRemoteUsers();

  // Join the channel when component mounts
  const { isConnected: joinSuccess } = useJoin(
    {
      appid: process.env.NEXT_PUBLIC_AGORA_APP_ID!, // Load APP_ID from env.local
      channel: 'test-channel',
      token: 'replace-with-token',
      uid: 0, // Join with UID 0 and Agora will assign a unique ID when the user joins
    },
    true // Join automatically when the component mounts
  );

  // Publish our microphone track to the channel
  usePublish([localMicrophoneTrack]);

  // Set up event handlers for client events
  useClientEvent(client, 'user-joined', (user) => {
    console.log('Remote user joined:', user.uid);
  });

  useClientEvent(client, 'user-left', (user) => {
    console.log('Remote user left:', user.uid);
  });

  // Toggle microphone on/off
  const toggleMicrophone = async () => {
    if (localMicrophoneTrack) {
      await localMicrophoneTrack.setEnabled(!isEnabled);
      setIsEnabled(!isEnabled);
    }
  };

  // Clean up when component unmounts
  useEffect(() => {
    return () => {
      client?.leave(); // Leave the channel when the component unmounts
    };
  }, [client]);

  return (
    <div className="p-4 bg-gray-800 rounded-lg">
      <div className="mb-4">
        <p className="text-white">
          {/* Display the connection status */}
          Connection Status: {isConnected ? 'Connected' : 'Disconnected'}
        </p>
      </div>

      {/* Display remote users */}
      <div className="mb-4">
        {remoteUsers.length > 0 ? (
          remoteUsers.map((user) => (
            <div
              key={user.uid}
              className="p-2 bg-gray-700 rounded mb-2 text-white"
            >
              <RemoteUser user={user} />
            </div>
          ))
        ) : (
          <p className="text-gray-400">No remote users connected</p>
        )}
      </div>

      {/* Microphone control */}
      <button
        onClick={toggleMicrophone}
        className={`px-4 py-2 rounded ${
          isEnabled ? 'bg-green-500' : 'bg-red-500'
        } text-white`}
      >
        Microphone: {isEnabled ? 'On' : 'Off'}
      </button>
    </div>
  );
}

This component is the foundation of our real-time audio communication, so let's review the Agora React hooks we're using:

  • useRTCClient: Gets access to the Agora RTC client from the provider we set up in the landing page
  • useLocalMicrophoneTrack: Creates and manages the user's microphone input
  • useRemoteUsers: Tracks other users in the channel (this is where our AI agent will appear)
  • useJoin: Handles joining the channel with the specified parameters
  • usePublish: Publishes our audio track to the channel so others can hear us
  • useClientEvent: Sets up event handlers for important events like users joining or leaving

Note: We load the APP_ID from an environment variable using the non-null assertion operator, so make sure to set it in your .env.local file.
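For example, a minimal .env.local at this stage only needs the App ID; the value below is a placeholder for the App ID from your Agora console:

NEXT_PUBLIC_AGORA_APP_ID=your-agora-app-id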

We need to add this component to our LandingPage.tsx file. First import the component, then add it inside the AgoraProvider component.

// Previous imports remain the same as before...
// Dynamically import the ConversationComponent with ssr disabled
const ConversationComponent = dynamic(() => import('./ConversationComponent'), {
  ssr: false,
});
// Previous code remains the same as before...
<AgoraProvider>
  <ConversationComponent />
</AgoraProvider>;

Next, we'll implement token authentication to add a layer of security to our application.

Token Generation and Management

The Agora team strongly recommends token-based authentication for all applications, especially in production environments. In this step, we'll create a route that generates these tokens, then update our LandingPage and ConversationComponent to use them.

Token Generation Route

Let's break down what the token generation route needs to do:

  1. Generate a secure Agora token using our App ID and Certificate
  2. Create a unique channel name for each conversation
  3. Return the token, channel name, and the UID used to generate it to the client
  4. Support token refresh using an existing channel name and UID

Create a new file at app/api/generate-agora-token/route.ts:

mkdir app/api/generate-agora-token
touch app/api/generate-agora-token/route.ts

Add the following code:

import { NextRequest, NextResponse } from 'next/server';
import { RtcTokenBuilder, RtcRole } from 'agora-token';

// Access environment variables
const APP_ID = process.env.NEXT_PUBLIC_AGORA_APP_ID;
const APP_CERTIFICATE = process.env.NEXT_PUBLIC_AGORA_APP_CERTIFICATE;
const EXPIRATION_TIME_IN_SECONDS = 3600; // Token valid for 1 hour

// Helper function to generate unique channel names
function generateChannelName(): string {
  // Combine timestamp and random string for uniqueness
  const timestamp = Date.now();
  const random = Math.random().toString(36).substring(2, 8);
  return `ai-conversation-${timestamp}-${random}`;
}

export async function GET(request: NextRequest) {
  console.log('Generating Agora token...');

  // Verify required environment variables are set
  if (!APP_ID || !APP_CERTIFICATE) {
    console.error('Agora credentials are not set');
    return NextResponse.json(
      { error: 'Agora credentials are not set' },
      { status: 500 }
    );
  }

  // Get query parameters (if any)
  const { searchParams } = new URL(request.url);
  const uidStr = searchParams.get('uid') || '0';
  const uid = parseInt(uidStr);

  // Use provided channel name or generate new one
  const channelName = searchParams.get('channel') || generateChannelName();

  // Calculate token expiration time
  const expirationTime =
    Math.floor(Date.now() / 1000) + EXPIRATION_TIME_IN_SECONDS;

  try {
    // Generate the token using Agora's SDK
    console.log('Building token with UID:', uid, 'Channel:', channelName);
    const token = RtcTokenBuilder.buildTokenWithUid(
      APP_ID,
      APP_CERTIFICATE,
      channelName,
      uid,
      RtcRole.PUBLISHER, // User can publish audio/video
      expirationTime,
      expirationTime
    );

    console.log('Token generated successfully');
    // Return the token and session information to the client
    return NextResponse.json({
      token,
      uid: uid.toString(),
      channel: channelName,
    });
  } catch (error) {
    console.error('Error generating Agora token:', error);
    return NextResponse.json(
      { error: 'Failed to generate Agora token', details: error },
      { status: 500 }
    );
  }
}

This route handles token generation for our application, so let's review its key functionality:

  • Generates a unique channel name using a timestamp and a random string to avoid collisions
  • Generates a secure token using the App ID and Certificate
  • Accepts URL parameters so a token can be refreshed for an existing channel name and user ID

Note: This route loads APP_ID and APP_CERTIFICATE from environment variables, so make sure to set them in your .env.local file.
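Once the dev server is running, you can sanity-check this route by calling it directly. The response values shown here are illustrative, not real credentials:

curl http://localhost:3000/api/generate-agora-token
# {"token":"006abc...","uid":"0","channel":"ai-conversation-1712345678901-ab12cd"}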

Update the Landing Page to Request Tokens

With the token route in place, let's update the landing page to handle all the token-fetching logic. First, we need to create a new type definition for the token data so we can use it in our components.

Create a file at types/conversation.ts:

touch types/conversation.ts

Add the following code:

// Types for Agora token data
export interface AgoraLocalUserInfo {
  token: string;
  uid: string;
  channel: string;
  agentId?: string;
}

Open the components/LandingPage.tsx file, update the React import, add a new import statement for the AgoraLocalUserInfo type, and update the entire LandingPage() function.

We'll use Suspense because the Agora React SDK is loaded dynamically and the conversation component takes a moment to load, so it's best to show a loading state until it's ready.

'use client';

import { useState, useMemo, Suspense } from 'react'; // added Suspense
// Previous imports remain the same as before...
import type { AgoraLocalUserInfo } from '../types/conversation';

export default function LandingPage() {
  // Manage conversation state
  const [showConversation, setShowConversation] = useState(false);
  // Manage loading state, while the agent token is generated
  const [isLoading, setIsLoading] = useState(false);
  // Manage error state
  const [error, setError] = useState<string | null>(null);
  // Store the token data for the conversation
  const [agoraLocalUserInfo, setAgoraLocalUserInfo] =
    useState<AgoraLocalUserInfo | null>(null);

  const handleStartConversation = async () => {
    setIsLoading(true);
    setError(null);

    try {
      // Request a token from our API
      console.log('Fetching Agora token...');
      const agoraResponse = await fetch('/api/generate-agora-token');

      if (!agoraResponse.ok) {
        throw new Error('Failed to generate Agora token');
      }

      const responseData = await agoraResponse.json();
      console.log('Token response:', responseData);

      // Store the token data for the conversation
      setAgoraLocalUserInfo(responseData);

      // Show the conversation component
      setShowConversation(true);
    } catch (err) {
      setError('Failed to start conversation. Please try again.');
      console.error('Error starting conversation:', err);
    } finally {
      setIsLoading(false);
    }
  };

  const handleTokenWillExpire = async (uid: string) => {
    try {
      // Request a new token using the channel name and uid
      const response = await fetch(
        `/api/generate-agora-token?channel=${agoraLocalUserInfo?.channel}&uid=${uid}`
      );
      const data = await response.json();

      if (!response.ok) {
        throw new Error('Failed to generate new token');
      }

      return data.token;
    } catch (error) {
      console.error('Error renewing token:', error);
      throw error;
    }
  };

  return (
    <div className="min-h-screen bg-gray-900 text-white p-4">
      <div className="max-w-4xl mx-auto">
        <h1 className="text-4xl font-bold mb-6 text-center">
          Agora Conversational AI
        </h1>

        <p className="text-lg mb-6 text-center">
          When was the last time you had an intelligent conversation?
        </p>

        {!showConversation ? (
          <div className="flex justify-center mb-8">
            <button
              onClick={handleStartConversation}
              disabled={isLoading}
              className="px-6 py-3 bg-blue-600 text-white rounded-lg disabled:opacity-50"
            >
              {isLoading ? 'Starting...' : 'Start Conversation'}
            </button>
          </div>
        ) : agoraLocalUserInfo ? (
          <Suspense
            fallback={<p className="text-center">Loading conversation...</p>}
          >
            <AgoraProvider>
              <ConversationComponent
                agoraLocalUserInfo={agoraLocalUserInfo}
                onTokenWillExpire={handleTokenWillExpire}
                onEndConversation={() => setShowConversation(false)}
              />
            </AgoraProvider>
          </Suspense>
        ) : (
          <p className="text-center text-red-400">
            Failed to load conversation data.
          </p>
        )}

        {error && <p className="text-center text-red-400 mt-4">{error}</p>}
      </div>
    </div>
  );
}

Don't worry about any errors or warnings on ConversationComponent for now; we'll fix them in the next step.

Update the Conversation Component to Use Tokens

Now that we have a token and a channel name, let's create some props so we can pass them from the LandingPage to the ConversationComponent.

Open the types/conversation.ts file and add the following interface:

// Props for our conversation component
export interface ConversationComponentProps {
  agoraLocalUserInfo: AgoraLocalUserInfo;
  onTokenWillExpire: (uid: string) => Promise<string>;
  onEndConversation: () => void;
}

Open the ConversationComponent.tsx file and update it to import and use the props we just created to join the channel. We'll also add a token expiration event handler to handle token renewal, and a button to leave the conversation.

// Previous imports remain the same as before...
import type { ConversationComponentProps } from '../types/conversation'; // Import the new props

// Update the component to accept the new props
export default function ConversationComponent({
  agoraLocalUserInfo,
  onTokenWillExpire,
  onEndConversation,
}: ConversationComponentProps) {
  // The previous declarations remain the same as before
  const [joinedUID, setJoinedUID] = useState<UID>(0); // New: After joining the channel we'll store the uid for renewing the token

  // Update the useJoin hook to use the token and channel name from props
  const { isConnected: joinSuccess } = useJoin(
    {
      appid: process.env.NEXT_PUBLIC_AGORA_APP_ID!,
      channel: agoraLocalUserInfo.channel, // Use the channel name received from the token response
      token: agoraLocalUserInfo.token, // Use the token we received
      uid: parseInt(agoraLocalUserInfo.uid), // Join with uid 0, so Agora's system will create and return a unique uid for us
    },
    true
  );

  // Once the user joins the channel, store the Agora-assigned uid
  useEffect(() => {
    if (joinSuccess && client) {
      const uid = client.uid;
      setJoinedUID(uid as UID);
      console.log('Join successful, using UID:', uid);
    }
  }, [joinSuccess, client]);

  /*
  Existing code remains the same as before:
  // Publish local microphone track
  // Handle remote user events
  // Handle remote user left event
  */

  // New: Add listener for connection state changes
  useClientEvent(client, 'connection-state-change', (curState, prevState) => {
    console.log(`Connection state changed from ${prevState} to ${curState}`);
  });

  // Add a token renewal handler to avoid disconnections
  const handleTokenWillExpire = useCallback(async () => {
    if (!onTokenWillExpire || !joinedUID) return;
    try {
      // Request a new token from our API
      const newToken = await onTokenWillExpire(joinedUID.toString());
      await client?.renewToken(newToken);
      console.log('Successfully renewed Agora token');
    } catch (error) {
      console.error('Failed to renew Agora token:', error);
    }
  }, [client, onTokenWillExpire, joinedUID]);

  // New: Add listener for the token-privilege-will-expire event
  useClientEvent(client, 'token-privilege-will-expire', handleTokenWillExpire);

  /*
  Existing code remains the same as before:
  // Toggle microphone
  // Cleanup on unmount
  */

  // Update the return statement to include new UI elements for leaving the conversation
  return (
    <div className="p-4 bg-gray-800 rounded-lg">
      <div className="flex items-center justify-between mb-4">
        <div className="flex items-center gap-2">
          <div
            className={`w-3 h-3 rounded-full ${
              isConnected ? 'bg-green-500' : 'bg-red-500'
            }`}
          />
          <span className="text-white">
            {isConnected ? 'Connected' : 'Disconnected'}
          </span>
        </div>

        <button
          onClick={onEndConversation}
          className="px-4 py-2 bg-red-500 text-white rounded"
        >
          End Conversation
        </button>
      </div>

      {/* Display remote users */}
      <div className="mb-4">
        <h2 className="text-xl mb-2 text-white">Remote Users:</h2>
        {remoteUsers.length > 0 ? (
          remoteUsers.map((user) => (
            <div
              key={user.uid}
              className="p-2 bg-gray-700 rounded mb-2 text-white"
            >
              <RemoteUser user={user} />
            </div>
          ))
        ) : (
          <p className="text-gray-400">No remote users connected</p>
        )}
      </div>

      {/* Microphone control */}
      <button
        onClick={toggleMicrophone}
        className={`px-4 py-2 rounded ${
          isEnabled ? 'bg-green-500' : 'bg-red-500'
        } text-white`}
      >
        Microphone: {isEnabled ? 'On' : 'Off'}
      </button>
    </div>
  );
}

Quick Test

Now that we have basic RTC functionality and token generation in place, let's test the app.

  1. Run the app with pnpm run dev
  2. Open the app in your browser at http://localhost:3000
  3. Click the "Start Conversation" button
  4. You should see the connection status change to "Connected"

Adding Agora's Conversational AI Engine

Now that we have the basic RTC functionality, let's integrate Agora's Conversational AI service. In the next section, we'll:

  1. Create an API route that invites the AI agent to join our channel
  2. Configure the Agora start request, including our chosen LLM endpoint and TTS provider
  3. Create a route for stopping the conversation

Type Setup

Let's get the housekeeping out of the way first. Add some new types to the types/conversation.ts file:

// Previous types remain the same as before...

// New types for the agent invitation API
export interface ClientStartRequest {
  requester_id: string;
  channel_name: string;
  rtc_codec?: number;
  input_modalities?: string[];
  output_modalities?: string[];
}

interface MicrosoftTTSParams {
  key: string;
  region: string;
  voice_name: string;
  rate?: number;
  volume?: number;
}

interface ElevenLabsTTSParams {
  key: string;
  voice_id: string;
  model_id: string;
}

export enum TTSVendor {
  Microsoft = 'microsoft',
  ElevenLabs = 'elevenlabs',
}

export interface TTSConfig {
  vendor: TTSVendor;
  params: MicrosoftTTSParams | ElevenLabsTTSParams;
}

// Agora API request body
export interface AgoraStartRequest {
  name: string;
  properties: {
    channel: string;
    token: string;
    agent_rtc_uid: string;
    remote_rtc_uids: string[];
    enable_string_uid?: boolean;
    idle_timeout?: number;
    advanced_features?: {
      enable_aivad?: boolean;
      enable_bhvs?: boolean;
    };
    asr: {
      language: string;
      task?: string;
    };
    llm: {
      url?: string;
      api_key?: string;
      system_messages: Array<{
        role: string;
        content: string;
      }>;
      greeting_message: string;
      failure_message: string;
      max_history?: number;
      input_modalities?: string[];
      output_modalities?: string[];
      params: {
        model: string;
        max_tokens: number;
        temperature?: number;
        top_p?: number;
      };
    };
    vad: {
      silence_duration_ms: number;
      speech_duration_ms?: number;
      threshold?: number;
      interrupt_duration_ms?: number;
      prefix_padding_ms?: number;
    };
    tts: TTSConfig;
  };
}

export interface StopConversationRequest {
  agent_id: string;
}

export interface AgentResponse {
  agent_id: string;
  create_ts: number;
  state: string;
}

These new types give us a sense of all the pieces we're about to assemble. We'll take the client request and use it to configure the AgoraStartRequest, which we then send to Agora's Conversational AI Engine. Agora's Convo AI Engine adds the agent to the conversation.
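For reference, here's a sketch of the ClientStartRequest payload the client will send; the requester_id and channel_name values are placeholders for the values returned by the token route:

const startRequest: ClientStartRequest = {
  requester_id: '12345', // UID returned by the token route
  channel_name: 'ai-conversation-1712345678901-ab12cd',
  input_modalities: ['text'],
  output_modalities: ['text', 'audio'],
};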

Invite Agent Route

Create the route file at app/api/invite-agent/route.ts:

mkdir app/api/invite-agent
touch app/api/invite-agent/route.ts

Add the following code:

import { NextResponse } from 'next/server';
import { RtcTokenBuilder, RtcRole } from 'agora-token';
import {
  ClientStartRequest,
  AgentResponse,
  TTSVendor,
} from '@/types/conversation';

// Helper function to validate and get all configuration
function getValidatedConfig() {
  // Validate Agora Configuration
  const agoraConfig = {
    baseUrl: process.env.NEXT_PUBLIC_AGORA_CONVO_AI_BASE_URL || '',
    appId: process.env.NEXT_PUBLIC_AGORA_APP_ID || '',
    appCertificate: process.env.NEXT_PUBLIC_AGORA_APP_CERTIFICATE || '',
    customerId: process.env.NEXT_PUBLIC_AGORA_CUSTOMER_ID || '',
    customerSecret: process.env.NEXT_PUBLIC_AGORA_CUSTOMER_SECRET || '',
    agentUid: process.env.NEXT_PUBLIC_AGENT_UID || 'Agent',
  };

  if (Object.values(agoraConfig).some((v) => v === '')) {
    throw new Error('Missing Agora configuration. Check your .env.local file');
  }

  // Validate LLM Configuration
  const llmConfig = {
    url: process.env.NEXT_PUBLIC_LLM_URL,
    api_key: process.env.NEXT_PUBLIC_LLM_API_KEY,
    model: process.env.NEXT_PUBLIC_LLM_MODEL,
  };

  // Get TTS Vendor
  const ttsVendor =
    (process.env.NEXT_PUBLIC_TTS_VENDOR as TTSVendor) || TTSVendor.Microsoft;

  // Get Modalities Configuration
  const modalitiesConfig = {
    input: process.env.NEXT_PUBLIC_INPUT_MODALITIES?.split(',') || ['text'],
    output: process.env.NEXT_PUBLIC_OUTPUT_MODALITIES?.split(',') || [
      'text',
      'audio',
    ],
  };

  return {
    agora: agoraConfig,
    llm: llmConfig,
    ttsVendor,
    modalities: modalitiesConfig,
  };
}

// Helper function to get TTS configuration based on vendor
function getTTSConfig(vendor: TTSVendor) {
  if (vendor === TTSVendor.Microsoft) {
    return {
      vendor: TTSVendor.Microsoft,
      params: {
        key: process.env.NEXT_PUBLIC_MICROSOFT_TTS_KEY,
        region: process.env.NEXT_PUBLIC_MICROSOFT_TTS_REGION,
        voice_name:
          process.env.NEXT_PUBLIC_MICROSOFT_TTS_VOICE_NAME ||
          'en-US-AriaNeural',
        rate: parseFloat(process.env.NEXT_PUBLIC_MICROSOFT_TTS_RATE || '1.0'),
        volume: parseFloat(
          process.env.NEXT_PUBLIC_MICROSOFT_TTS_VOLUME || '100.0'
        ),
      },
    };
  } else if (vendor === TTSVendor.ElevenLabs) {
    return {
      vendor: TTSVendor.ElevenLabs,
      params: {
        key: process.env.NEXT_PUBLIC_ELEVENLABS_API_KEY,
        model_id: process.env.NEXT_PUBLIC_ELEVENLABS_MODEL_ID,
        voice_id: process.env.NEXT_PUBLIC_ELEVENLABS_VOICE_ID,
      },
    };
  }

  throw new Error(`Unsupported TTS vendor: ${vendor}`);
}

export async function POST(request: Request) {
  try {
    // Get our configuration
    const config = getValidatedConfig();
    const body: ClientStartRequest = await request.json();
    const { requester_id, channel_name, input_modalities, output_modalities } =
      body;

    // Generate a unique token for the AI agent
    const timestamp = Date.now();
    const expirationTime = Math.floor(timestamp / 1000) + 3600;

    const token = RtcTokenBuilder.buildTokenWithUid(
      config.agora.appId,
      config.agora.appCertificate,
      channel_name,
      config.agora.agentUid,
      RtcRole.PUBLISHER,
      expirationTime,
      expirationTime
    );

    // Check if we're using string UIDs
    const isStringUID = (str: string) => /[a-zA-Z]/.test(str);

    // Create a descriptive name for this conversation
    const uniqueName = `conversation-${timestamp}-${Math.random()
      .toString(36)
      .substring(2, 8)}`;

    // Get the appropriate TTS configuration
    const ttsConfig = getTTSConfig(config.ttsVendor);

    // Prepare the request to the Agora Conversational AI API
    const requestBody = {
      name: uniqueName,
      properties: {
        channel: channel_name,
        token: token,
        agent_rtc_uid: config.agora.agentUid,
        remote_rtc_uids: [requester_id],
        enable_string_uid: isStringUID(config.agora.agentUid),
        idle_timeout: 30,
        // ASR (Automatic Speech Recognition) settings
        asr: {
          language: 'en-US',
          task: 'conversation',
        },
        // LLM (Large Language Model) settings
        llm: {
          url: config.llm.url,
          api_key: config.llm.api_key,
          system_messages: [
            {
              role: 'system',
              content:
                'You are a helpful assistant. Respond concisely and naturally as if in a spoken conversation.',
            },
          ],
          greeting_message: 'Hello! How can I assist you today?',
          failure_message: 'Please wait a moment while I process that.',
          max_history: 10,
          params: {
            model: config.llm.model || 'gpt-3.5-turbo',
            max_tokens: 1024,
            temperature: 0.7,
            top_p: 0.95,
          },
          input_modalities: input_modalities || config.modalities.input,
          output_modalities: output_modalities || config.modalities.output,
        },
        // VAD (Voice Activity Detection) settings
        vad: {
          silence_duration_ms: 480,
          speech_duration_ms: 15000,
          threshold: 0.5,
          interrupt_duration_ms: 160,
          prefix_padding_ms: 300,
        },
        // TTS (Text-to-Speech) settings
        tts: ttsConfig,
      },
    };

    // Send the request to the Agora API
    const response = await fetch(
      `${config.agora.baseUrl}/${config.agora.appId}/join`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Basic ${Buffer.from(
            `${config.agora.customerId}:${config.agora.customerSecret}`
          ).toString('base64')}`,
        },
        body: JSON.stringify(requestBody),
      }
    );

    if (!response.ok) {
      const errorText = await response.text();
      console.error('Agent start response:', {
        status: response.status,
        body: errorText,
      });
      throw new Error(
        `Failed to start conversation: ${response.status} ${errorText}`
      );
    }

    // Parse and return the response, which includes the agentID.
    // We'll need the agentID later, when its time to remove the agent.
    const data: AgentResponse = await response.json();
    return NextResponse.json(data);
  } catch (error) {
    console.error('Error starting conversation:', error);
    return NextResponse.json(
      {
        error:
          error instanceof Error
            ? error.message
            : 'Failed to start conversation',
      },
      { status: 500 }
    );
  }
}

Since Agora supports multiple TTS providers, the TTS section includes configurations for both Microsoft Azure TTS and ElevenLabs, and uses the vendor set in your environment variables to choose between them.

Choosing a Voice

You'll also need to choose a voice. To help you get started, each provider offers a voice library you can browse for the available options.

Note: This route loads quite a few environment variables. Make sure to set them in your .env.local file. At the end of this guide, I've listed all the environment variables you need to set.
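As a sketch, switching TTS providers only requires changing the vendor variable and supplying that provider's credentials (placeholder values shown):

NEXT_PUBLIC_TTS_VENDOR=elevenlabs
NEXT_PUBLIC_ELEVENLABS_API_KEY=your-elevenlabs-key
NEXT_PUBLIC_ELEVENLABS_VOICE_ID=your-voice-id
NEXT_PUBLIC_ELEVENLABS_MODEL_ID=eleven_flash_v2_5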

Stop Conversation Route

After the agent joins a conversation, we need a way to remove it. That's the job of the stop-conversation route: it takes the agentID and sends a request to Agora's Conversational AI Engine to remove the agent from the channel.

Create a file at app/api/stop-conversation/route.ts:

mkdir app/api/stop-conversation
touch app/api/stop-conversation/route.ts

Add the following code:

import { NextResponse } from 'next/server';
import { StopConversationRequest } from '@/types/conversation';

// Helper function to validate and get Agora configuration
function getValidatedConfig() {
  const agoraConfig = {
    baseUrl: process.env.NEXT_PUBLIC_AGORA_CONVO_AI_BASE_URL,
    appId: process.env.NEXT_PUBLIC_AGORA_APP_ID || '',
    customerId: process.env.NEXT_PUBLIC_AGORA_CUSTOMER_ID || '',
    customerSecret: process.env.NEXT_PUBLIC_AGORA_CUSTOMER_SECRET || '',
  };

  if (Object.values(agoraConfig).some((v) => !v || v.trim() === '')) {
    throw new Error('Missing Agora configuration. Check your .env.local file');
  }

  return agoraConfig;
}

export async function POST(request: Request) {
  try {
    const config = getValidatedConfig();
    const body: StopConversationRequest = await request.json();
    const { agent_id } = body;

    if (!agent_id) {
      throw new Error('agent_id is required');
    }

    // Create authentication header
    const plainCredential = `${config.customerId}:${config.customerSecret}`;
    const encodedCredential = Buffer.from(plainCredential).toString('base64');
    const authorizationHeader = `Basic ${encodedCredential}`;

    // Send request to Agora API to stop the conversation
    const response = await fetch(
      `${config.baseUrl}/${config.appId}/agents/${agent_id}/leave`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: authorizationHeader,
        },
      }
    );

    if (!response.ok) {
      const errorText = await response.text();
      console.error('Agent stop response:', {
        status: response.status,
        body: errorText,
      });
      throw new Error(
        `Failed to stop conversation: ${response.status} ${errorText}`
      );
    }

    return NextResponse.json({ success: true });
  } catch (error) {
    console.error('Error stopping conversation:', error);
    return NextResponse.json(
      {
        error:
          error instanceof Error
            ? error.message
            : 'Failed to stop conversation',
      },
      { status: 500 }
    );
  }
}
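You can exercise this route directly as well; the agent_id below is a placeholder for the value returned by the invite-agent route:

curl -X POST http://localhost:3000/api/stop-conversation \
  -H 'Content-Type: application/json' \
  -d '{"agent_id": "your-agent-id"}'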

Update the Client to Start and Stop the AI Agent

We'll update the LandingPage and ConversationComponent to add the ability to start and stop the AI agent.

Update the Landing Page to Invite the AI Agent

First, let's update the landing page to invite the AI agent after the token is generated. This runs the invite request in parallel with loading the ConversationComponent.

// Previous imports remain the same as before...
// Add new imports for ClientStartRequest and AgentResponse
import type {
  AgoraLocalUserInfo,
  ClientStartRequest,
  AgentResponse,
} from '../types/conversation';

// Dynamic imports for ConversationComponent and AgoraProvider remain the same as before...

export default function LandingPage() {
  // previous state management code remains the same as before...
  const [agentJoinError, setAgentJoinError] = useState(false); // add agent join error state

  const handleStartConversation = async () => {
    setIsLoading(true);
    setError(null);
    setAgentJoinError(false);

    try {
      // Step 1: Get the Agora token (updated)
      console.log('Fetching Agora token...');
      const agoraResponse = await fetch('/api/generate-agora-token');
      const responseData = await agoraResponse.json();
      console.log('Agora API response:', responseData);

      if (!agoraResponse.ok) {
        throw new Error(
          `Failed to generate Agora token: ${JSON.stringify(responseData)}`
        );
      }

      // Step 2: Invite the AI agent to join the channel
      const startRequest: ClientStartRequest = {
        requester_id: responseData.uid,
        channel_name: responseData.channel,
        input_modalities: ['text'],
        output_modalities: ['text', 'audio'],
      };

      try {
        const response = await fetch('/api/invite-agent', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
          },
          body: JSON.stringify(startRequest),
        });

        if (!response.ok) {
          setAgentJoinError(true);
        } else {
          const agentData: AgentResponse = await response.json();
          // Store agent ID along with token data
          setAgoraLocalUserInfo({
            ...responseData,
            agentId: agentData.agent_id,
          });
        }
      } catch (err) {
        console.error('Failed to start conversation with agent:', err);
        setAgentJoinError(true);
      }

      // Show the conversation UI even if agent join fails
      // The user can retry connecting the agent from within the conversation
      setShowConversation(true);
    } catch (err) {
      setError('Failed to start conversation. Please try again.');
      console.error('Error starting conversation:', err);
    } finally {
      setIsLoading(false);
    }
  };

  // Token renewal code remains the same as before...

  // Updated return statement to show error if the agent join fails
  return (
    <div className="min-h-screen bg-gray-900 text-white p-4">
      <div className="max-w-4xl mx-auto py-12">
        <h1 className="text-4xl font-bold mb-6 text-center">
          Agora AI Conversation
        </h1>

        <p className="text-lg mb-6 text-center">
          When was the last time you had an intelligent conversation?
        </p>

        {!showConversation ? (
          <>
            <div className="flex justify-center mb-8">
              <button
                onClick={handleStartConversation}
                disabled={isLoading}
                className="px-8 py-3 bg-blue-600 hover:bg-blue-700 text-white rounded-full shadow-lg disabled:opacity-50 transition-all"
              >
                {isLoading ? 'Starting...' : 'Start Conversation'}
              </button>
            </div>
            {error && <p className="text-center text-red-400 mt-4">{error}</p>}
          </>
        ) : agoraLocalUserInfo ? (
          <>
            {agentJoinError && (
              <div className="mb-4 p-3 bg-red-600/20 rounded-lg text-red-400 text-center">
                Failed to connect with AI agent. The conversation may not work
                as expected.
              </div>
            )}
            <Suspense
              fallback={
                <div className="text-center">Loading conversation...</div>
              }
            >
              <AgoraProvider>
                <ConversationComponent
                  agoraLocalUserInfo={agoraLocalUserInfo}
                  onTokenWillExpire={handleTokenWillExpire}
                  onEndConversation={() => setShowConversation(false)}
                />
              </AgoraProvider>
            </Suspense>
          </>
        ) : (
          <p className="text-center">Failed to load conversation data.</p>
        )}
      </div>
    </div>
  );
}

This updated landing page now invites the AI agent to join the conversation and shows appropriate loading and error states if the agent fails to join.

Creating a Microphone Button Component

The microphone button is a fundamental element of any audio-first UI, so we'll create a simple button component that lets users control their microphone.

Create a file at components/MicrophoneButton.tsx:

touch components/MicrophoneButton.tsx

Add the following code:

'use client';

import React from 'react';
import { IMicrophoneAudioTrack } from 'agora-rtc-react';
import { Mic, MicOff } from 'lucide-react'; // Import from lucide-react or another icon library

interface MicrophoneButtonProps {
  isEnabled: boolean;
  setIsEnabled: (enabled: boolean) => void;
  localMicrophoneTrack: IMicrophoneAudioTrack | null;
}

export function MicrophoneButton({
  isEnabled,
  setIsEnabled,
  localMicrophoneTrack,
}: MicrophoneButtonProps) {
  const toggleMicrophone = async () => {
    if (localMicrophoneTrack) {
      const newState = !isEnabled;
      try {
        await localMicrophoneTrack.setEnabled(newState);
        setIsEnabled(newState);
        console.log('Microphone state updated successfully');
      } catch (error) {
        console.error('Failed to toggle microphone:', error);
      }
    }
  };

  return (
    <button
      onClick={toggleMicrophone}
      className={`relative w-16 h-16 rounded-full shadow-lg flex items-center justify-center transition-colors ${
        isEnabled ? 'bg-white hover:bg-gray-50' : 'bg-red-500 hover:bg-red-600'
      }`}
      aria-label={isEnabled ? 'Mute microphone' : 'Unmute microphone'}
    >
      <div className={`relative z-10`}>
        {isEnabled ? (
          <Mic size={24} className="text-gray-800" />
        ) : (
          <MicOff size={24} className="text-white" />
        )}
      </div>
    </button>
  );
}

Updating the Conversation Component

Now let's update our conversation component to handle stopping and restarting the AI agent. We'll also add the microphone button component:

// Previous imports remain the same as before...
import { MicrophoneButton } from './MicrophoneButton'; // microphone button component
// import new ClientStartRequest and StopConversationRequest types
import type {
  ConversationComponentProps,
  ClientStartRequest,
  StopConversationRequest,
} from '../types/conversation';

export default function ConversationComponent({
  agoraLocalUserInfo,
  onTokenWillExpire,
  onEndConversation,
}: ConversationComponentProps) {
  // Previous state management code remains the same as before...
  // Add new agent related state variables
  const [isAgentConnected, setIsAgentConnected] = useState(false);
  const [isConnecting, setIsConnecting] = useState(false);
  const agentUID = process.env.NEXT_PUBLIC_AGENT_UID;

  // Join the channel hook remains the same as before...
  // Set UID on join success, remains the same as before...
  // Publish local microphone track remains the same as before...

  // Update remote user events - specifically looking for the AI agent
  useClientEvent(client, 'user-joined', (user) => {
    console.log('Remote user joined:', user.uid);
    if (user.uid.toString() === agentUID) {
      setIsAgentConnected(true);
      setIsConnecting(false);
    }
  });

  useClientEvent(client, 'user-left', (user) => {
    console.log('Remote user left:', user.uid);
    if (user.uid.toString() === agentUID) {
      setIsAgentConnected(false);
      setIsConnecting(false);
    }
  });

  // Sync isAgentConnected with remoteUsers
  useEffect(() => {
    const isAgentInRemoteUsers = remoteUsers.some(
      (user) => user.uid.toString() === agentUID
    );
    setIsAgentConnected(isAgentInRemoteUsers);
  }, [remoteUsers, agentUID]);

  // Connection state listener remains the same as before...
  // Cleanup on unmount remains the same as before...

  // Function to stop conversation with the AI agent
  const handleStopConversation = async () => {
    if (!isAgentConnected || !agoraLocalUserInfo.agentId) return;
    setIsConnecting(true);

    try {
      const stopRequest: StopConversationRequest = {
        agent_id: agoraLocalUserInfo.agentId,
      };

      const response = await fetch('/api/stop-conversation', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(stopRequest),
      });

      if (!response.ok) {
        throw new Error(`Failed to stop conversation: ${response.statusText}`);
      }

      // Wait for the agent to actually leave before resetting state
      // The user-left event handler will handle setting isAgentConnected to false
    } catch (error) {
      if (error instanceof Error) {
        console.warn('Error stopping conversation:', error.message);
      }
      setIsConnecting(false);
    }
  };

  // Function to start conversation with the AI agent
  const handleStartConversation = async () => {
    if (!joinedUID) return;
    setIsConnecting(true);

    try {
      const startRequest: ClientStartRequest = {
        requester_id: joinedUID.toString(),
        channel_name: agoraLocalUserInfo.channel,
        input_modalities: ['text'],
        output_modalities: ['text', 'audio'],
      };

      const response = await fetch('/api/invite-agent', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(startRequest),
      });

      if (!response.ok) {
        throw new Error(`Failed to start conversation: ${response.statusText}`);
      }

      // Update agent ID when new agent is connected
      const data = await response.json();
      if (data.agent_id) {
        agoraLocalUserInfo.agentId = data.agent_id;
      }
    } catch (error) {
      if (error instanceof Error) {
        console.warn('Error starting conversation:', error.message);
      }
      // Reset connecting state if there's an error
      setIsConnecting(false);
    }
  };

  // Token renewal handler remains the same as before...
  // Add token observer remains the same as before...

  // Updated return to include stop, restart, and microphone controls
  return (
    <div className="flex flex-col gap-6 p-4 h-full relative">
      {/* Connection Status */}
      <div className="absolute top-4 right-4 flex items-center gap-2">
        {isAgentConnected ? (
          <button
            onClick={handleStopConversation}
            disabled={isConnecting}
            className="px-4 py-2 bg-red-500/80 text-white rounded-full border border-red-400/30 backdrop-blur-sm
            hover:bg-red-600/90 transition-all shadow-lg
            disabled:opacity-50 disabled:cursor-not-allowed text-sm font-medium"
          >
            {isConnecting ? 'Disconnecting...' : 'Stop Agent'}
          </button>
        ) : (
          <button
            onClick={handleStartConversation}
            disabled={isConnecting}
            className="px-4 py-2 bg-blue-500/80 text-white rounded-full border border-blue-400/30 backdrop-blur-sm
            hover:bg-blue-600/90 transition-all shadow-lg
            disabled:opacity-50 disabled:cursor-not-allowed text-sm font-medium"
          >
            {isConnecting ? 'Connecting...' : 'Connect Agent'}
          </button>
        )}
        <div
          className={`w-3 h-3 rounded-full ${
            isConnected ? 'bg-green-500' : 'bg-red-500'
          }`}
          onClick={onEndConversation}
          role="button"
          title="End conversation"
          style={{ cursor: 'pointer' }}
        />
      </div>

      {/* Remote Users Section */}
      <div className="flex-1">
        {remoteUsers.map((user) => (
          <div key={user.uid} className="mb-4">
            <p className="text-center text-sm text-gray-400 mb-2">
              {user.uid.toString() === agentUID
                ? 'AI Agent'
                : `User: ${user.uid}`}
            </p>
            <RemoteUser user={user} />
          </div>
        ))}

        {remoteUsers.length === 0 && (
          <div className="text-center text-gray-500 py-8">
            {isConnected
              ? 'Waiting for AI agent to join...'
              : 'Connecting to channel...'}
          </div>
        )}
      </div>

      {/* Microphone Control */}
      <div className="fixed bottom-8 left-1/2 -translate-x-1/2">
        <MicrophoneButton
          isEnabled={isEnabled}
          setIsEnabled={setIsEnabled}
          localMicrophoneTrack={localMicrophoneTrack}
        />
      </div>
    </div>
  );
}

Audio Visualization

Let's add an audio visualization to give users visual feedback while the AI agent is speaking. Here's an example of an audio visualizer component that takes an Agora audio track as the input for its animation.

Create a file at components/AudioVisualizer.tsx:

touch components/AudioVisualizer.tsx

Add the following code:

'use client';

import React, { useEffect, useRef, useState } from 'react';
import { ILocalAudioTrack, IRemoteAudioTrack } from 'agora-rtc-react';

interface AudioVisualizerProps {
  track: ILocalAudioTrack | IRemoteAudioTrack | undefined;
}

export const AudioVisualizer: React.FC<AudioVisualizerProps> = ({ track }) => {
  const [isVisualizing, setIsVisualizing] = useState(false);
  const audioContextRef = useRef<AudioContext | null>(null);
  const analyserRef = useRef<AnalyserNode | null>(null);
  const animationFrameRef = useRef<number>();
  const barsRef = useRef<(HTMLDivElement | null)[]>([]);

  const animate = () => {
    if (!analyserRef.current) {
      return;
    }

    const bufferLength = analyserRef.current.frequencyBinCount;
    const dataArray = new Uint8Array(bufferLength);
    analyserRef.current.getByteFrequencyData(dataArray);

    // Define frequency ranges for different bars to create a more appealing visualization
    const frequencyRanges = [
      [24, 31], // Highest (bar 0, 8)
      [16, 23], // Mid-high (bar 1, 7)
      [8, 15], // Mid (bar 2, 6)
      [4, 7], // Low-mid (bar 3, 5)
      [0, 3], // Lowest (bar 4 - center)
    ];

    barsRef.current.forEach((bar, index) => {
      if (!bar) {
        return;
      }

      // Use symmetrical ranges for the 9 bars
      const rangeIndex = index < 5 ? index : 8 - index;
      const [start, end] = frequencyRanges[rangeIndex];

      // Calculate average energy in this frequency range
      let sum = 0;
      for (let i = start; i <= end; i++) {
        sum += dataArray[i];
      }
      let average = sum / (end - start + 1);

      // Apply different multipliers to create a more appealing shape
      const multipliers = [0.7, 0.8, 0.85, 0.9, 0.95];
      const multiplierIndex = index < 5 ? index : 8 - index;
      average *= multipliers[multiplierIndex];

      // Scale and limit the height
      const height = Math.min((average / 255) * 100, 100);
      bar.style.height = `${height}px`;
    });

    animationFrameRef.current = requestAnimationFrame(animate);
  };

  useEffect(() => {
    if (!track) {
      return;
    }

    const startVisualizer = async () => {
      try {
        audioContextRef.current = new AudioContext();
        analyserRef.current = audioContextRef.current.createAnalyser();
        analyserRef.current.fftSize = 64; // Keep this small for performance

        // Get the audio track from Agora
        const mediaStreamTrack = track.getMediaStreamTrack();
        const stream = new MediaStream([mediaStreamTrack]);

        // Connect it to our analyzer
        const source = audioContextRef.current.createMediaStreamSource(stream);
        source.connect(analyserRef.current);

        setIsVisualizing(true);
        animate();
      } catch (error) {
        console.error('Error starting visualizer:', error);
      }
    };

    startVisualizer();

    // Clean up when component unmounts or track changes
    return () => {
      if (animationFrameRef.current) {
        cancelAnimationFrame(animationFrameRef.current);
      }
      if (audioContextRef.current) {
        audioContextRef.current.close();
      }
    };
  }, [track]);

  return (
    <div className="w-full h-40 rounded-lg overflow-hidden flex items-center justify-center relative">
      <div className="flex items-center space-x-2 h-[100px] relative z-10">
        {/* Create 9 bars for the visualizer */}
        {[...Array(9)].map((_, index) => (
          <div
            key={index}
            ref={(el) => {
              barsRef.current[index] = el;
            }}
            className="w-3 bg-gradient-to-t from-blue-500 via-purple-500 to-pink-500 rounded-full transition-all duration-75"
            style={{
              height: '2px',
              minHeight: '2px',
              background: 'linear-gradient(to top, #3b82f6, #8b5cf6, #ec4899)',
            }}
          />
        ))}
      </div>
    </div>
  );
};

The visualizer works by:

  1. Receiving an audio track from the Agora SDK via the track prop
  2. Using the Web Audio API to extract frequency data from the audio stream
  3. Rendering visual bars that respond to different frequency ranges in the audio

To use this visualizer with a remote user's audio track, we need to update the RemoteUser rendering in the ConversationComponent.

Integrating the Audio Visualizer

To fully integrate the audio visualizer with our conversation component, we need to:

  1. Import the AudioVisualizer component
  2. Pass it the appropriate audio track
  3. Position it in our UI

Update your ConversationComponent.tsx to include the audio visualizer:

'use client';

import { useState, useEffect, useCallback } from 'react';
import {
  useRTCClient,
  useLocalMicrophoneTrack,
  useRemoteUsers,
  useClientEvent,
  useIsConnected,
  useJoin,
  usePublish,
  RemoteUser,
  UID,
} from 'agora-rtc-react';
import { MicrophoneButton } from './MicrophoneButton';
import { AudioVisualizer } from './AudioVisualizer';
import type {
  ConversationComponentProps,
  ClientStartRequest,
  StopConversationRequest,
} from '../types/conversation';

// Rest of the component as before...

// Then in the render method:
return (
  <div className="flex flex-col gap-6 p-4 h-full relative">
    {/* Connection Status */}
    {/* ... */}

    {/* Remote Users Section with Audio Visualizer */}
    <div className="flex-1">
      {remoteUsers.map((user) => (
        <div key={user.uid} className="mb-8 p-4 bg-gray-800/30 rounded-lg">
          <p className="text-center text-sm text-gray-400 mb-2">
            {user.uid.toString() === agentUID
              ? 'AI Agent'
              : `User: ${user.uid}`}
          </p>

          {/* The AudioVisualizer receives the remote user's audio track */}
          <AudioVisualizer track={user.audioTrack} />

          {/* The RemoteUser component handles playing the audio */}
          <RemoteUser user={user} />
        </div>
      ))}

      {remoteUsers.length === 0 && (
        <div className="text-center text-gray-500 py-8">
          {isConnected
            ? 'Waiting for AI agent to join...'
            : 'Connecting to channel...'}
        </div>
      )}
    </div>

    {/* Microphone Control */}
    <div className="fixed bottom-8 left-1/2 -translate-x-1/2">
      <MicrophoneButton
        isEnabled={isEnabled}
        setIsEnabled={setIsEnabled}
        localMicrophoneTrack={localMicrophoneTrack}
      />
    </div>
  </div>
);

This creates a responsive visualization that clearly indicates when the AI agent is speaking, improving the user experience with visual feedback alongside the audio.

Enhancing the Microphone Button with Visualization

Since we only have one user and one AI in the channel, we should also update our microphone button to include its own audio visualization. This gives users visual feedback that their microphone is capturing audio input. Let's create a more sophisticated version of MicrophoneButton.tsx:

'use client';

import React, { useState, useEffect, useRef } from 'react';
import { useRTCClient, IMicrophoneAudioTrack } from 'agora-rtc-react';
import { Mic, MicOff } from 'lucide-react';

// Interface for audio bar data
interface AudioBar {
  height: number;
}

interface MicrophoneButtonProps {
  isEnabled: boolean;
  setIsEnabled: (enabled: boolean) => void;
  localMicrophoneTrack: IMicrophoneAudioTrack | null;
}

export function MicrophoneButton({
  isEnabled,
  setIsEnabled,
  localMicrophoneTrack,
}: MicrophoneButtonProps) {
  // State to store audio visualization data
  const [audioData, setAudioData] = useState<AudioBar[]>(
    Array(5).fill({ height: 0 })
  );

  // Get the Agora client from context
  const client = useRTCClient();

  // References for audio processing
  const audioContextRef = useRef<AudioContext | null>(null);
  const analyserRef = useRef<AnalyserNode | null>(null);
  const animationFrameRef = useRef<number>();

  // Set up and clean up audio analyzer based on microphone state
  useEffect(() => {
    if (localMicrophoneTrack && isEnabled) {
      setupAudioAnalyser();
    } else {
      cleanupAudioAnalyser();
    }

    return () => cleanupAudioAnalyser();
  }, [localMicrophoneTrack, isEnabled]);

  // Initialize the audio analyzer
  const setupAudioAnalyser = async () => {
    if (!localMicrophoneTrack) return;

    try {
      // Create audio context and analyzer
      audioContextRef.current = new AudioContext();
      analyserRef.current = audioContextRef.current.createAnalyser();
      analyserRef.current.fftSize = 64; // Small FFT size for better performance
      analyserRef.current.smoothingTimeConstant = 0.5; // Add smoothing

      // Get the microphone stream from Agora
      const mediaStream = localMicrophoneTrack.getMediaStreamTrack();
      const source = audioContextRef.current.createMediaStreamSource(
        new MediaStream([mediaStream])
      );

      // Connect the source to the analyzer
      source.connect(analyserRef.current);

      // Start updating the visualization
      updateAudioData();
    } catch (error) {
      console.error('Error setting up audio analyser:', error);
    }
  };

  // Clean up audio resources
  const cleanupAudioAnalyser = () => {
    if (animationFrameRef.current) {
      cancelAnimationFrame(animationFrameRef.current);
    }
    if (audioContextRef.current) {
      audioContextRef.current.close();
      audioContextRef.current = null;
    }
    setAudioData(Array(5).fill({ height: 0 }));
  };

  // Update the audio visualization data
  const updateAudioData = () => {
    if (!analyserRef.current) return;

    // Get frequency data from analyzer
    const dataArray = new Uint8Array(analyserRef.current.frequencyBinCount);
    analyserRef.current.getByteFrequencyData(dataArray);

    // Split the frequency data into 5 segments
    const segmentSize = Math.floor(dataArray.length / 5);
    const newAudioData = Array(5)
      .fill(0)
      .map((_, index) => {
        // Get average value for this frequency segment
        const start = index * segmentSize;
        const end = start + segmentSize;
        const segment = dataArray.slice(start, end);
        const average = segment.reduce((a, b) => a + b, 0) / segment.length;

        // Scale and shape the response curve for better visualization
        const scaledHeight = Math.min(60, (average / 255) * 100 * 1.2);
        const height = Math.pow(scaledHeight / 60, 0.7) * 60;

        return {
          height: height,
        };
      });

    // Update state with new data
    setAudioData(newAudioData);

    // Schedule the next update
    animationFrameRef.current = requestAnimationFrame(updateAudioData);
  };

  // Toggle microphone state
  const toggleMicrophone = async () => {
    if (localMicrophoneTrack) {
      const newState = !isEnabled;
      try {
        // Enable/disable the microphone track
        await localMicrophoneTrack.setEnabled(newState);

        // Handle publishing/unpublishing
        if (!newState) {
          await client.unpublish(localMicrophoneTrack);
        } else {
          await client.publish(localMicrophoneTrack);
        }

        // Update state
        setIsEnabled(newState);
        console.log('Microphone state updated successfully');
      } catch (error) {
        console.error('Failed to toggle microphone:', error);
        // Revert to previous state on error
        localMicrophoneTrack.setEnabled(isEnabled);
      }
    }
  };

  return (
    <button
      onClick={toggleMicrophone}
      className={`relative w-16 h-16 rounded-full shadow-lg flex items-center justify-center transition-colors ${
        isEnabled ? 'bg-white hover:bg-gray-50' : 'bg-red-500 hover:bg-red-600'
      }`}
    >
      {/* Audio visualization bars */}
      <div className="absolute inset-0 flex items-center justify-center gap-1">
        {audioData.map((bar, index) => (
          <div
            key={index}
            className="w-1 rounded-full transition-all duration-100"
            style={{
              height: `${bar.height}%`,
              backgroundColor: isEnabled ? '#22c55e' : '#94a3b8',
              transform: `scaleY(${Math.max(0.1, bar.height / 100)})`,
              transformOrigin: 'center',
            }}
          />
        ))}
      </div>

      {/* Microphone icon overlaid on top */}
      <div className={`relative z-10`}>
        {isEnabled ? (
          <Mic size={24} className="text-gray-800" />
        ) : (
          <MicOff size={24} className="text-white" />
        )}
      </div>
    </button>
  );
}

The microphone button with audio visualization helps users understand:

  • Whether their microphone is working properly
  • Whether they're speaking loudly enough to be heard
  • When background noise might be affecting their audio quality

The goal is to create a more intuitive and visually engaging experience for users.

Testing

Now that we have all the components in place, let's finish up by testing the application.

Start the Development Server

To start the development server:

pnpm run dev

Note: Make sure your .env.local file is properly configured with all the necessary credentials. The complete list of environment variables is provided at the end of this guide.

If your application is running correctly, you should see output like this:

   ▲ Next.js 15.1.4
   - Local:        http://localhost:3000
   - Network:      http://192.168.0.102:3000
   - Environments: .env.local
   - Experiments (use with caution):
     · webpackBuildWorker
     · parallelServerCompiles
     · parallelServerBuildTraces

✓ Starting...
 ✓ Ready in 4.7s

Open http://localhost:3000 in your browser and give it a test.

Check out the complete source code:

GitHub - AgoraIO-Community/conversational-ai-nextjs-client: A NextJS web-app that implements Agora's Conversational AI
https://github.com/AgoraIO-Community/conversational-ai-nextjs-client

Common Issues and Solutions

  • Agent doesn't join:

    • Verify your Agora Conversational AI credentials
    • Check the console for specific error messages
    • Ensure your TTS configuration is valid
  • Audio not working:

    • Check browser permissions for microphone access
    • Verify the microphone is enabled in the app
    • Check that audio tracks are published correctly
  • Token errors:

    • Verify the App ID and App Certificate are correct
    • Ensure the token renewal logic is working
    • Check for proper error handling in token-related functions
  • Channel connection issues:

    • Check network connectivity
    • Verify Agora service status
    • Ensure proper cleanup when leaving channels

Customizations

The Agora Conversational AI Engine supports many customizations.

Customizing the Agent

In the invite-agent route, the system_messages determine how the AI agent responds, giving it a specific personality and communication style.

Modify the system_messages to customize the agent's prompt:

// In app/api/invite-agent/route.ts
system_messages: [
  {
    role: 'system',
    content:
      "You are a friendly and helpful assistant named Alex. Your personality is warm, patient, and slightly humorous. When speaking, use a conversational tone with occasional casual expressions. Your responses should be concise but informative, aimed at making complex topics accessible to everyone. If you don't know something, admit it honestly rather than guessing. When appropriate, offer follow-up questions to help guide the conversation.",
  },
],

You can also update the greeting_message to control the initial message the agent speaks when it joins the channel.

// In app/api/invite-agent/route.ts
llm: {
  greeting_message: 'Hello! How can I assist you today?',
  failure_message: 'Please wait a moment.',
},

Customizing the Voice

Choose the right voice for your application by exploring each provider's voice library.

Fine-Tuning Voice Activity Detection

Adjust the VAD settings to optimize the conversation flow:

// In app/api/invite-agent/route.ts
vad: {
  silence_duration_ms: 600,      // How long to wait after silence to end turn (Increase for longer pauses before next turns)
  speech_duration_ms: 10000,     // Maximum duration for a single speech segment (force end of turn after this time)
  threshold: 0.6,                // Sensitivity to background noise (Higher values require louder speech to trigger)
  interrupt_duration_ms: 200,    // How quickly interruptions are detected
  prefix_padding_ms: 400,        // How much audio to capture before speech is detected
},

Environment Variables Reference

Here's the complete list of environment variables for your .env.local file:

## Agora Configuration
NEXT_PUBLIC_AGORA_APP_ID=
NEXT_PUBLIC_AGORA_APP_CERTIFICATE=
NEXT_PUBLIC_AGORA_CUSTOMER_ID=
NEXT_PUBLIC_AGORA_CUSTOMER_SECRET=

NEXT_PUBLIC_AGORA_CONVO_AI_BASE_URL=https://api.agora.io/api/conversational-ai-agent/v2/projects/
NEXT_PUBLIC_AGENT_UID=333

## LLM Configuration
NEXT_PUBLIC_LLM_URL=https://api.openai.com/v1/chat/completions
NEXT_PUBLIC_LLM_MODEL=gpt-4
NEXT_PUBLIC_LLM_API_KEY=
## TTS Configuration
NEXT_PUBLIC_TTS_VENDOR=microsoft

## Text-to-Speech Configuration
NEXT_PUBLIC_MICROSOFT_TTS_KEY=
NEXT_PUBLIC_MICROSOFT_TTS_REGION=eastus
NEXT_PUBLIC_MICROSOFT_TTS_VOICE_NAME=en-US-AndrewMultilingualNeural
NEXT_PUBLIC_MICROSOFT_TTS_RATE=1.1
NEXT_PUBLIC_MICROSOFT_TTS_VOLUME=70

## ElevenLabs Configuration
NEXT_PUBLIC_ELEVENLABS_API_KEY=
NEXT_PUBLIC_ELEVENLABS_VOICE_ID=XrExE9yKIg1WjnnlVkGX
NEXT_PUBLIC_ELEVENLABS_MODEL_ID=eleven_flash_v2_5

## Modalities Configuration
NEXT_PUBLIC_INPUT_MODALITIES=text
NEXT_PUBLIC_OUTPUT_MODALITIES=text,audio

Next Steps

Congratulations! You've built a real-time audio conversation app powered by Agora's Conversational AI Engine, ready to connect with your existing LLM workflows.

For more information about Agora's Conversational AI Engine, check out the official documentation.

Happy building!
