Build a backend and client from scratch

This guide walks through building the full Conversational AI stack from scratch: a server that issues tokens and manages agent sessions, and a browser client that captures microphone audio and streams live transcripts. You will end up with the same structure as the official Next.js and Python starter repos, but with every step explained.

If you want to hear an agent speak in under five minutes, follow the Voice AI quickstart guide.

info

This project uses Agora-managed presets, so no vendor API keys are required to complete this tutorial. If you prefer to switch to your own vendor accounts, see the documentation for your chosen ASR, LLM, and TTS providers.

What you will build

A complete client-server app:

  • Backend: Three HTTP endpoints:

    • POST /api/token: Issues RTC + RTM tokens for a given channel and UID
    • POST /api/invite-agent: Starts the agent using the Agent Server SDK
    • POST /api/stop-conversation: Stops the agent by ID
  • Frontend: A Next.js page that:

    • Requests microphone permission
    • Joins the Agora channel using the RTC SDK
    • Fetches a token from your backend
    • Calls invite-agent to bring the agent into the channel
    • Renders live transcripts from RTM
    • Calls stop-conversation on page unload

In this architecture, the browser talks to your backend over HTTP for tokens and agent control, and talks to Agora directly over RTC and RTM for audio and transcripts. The backend never touches audio, and the browser never embeds your App Certificate. This clean separation is the reason you need a backend at all.

Prerequisites

  • An active Agora account.
  • git and a terminal.
  • One of the following language runtimes:
    • Node.js 20 LTS or later with pnpm (TypeScript)
    • Python 3.11 or later with uv or pip
    • Go 1.22 or later
  • A modern browser with microphone access

Set up your environment

This section walks you through installing the Agora CLI and scaffolding your project.

Install the Agora CLI

The Agora CLI is the recommended way to bootstrap a new Agora project. Use it to create projects, enable features, write credentials to .env files, and run diagnostics.

To install the Agora CLI, log in, and create a project with Conversational AI enabled:


npm install -g agoraio-cli
agora login
agora project create conv-ai-tutorial --feature rtc --feature convoai
agora project use conv-ai-tutorial

Confirm that the CLI can read your credentials:


agora project env --shell


The output should look similar to this:

AGORA_APP_ID=your_app_id_here
AGORA_APP_CERTIFICATE=your_certificate_here

Keep this terminal open. You will reuse these credentials in both the backend and frontend steps.

Scaffold the repo

Select the tab for your preferred language.

For TypeScript, the backend and frontend live in the same Next.js app.

  1. Scaffold a Next.js app with TypeScript, the App Router, Tailwind CSS, and ESLint, then install the required Agora packages:


    pnpm dlx create-next-app@latest conv-ai-tutorial --typescript --app --tailwind --eslint --src-dir=false --import-alias='@/*'
    cd conv-ai-tutorial
    pnpm add agora-agent-server-sdk agora-rtc-sdk-ng agora-rtm-sdk agora-token

    • agora-agent-server-sdk: Starts and stops agent sessions from the backend
    • agora-rtc-sdk-ng: Handles mic capture and RTC channel joining in the browser
    • agora-rtm-sdk: Receives live transcript messages from the agent
    • agora-token: Generates RTC and RTM tokens
  2. Write your Agora credentials to .env.local:


    eval "$(agora project env --shell)"
    cat > .env.local <<EOF
    NEXT_PUBLIC_AGORA_APP_ID=$AGORA_APP_ID
    NEXT_AGORA_APP_CERTIFICATE=$AGORA_APP_CERTIFICATE
    EOF

    The NEXT_PUBLIC_ prefix makes AGORA_APP_ID available in the browser, which is required to join the RTC channel. Never apply this prefix to APP_CERTIFICATE; it must remain server-side only.
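
    To see the difference, the following throwaway client component (a hypothetical debug page, not part of the final app) prints what each variable resolves to in the browser bundle; the certificate should come back undefined there.

    // app/env-check/page.tsx (hypothetical debug page; delete after verifying)
    'use client';

    export default function EnvCheck() {
      // Inlined into the browser bundle because of the NEXT_PUBLIC_ prefix:
      const appId = process.env.NEXT_PUBLIC_AGORA_APP_ID;
      // Not prefixed, so Next.js never ships it to the browser; always undefined here:
      const cert = process.env.NEXT_AGORA_APP_CERTIFICATE;
      return <pre>{JSON.stringify({ appId, cert: cert ?? null }, null, 2)}</pre>;
    }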

Build the backend

The backend exposes three endpoints, one for each operation the frontend needs.

Generate tokens

Endpoint: POST /api/token

This endpoint builds an RTC token and an RTM token for the browser client. It is the only place the App Certificate is used.


// app/api/token/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { RtcTokenBuilder, RtcRole, RtmTokenBuilder } from 'agora-token';

const APP_ID = process.env.NEXT_PUBLIC_AGORA_APP_ID!;
const APP_CERTIFICATE = process.env.NEXT_AGORA_APP_CERTIFICATE!;
const TOKEN_TTL_SECONDS = 60 * 60; // 1 hour

export async function POST(req: NextRequest) {
  const { channel, uid } = await req.json();
  if (!channel || typeof uid !== 'number') {
    return NextResponse.json({ error: 'channel and numeric uid required' }, { status: 400 });
  }

  const expireAt = Math.floor(Date.now() / 1000) + TOKEN_TTL_SECONDS;

  const rtcToken = RtcTokenBuilder.buildTokenWithUid(
    APP_ID, APP_CERTIFICATE, channel, uid, RtcRole.PUBLISHER, expireAt, expireAt
  );
  const rtmToken = RtmTokenBuilder.buildToken(
    APP_ID, APP_CERTIFICATE, String(uid), expireAt
  );

  return NextResponse.json({ rtcToken, rtmToken, expireAt });
}
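
Once the dev server is running (see the Test and validate section; port 3000 assumed), you can sanity-check this endpoint from the browser console before wiring up the frontend. The channel and uid values below are arbitrary examples.

// Quick manual check of POST /api/token (values are illustrative)
const res = await fetch('http://localhost:3000/api/token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ channel: 'support-room-123', uid: 111222 }),
});
console.log(res.status);        // expect 200
console.log(await res.json());  // expect { rtcToken, rtmToken, expireAt }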

Start an agent session

Endpoint: POST /api/invite-agent

This endpoint uses the Agent Server SDK to configure an agent and start a session. The STT, LLM, and TTS configurations use Agora-managed presets and therefore do not require an apiKey.


// app/api/invite-agent/route.ts
import { NextRequest, NextResponse } from 'next/server';
import {
  AgoraClient, Agent, Area, DeepgramSTT, MiniMaxTTS, OpenAI, ExpiresIn,
} from 'agora-agent-server-sdk';

const client = new AgoraClient({
  area: Area.US,
  appId: process.env.NEXT_PUBLIC_AGORA_APP_ID!,
  appCertificate: process.env.NEXT_AGORA_APP_CERTIFICATE!,
});

const AGENT_UID = 123456;

export async function POST(req: NextRequest) {
  const { channel } = await req.json();
  if (!channel) {
    return NextResponse.json({ error: 'channel required' }, { status: 400 });
  }

  const agent = new Agent({
    name: 'support-agent',
    instructions: 'You are a friendly support agent for Acme Corp. Keep answers under 30 seconds.',
    greeting: 'Hi there! How can I help you today?',
    failureMessage: 'Sorry, I had trouble hearing that. Could you repeat?',
    maxHistory: 50,
    advancedFeatures: { enable_rtm: true, enable_tools: false },
    parameters: { data_channel: 'rtm', enable_error_message: true },
  })
    .withStt(new DeepgramSTT({ model: 'nova-3', language: 'en' }))
    .withLlm(new OpenAI({ model: 'gpt-4o-mini', maxHistory: 15 }))
    .withTts(new MiniMaxTTS({ model: 'speech_2_6_turbo', voiceId: 'English_captivating_female1' }));

  const session = agent.createSession(client, {
    channel,
    agentUid: AGENT_UID,
    remoteUids: ['*'],
    idleTimeout: 30,
    expiresIn: ExpiresIn.hours(1),
  });

  try {
    const { agentId } = await session.start();
    return NextResponse.json({ agentId, agentUid: AGENT_UID });
  } catch (err: unknown) {
    const message = err instanceof Error ? err.message : String(err);
    return NextResponse.json({ error: `start failed: ${message}` }, { status: 502 });
  }
}
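
If you later move off the Agora-managed presets (see the info note at the top of this guide), the provider configs above are where your own credentials would go. The sketch below assumes each provider class accepts an apiKey option; the field names are an assumption, not a confirmed part of the SDK, so verify them against the Agent Server SDK reference and your provider's documentation.

// Hypothetical: bring-your-own-vendor configuration. The apiKey fields are
// assumptions; confirm the exact option names in the SDK reference before use.
import { DeepgramSTT, OpenAI, MiniMaxTTS } from 'agora-agent-server-sdk';

const stt = new DeepgramSTT({
  model: 'nova-3',
  language: 'en',
  apiKey: process.env.DEEPGRAM_API_KEY, // assumed option
});
const llm = new OpenAI({
  model: 'gpt-4o-mini',
  maxHistory: 15,
  apiKey: process.env.OPENAI_API_KEY, // assumed option
});
const tts = new MiniMaxTTS({
  model: 'speech_2_6_turbo',
  voiceId: 'English_captivating_female1',
  apiKey: process.env.MINIMAX_API_KEY, // assumed option
});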

Stop an agent session

Endpoint: POST /api/stop-conversation

This endpoint stops a running agent session by ID.


// app/api/stop-conversation/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { AgoraClient, Area } from 'agora-agent-server-sdk';

const client = new AgoraClient({
  area: Area.US,
  appId: process.env.NEXT_PUBLIC_AGORA_APP_ID!,
  appCertificate: process.env.NEXT_AGORA_APP_CERTIFICATE!,
});

export async function POST(req: NextRequest) {
  const { agentId } = await req.json();
  if (!agentId) {
    return NextResponse.json({ error: 'agentId required' }, { status: 400 });
  }

  try {
    await client.agents.leave(agentId);
    return NextResponse.json({ stopped: true });
  } catch (err: unknown) {
    const message = err instanceof Error ? err.message : String(err);
    return NextResponse.json({ error: `stop failed: ${message}` }, { status: 502 });
  }
}
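
You can exercise the invite and stop endpoints together the same way from the browser console (port 3000 and the channel name are assumptions). Note that the agent really does join the named channel, so this round trip stops it again immediately.

// Manual round trip: start an agent, then stop it by the returned agentId.
const invite = await fetch('http://localhost:3000/api/invite-agent', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ channel: 'support-room-123' }),
}).then(r => r.json());
console.log(invite); // expect { agentId, agentUid }

const stopped = await fetch('http://localhost:3000/api/stop-conversation', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ agentId: invite.agentId }),
}).then(r => r.json());
console.log(stopped); // expect { stopped: true }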

Build the frontend

The frontend is the same Next.js app for all three backends. The only difference is whether it calls its own API routes (TypeScript) or a separate backend on port 8000 (Python and Go).

A basic API client

Create lib/api.ts to give the frontend a single place to manage the backend URL and endpoint calls.


// lib/api.ts
const BACKEND = process.env.NEXT_PUBLIC_BACKEND_URL ?? '';

async function post<T>(path: string, body: object): Promise<T> {
  const res = await fetch(`${BACKEND}${path}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${path} failed: ${res.status} ${await res.text()}`);
  return res.json() as Promise<T>;
}

export type TokenResponse = { rtcToken: string; rtmToken: string; expireAt: number };
export type InviteResponse = { agentId: string; agentUid: number };

export const api = {
  token: (channel: string, uid: number) => post<TokenResponse>('/api/token', { channel, uid }),
  invite: (channel: string) => post<InviteResponse>('/api/invite-agent', { channel }),
  stop: (agentId: string) => post<{ stopped: boolean }>('/api/stop-conversation', { agentId }),
};

For TypeScript, NEXT_PUBLIC_BACKEND_URL is not set in .env.local, so calls go to same-origin routes like /api/token. For Python and Go, it is set to http://localhost:8000 in the scaffold step, so calls go to the external backend.
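
For reference, pointing the frontend at an external backend is a single additional line in .env.local (the port shown is an assumption; use whatever your Python or Go server listens on):

NEXT_PUBLIC_BACKEND_URL=http://localhost:8000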

Create the RTC and RTM hook

Create hooks/useConvoAgent.ts. This hook joins the RTC channel for audio, connects to RTM for transcripts, and exposes a start() and stop() function to the UI.


// hooks/useConvoAgent.ts
'use client';
import { useCallback, useRef, useState } from 'react';
import AgoraRTC, { IAgoraRTCClient, IMicrophoneAudioTrack } from 'agora-rtc-sdk-ng';
import { RTMClient, RTMEvents } from 'agora-rtm-sdk';
import { api } from '@/lib/api';

type TranscriptLine = { role: 'user' | 'agent'; text: string; final: boolean };

export function useConvoAgent(channel: string, uid: number) {
  const rtcRef = useRef<IAgoraRTCClient | null>(null);
  const micRef = useRef<IMicrophoneAudioTrack | null>(null);
  const rtmRef = useRef<RTMClient | null>(null);
  const agentIdRef = useRef<string | null>(null);

  const [connected, setConnected] = useState(false);
  const [transcripts, setTranscripts] = useState<TranscriptLine[]>([]);
  const [error, setError] = useState<string | null>(null);

  const start = useCallback(async () => {
    try {
      const { rtcToken, rtmToken } = await api.token(channel, uid);
      const appId = process.env.NEXT_PUBLIC_AGORA_APP_ID!;

      // 1. Join RTC and publish the mic
      const rtc = AgoraRTC.createClient({ mode: 'rtc', codec: 'vp8' });
      await rtc.join(appId, channel, rtcToken, uid);
      const mic = await AgoraRTC.createMicrophoneAudioTrack();
      await rtc.publish(mic);
      rtc.on('user-published', async (user, mediaType) => {
        if (mediaType === 'audio') {
          await rtc.subscribe(user, mediaType);
          user.audioTrack?.play();
        }
      });
      rtcRef.current = rtc;
      micRef.current = mic;

      // 2. Join RTM for transcripts
      const rtm = new RTMClient({ appId, userId: String(uid) });
      await rtm.login({ token: rtmToken });
      await rtm.subscribe(channel);
      rtm.addEventListener('message', (e: RTMEvents.MessageEvent) => {
        try {
          const payload = JSON.parse(e.message as string);
          if (payload.type === 'transcript') {
            setTranscripts(prev => [
              ...prev,
              { role: payload.role, text: payload.text, final: payload.final },
            ]);
          }
        } catch {
          // Non-JSON RTM messages. Ignore for this tutorial.
        }
      });
      rtmRef.current = rtm;

      // 3. Ask the backend to bring the agent into the channel
      const { agentId } = await api.invite(channel);
      agentIdRef.current = agentId;

      setConnected(true);
    } catch (e) {
      setError(e instanceof Error ? e.message : String(e));
    }
  }, [channel, uid]);

  const stop = useCallback(async () => {
    try {
      if (agentIdRef.current) await api.stop(agentIdRef.current);
    } finally {
      micRef.current?.stop();
      micRef.current?.close();
      await rtcRef.current?.leave();
      await rtmRef.current?.logout();
      agentIdRef.current = null;
      rtcRef.current = null;
      micRef.current = null;
      rtmRef.current = null;
      setConnected(false);
    }
  }, []);

  return { connected, transcripts, error, start, stop };
}

The hook follows the same structure as components/ConversationComponent.tsx in the Next.js starter repo.
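
For reference, the message handler in the hook expects RTM payloads shaped roughly like the example below. This shape is inferred from the handler itself rather than from a published schema, so treat it as illustrative.

// Illustrative transcript payload, matching what the handler above parses.
type TranscriptMessage = {
  type: 'transcript';
  role: 'user' | 'agent';
  text: string;
  final: boolean; // false for partial (in-progress) lines
};

const example: TranscriptMessage = {
  type: 'transcript',
  role: 'agent',
  text: 'Hi there! How can I help you today?',
  final: true,
};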

Build the client UI

Create app/page.tsx as the main UI. It renders a start/stop button and a live transcript list.


// app/page.tsx
'use client';
import { useConvoAgent } from '@/hooks/useConvoAgent';

const CHANNEL = 'support-room-123';
const USER_UID = 111222;

export default function Home() {
  const { connected, transcripts, error, start, stop } = useConvoAgent(CHANNEL, USER_UID);

  return (
    <main className="mx-auto max-w-2xl p-8 space-y-6">
      <h1 className="text-2xl font-semibold">Conv AI Tutorial</h1>

      <div className="space-x-3">
        {!connected ? (
          <button onClick={start} className="rounded bg-blue-600 text-white px-4 py-2">
            Start conversation
          </button>
        ) : (
          <button onClick={stop} className="rounded bg-red-600 text-white px-4 py-2">
            Stop
          </button>
        )}
      </div>

      {error && <p className="text-red-600">Error: {error}</p>}

      <ol className="space-y-2">
        {transcripts.map((line, i) => (
          <li key={i} className={line.role === 'agent' ? 'text-blue-800' : 'text-slate-800'}>
            <span className="font-medium">{line.role === 'agent' ? 'Agent' : 'You'}:</span>{' '}
            {line.text}
            {!line.final && <span className="text-slate-400"> …</span>}
          </li>
        ))}
      </ol>
    </main>
  );
}

The page has no state library or design system. It provides just enough UI to verify that the backend is working.

Handle page unload (optional)

Add this effect inside app/page.tsx to stop the agent cleanly when the user closes the tab, rather than waiting for the 30-second idle timeout.


// Inside the page, after useConvoAgent(...)
// Remember to import useEffect from 'react' at the top of app/page.tsx.
useEffect(() => {
  const onUnload = () => { if (connected) stop(); };
  window.addEventListener('beforeunload', onUnload);
  return () => window.removeEventListener('beforeunload', onUnload);
}, [connected, stop]);
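
Browsers often cancel asynchronous work started from a beforeunload handler, so the stop call above may not always reach the backend. If you see sessions lingering until the idle timeout, a keepalive request sent on pagehide is more reliable. This sketch assumes the page can read the agent ID (for example, by having the hook also return agentIdRef.current) and that the stop endpoint is same-origin.

// Hypothetical variant: assumes `agentId` is exposed to the page by the hook.
useEffect(() => {
  const onPageHide = () => {
    if (!agentId) return;
    // keepalive lets the request complete even while the page is unloading.
    fetch('/api/stop-conversation', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ agentId }),
      keepalive: true,
    });
  };
  window.addEventListener('pagehide', onPageHide);
  return () => window.removeEventListener('pagehide', onPageHide);
}, [agentId]);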

Test and validate

Start the app and verify that the agent joins, responds, and stops cleanly.

Run the app


pnpm dev

Open http://localhost:3000, click Start conversation, allow microphone access, and speak.

Verify the integration

A healthy run passes all three checks:

  • Agent joined the channel: the invite-agent response resolves with an agentId, and the agent emits a greeting over RTC within two seconds. Time budget: under 2 seconds.
  • Transcripts stream: the transcripts state updates as you speak, and partial lines are marked final: false. Time budget: under 500 ms partial latency.
  • Stop is clean: after you click Stop, the backend returns { stopped: true }, and the Convo AI engine logs STATE=STOPPED, reason=API. Time budget: immediate.

If you run into problems, first run the CLI diagnostic:


agora project doctor

This checks for credential errors, feature-enablement issues, and network reachability problems.

Troubleshooting

  • Error: /api/token failed: 500 in the browser. Likely cause: the backend cannot read the APP_CERTIFICATE environment variable. Fix: confirm that .env.local (TypeScript) or server-python/.env (Python) contains the variable and that the server loaded the file on startup.
  • invalid token from the RTC join. Likely cause: clock skew between token generation and channel join. Fix: RTC tokens are time-sensitive; regenerate a token on each start() call to avoid expiry issues.
  • Agent never speaks but an agentId is returned. Likely cause: the Conversational AI feature is not enabled on the Agora project. Fix: run agora project feature list. If convoai is missing, rerun agora project create --feature convoai or enable it in the Agora Console.
  • No transcripts in RTM. Likely cause: enable_rtm is not set, or data_channel is set to stream instead of rtm. Fix: confirm advancedFeatures.enable_rtm: true and parameters.data_channel: 'rtm' in the agent config.
  • CORS error in the browser (Python). Likely cause: the FastAPI CORS middleware does not include your frontend origin. Fix: add http://localhost:3000 to allow_origins in main.py.
  • Agent greets itself in a loop. Likely cause: no echo cancellation on the device. Fix: use headphones, or set parameters.enable_aec: true.
  • unauthorized error on agora login in CI. Likely cause: the SSO browser flow cannot open on a headless machine. Fix: use agora login --device for the device-code flow.
  • Chrome blocks microphone access. Likely cause: getUserMedia is not available on non-localhost HTTP origins. Fix: test on http://localhost:3000 exactly, not http://127.0.0.1 or a LAN IP.

Next steps

Now that you have a working agent, explore the following topics: