TypeScript SDK API reference

Full API reference for the Agora Conversational AI TypeScript SDK.

AgoraClient

AgoraClient extends the Fern-generated base client with regional domain pool support and three authentication modes.

import { AgoraClient, Area } from 'agora-agent-sdk';

Constructor

new AgoraClient(options: AgoraClient.Options)

The authentication mode is resolved automatically from the options you provide.

Option	Type	Required	Description
`area`	`Area`	Yes	Region for API routing
`appId`	`string`	Yes	Agora App ID
`appCertificate`	`string`	Yes	Agora App Certificate. Keep this secret and never expose it client-side
`customerId`	`string`	No	Customer ID for Basic Auth
`customerSecret`	`string`	No	Customer Secret for Basic Auth
`authToken`	`string`	No	Pre-built `agora token=<value>` string
`timeout`	`number`	No	Request timeout in milliseconds
`maxRetries`	`number`	No	Maximum retry attempts
`fetch`	`typeof fetch`	No	Custom fetch implementation for unsupported runtimes

The options map to an authentication mode as follows:

Options provided	Resolved `authMode`
`customerId` + `customerSecret`	`"basic"`
`authToken`	`"token"`
Neither	`"app-credentials"`
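The resolution table above can be sketched as a small pure function. This is an illustrative sketch rather than the SDK's source: `resolveAuthMode` and `AuthOptions` are hypothetical names, and giving Basic Auth precedence when both credential styles are supplied is an assumption.

```typescript
// Hypothetical sketch of the resolution table; not the SDK's actual implementation.
type AgoraAuthMode = "app-credentials" | "token" | "basic";

interface AuthOptions {
  customerId?: string;
  customerSecret?: string;
  authToken?: string;
}

function resolveAuthMode(options: AuthOptions): AgoraAuthMode {
  // Assumption: Basic Auth wins if both credential styles are supplied.
  if (options.customerId && options.customerSecret) return "basic";
  if (options.authToken) return "token";
  // Neither was provided, so fall back to App ID + App Certificate.
  return "app-credentials";
}
```

In practice you would read the resolved value from client.authMode rather than compute it yourself.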

See Authentication for details on each mode.

Properties

The following read-only properties are available on any AgoraClient instance.

Property	Type	Description
`appId`	`string`	The Agora App ID
`appCertificate`	`string`	The Agora App Certificate
`authMode`	`AgoraAuthMode`	The resolved authentication mode
`pool`	`Pool`	The underlying domain pool instance used for regional routing

Methods

The following methods are available in addition to the Fern-generated sub-client methods.

`nextRegion()`

Cycles to the next region prefix in the domain pool. Call this after a request failure to try a different regional endpoint.

client.nextRegion();
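One way to apply this after a failure is a small retry wrapper. The sketch below is an assumption-laden illustration: `withRegionFailover` and `RegionCycler` are hypothetical names, the limit of three attempts is not an SDK default, and the wrapper only relies on the documented nextRegion() method.

```typescript
// Structural stand-in for the one AgoraClient method this sketch needs.
interface RegionCycler {
  nextRegion(): void;
}

// Hypothetical helper: run `request`, cycling regions between failed attempts.
async function withRegionFailover<T>(
  client: RegionCycler,
  request: () => Promise<T>,
  maxAttempts = 3, // assumed limit, not an SDK default
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await request();
    } catch (err) {
      lastError = err;
      // Only rotate when another attempt will follow.
      if (attempt < maxAttempts) client.nextRegion();
    }
  }
  throw lastError;
}
```

A call might look like withRegionFailover(client, () => session.start()). Whether a given failure is worth retrying in another region depends on the error, so a production version would likely inspect it first.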

`selectBestDomain(signal?)`

Triggers a manual DNS resolution check to select the best domain suffix. The SDK runs this check automatically every 30 seconds, but you can also call it manually after a network change.

await client.selectBestDomain();

Parameter	Type	Description
`signal`	`AbortSignal`	Optional abort signal to cancel the DNS check

`getCurrentURL()`

Returns the full API URL currently in use as a string.

const url = client.getCurrentURL();
// Example: 'https://api-us-west-1.agora.io/api/conversational-ai-agent'

Sub-clients

AgoraClient exposes Fern-generated sub-clients for direct REST API access. You typically do not need these when using the AgentKit layer.

Property	Description
`client.agents`	Start, stop, update, speak, interrupt, get history, list agents
`client.telephony`	Telephony operations
`client.phoneNumbers`	Phone number management

Sub-clients are lazily initialized on first access. For most use cases, prefer the AgentSession API over calling client.agents directly.

For full method signatures and request parameters, see the REST API reference.

Agent

Agent is an immutable configuration object. Each builder method returns a new Agent instance; the original is never modified. Define one Agent at startup and call createSession() on it for each user conversation.

import { Agent } from 'agora-agent-sdk';

Constructor

new Agent(options?: AgentOptions)

All options are optional. Use the builder methods to set vendor configuration after construction.

Option	Type	Default	Description
`name`	`string`	`undefined`	Agent name, used as the default session name
`instructions`	`string`	`undefined`	LLM system prompt
`greeting`	`string`	`undefined`	First message spoken when the session starts
`failureMessage`	`string`	`undefined`	Message spoken when an LLM call fails
`maxHistory`	`number`	`undefined`	Maximum conversation turns kept in LLM context
`turnDetection`	`TurnDetectionConfig`	`undefined`	Voice activity detection settings
`interruption`	`InterruptionConfig`	`undefined`	Unified interruption control settings
`sal`	`SalConfig`	`undefined`	Selective Attention Locking configuration
`avatar`	`AvatarConfig`	`undefined`	Avatar configuration
`advancedFeatures`	`AdvancedFeatures`	`undefined`	Enable MLLM mode, AI-VAD, and other advanced features
`parameters`	`SessionParams`	`undefined`	Session parameters including silence and farewell config
`geofence`	`GeofenceConfig`	`undefined`	Regional access restriction
`labels`	`Labels`	`undefined`	Custom key-value labels returned in notification callbacks
`rtc`	`RtcConfig`	`undefined`	RTC media encryption
`fillerWords`	`FillerWordsConfig`	`undefined`	Filler words played while waiting for the LLM response

Builder methods

All builder methods return a new Agent instance. The original is never modified.
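The immutability guarantee can be illustrated with a minimal sketch. `SketchAgent` is a hypothetical stand-in that models only two options; it is not the SDK's Agent class, which carries many more fields and a sample-rate type parameter.

```typescript
interface SketchOptions {
  readonly instructions?: string;
  readonly greeting?: string;
}

class SketchAgent {
  constructor(private readonly options: SketchOptions = {}) {}

  get instructions(): string | undefined {
    return this.options.instructions;
  }

  get greeting(): string | undefined {
    return this.options.greeting;
  }

  // Each builder copies the options and returns a fresh instance.
  withInstructions(instructions: string): SketchAgent {
    return new SketchAgent({ ...this.options, instructions });
  }

  withGreeting(greeting: string): SketchAgent {
    return new SketchAgent({ ...this.options, greeting });
  }
}

// The base agent keeps its configuration; the derived agent adds a greeting.
const base = new SketchAgent({ instructions: "You are a support agent." });
const friendly = base.withGreeting("Hi, how can I help?");
```

Because `base` is never mutated, it is safe to share one definition across requests and derive per-conversation variants from it.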

`withLlm(vendor)`

Sets the LLM vendor. Pass an instance of OpenAI, AzureOpenAI, Anthropic, or Gemini.

withLlm(vendor: BaseLLM): Agent<TTSSampleRate>

`withTts(vendor)`

Sets the TTS vendor. The sample rate type is captured and tracked for avatar compatibility.

withTts<SR extends number>(vendor: BaseTTS<SR>): Agent<SR>

`withStt(vendor)`

Sets the STT vendor. Pass an instance of any STT vendor class.

withStt(vendor: BaseSTT): Agent<TTSSampleRate>

`withMllm(vendor)`

Sets the MLLM vendor for multimodal mode. Pass OpenAIRealtime, GeminiLive, or VertexAI. Requires advancedFeatures: { enable_mllm: true } in the constructor. Calling withMllm() automatically sets mllm.enable = true.

withMllm(vendor: BaseMLLM): Agent<TTSSampleRate>

`withAvatar(vendor)`

Sets the avatar vendor. The `this` parameter constraint enforces at compile time that the agent's TTS sample rate matches the avatar's required rate.

withAvatar<RequiredSR extends number>(
  this: Agent<RequiredSR>,
  vendor: BaseAvatar<RequiredSR>
): Agent<RequiredSR>

`withTurnDetection(config)`

Configures cascading-flow turn detection. Use config.start_of_speech and config.end_of_speech for start-of-speech (SOS) and end-of-speech (EOS) detection. For interruption behavior, use withInterruption(); for MLLM turn detection, use the MLLM vendor's turnDetection option.

withTurnDetection(config: TurnDetectionConfig): Agent<TTSSampleRate>

`withInterruption(config)`

Configures unified interruption behavior using the top-level interruption object. Use this for start_of_speech and keywords interruption modes.

withInterruption(config: InterruptionConfig): Agent<TTSSampleRate>

`withInstructions(text)`

Overrides the LLM system prompt on a new Agent instance.

withInstructions(instructions: string): Agent<TTSSampleRate>

`withGreeting(text)`

Overrides the greeting message on a new Agent instance.

withGreeting(greeting: string): Agent<TTSSampleRate>

`withName(name)`

Overrides the agent name on a new Agent instance.

withName(name: string): Agent<TTSSampleRate>

Other builder methods

The following methods follow the same pattern — each returns a new Agent instance with the updated configuration.

Method	Parameter type	Description
`withSal(config)`	`SalConfig`	Set Selective Attention Locking configuration
`withAdvancedFeatures(features)`	`AdvancedFeatures`	Set advanced features
`withTools(enabled)`	`boolean`	Enable or disable MCP tool invocation
`withParameters(parameters)`	`SessionParams`	Set session parameters
`withFailureMessage(message)`	`string`	Set the message spoken when the LLM fails
`withMaxHistory(n)`	`number`	Set the maximum conversation history length
`withGeofence(geofence)`	`GeofenceConfig`	Set geofence configuration
`withLabels(labels)`	`Labels`	Set custom labels
`withRtc(rtc)`	`RtcConfig`	Set RTC configuration
`withFillerWords(fillerWords)`	`FillerWordsConfig`	Set filler words configuration

`createSession(client, options)`

Creates an AgentSession bound to a specific client and channel. Does not start the agent — call session.start() to join the channel.

createSession(
  client: AgoraClient,
  options: SessionOptions,
): AgentSession

SessionOptions fields:

Option	Type	Required	Description
`channel`	`string`	Yes	Channel name to join
`agentUid`	`string`	Yes	The agent's RTC UID
`remoteUids`	`string[]`	Yes	Remote user UIDs the agent listens and responds to
`name`	`string`	No	Session name. Defaults to agent name or `agent-{timestamp}`
`token`	`string`	No	Pre-built RTC+RTM token. Omit to auto-generate from app credentials
`expiresIn`	`number`	No	Token lifetime in seconds. Only applies when the token is auto-generated. Valid range: 1–86400. Use `ExpiresIn` helpers for clarity
`idleTimeout`	`number`	No	Seconds before the agent auto-exits when no audio is detected. `0` disables the timeout
`enableStringUid`	`boolean`	No	Use string UIDs instead of numeric UIDs
`preset`	`string \| AgentPreset[]`	No	Session-level preset IDs to use as the base ASR/LLM/TTS configuration. Accepts a comma-separated string or an array of `AgentPresets.*` values
`pipelineId`	`string`	No	Published AI Studio pipeline ID to use as the base configuration
`debug`	`boolean`	No	Log API requests to the console

preset is session-scoped because the underlying Agora start/join API applies presets per session, not per reusable Agent definition.

When you omit credentials for supported reseller-backed vendor models, AgentKit infers the matching session preset automatically:

Deepgram STT: nova-2, nova-3
OpenAI LLM: gpt-4o-mini, gpt-4.1-mini, gpt-5-nano, gpt-5-mini
OpenAI TTS: tts-1
MiniMax TTS: speech-2.6-turbo, speech-2.8-turbo

If you provide your own vendor API key for those same models, AgentKit keeps the request in BYOK mode and does not infer a preset.

Properties

Read-only properties available on any Agent instance.

Property	Type	Description
`name`	`string \| undefined`	Agent name
`instructions`	`string \| undefined`	LLM system prompt
`greeting`	`string \| undefined`	Greeting message
`failureMessage`	`string \| undefined`	Message spoken when LLM fails
`maxHistory`	`number \| undefined`	Maximum conversation history length
`llm`	`LlmConfig \| undefined`	LLM configuration
`tts`	`TtsConfig \| undefined`	TTS configuration
`stt`	`SttConfig \| undefined`	STT configuration
`mllm`	`MllmConfig \| undefined`	MLLM configuration
`avatar`	`AvatarConfig \| undefined`	Avatar configuration
`turnDetection`	`TurnDetectionConfig \| undefined`	Turn detection configuration
`interruption`	`InterruptionConfig \| undefined`	Interruption configuration
`sal`	`SalConfig \| undefined`	SAL configuration
`advancedFeatures`	`AdvancedFeatures \| undefined`	Advanced features
`parameters`	`SessionParams \| undefined`	Session parameters
`geofence`	`GeofenceConfig \| undefined`	Geofence configuration
`labels`	`Labels \| undefined`	Custom labels
`rtc`	`RtcConfig \| undefined`	RTC configuration
`fillerWords`	`FillerWordsConfig \| undefined`	Filler words configuration
`config`	`AgentOptions`	Full read-only configuration snapshot

`toProperties()`

Low-level method to convert the agent configuration to the Fern request format. Used internally by AgentSession.start(). You typically do not need to call this directly unless building custom request bodies.

AgentSession

AgentSession manages the full lifecycle of a running agent. Obtain an AgentSession by calling agent.createSession() — do not call the constructor directly.

import { AgentSession } from 'agora-agent-sdk';

State machine

A session progresses through the following states:

idle ──► starting ──► running ──► stopping ──► stopped
                         │
                         ▼
                       error

Transition	Trigger
`idle → starting`	`start()` called
`starting → running`	API responds with agent ID
`starting → error`	API request fails
`running → stopping`	`stop()` called
`stopping → stopped`	API confirms agent stopped
`stopping → error`	Stop request fails and agent was not already stopped
`running → error`	Unrecoverable error during interaction

start() can also be called from stopped or error state to restart the session.
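The transition table and the restart note can be written down as a lookup. This is a sketch for reasoning about the states, not the SDK's internal implementation; `TRANSITIONS`, `canTransition`, and `canStart` are hypothetical names.

```typescript
type SessionStatus = "idle" | "starting" | "running" | "stopping" | "stopped" | "error";

// Allowed next states, transcribed from the transition table and the restart note.
const TRANSITIONS: Record<SessionStatus, readonly SessionStatus[]> = {
  idle: ["starting"],
  starting: ["running", "error"],
  running: ["stopping", "error"],
  stopping: ["stopped", "error"],
  stopped: ["starting"],
  error: ["starting"],
};

function canTransition(from: SessionStatus, to: SessionStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// start() is documented as valid from idle, stopped, or error.
function canStart(status: SessionStatus): boolean {
  return canTransition(status, "starting");
}
```

A guard such as canStart could be applied to session.status before calling start(), to avoid the exception thrown in the starting, running, or stopping states.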

Methods

The following methods are available on an AgentSession instance.

`start()`

Starts the agent session. Generates tokens if not provided, sends the start request, and returns the agent ID. Resolves explicit preset values and also infers reseller presets from supported vendor configs when credentials are omitted.

start(): Promise<string>

Transitions: idle / stopped / error → starting → running
Throws if called in starting, running, or stopping state
Throws if avatar configuration has a TTS sample rate mismatch

`stop()`

Stops the agent session and removes the agent from the channel. If the agent has already stopped — for example due to idle timeout — resolves silently rather than throwing a 404 error.

stop(): Promise<void>

Transitions: running → stopping → stopped
Throws if called outside running state

`say(text, options?)`

Instructs the agent to speak the given text.

say(text: string, options?: SayOptions): Promise<void>

Parameter	Type	Required	Description
`text`	`string`	Yes	The text for the agent to speak
`options.priority`	`SpeakPriority`	No	Message priority
`options.interruptable`	`boolean`	No	Whether this message can be interrupted by the user

Only valid in running state

`interrupt()`

Interrupts the agent's current speech.

interrupt(): Promise<void>

Only valid in running state

`update(config)`

Updates the agent configuration mid-session without restarting. Accepts a partial configuration object in REST API format.

update(config: AgentConfigUpdate): Promise<void>

Only valid in running state

`getHistory()`

Fetches the conversation history for this session. Requires a valid agent ID — start() must have been called successfully.

getHistory(): Promise<ConversationHistory>

`getTurns()`

Fetches turn-by-turn analytics for this session, including start/end events and latency metrics. Requires a valid agent ID — start() must have been called successfully.

getTurns(): Promise<ConversationTurns>

`getInfo()`

Fetches current agent metadata from the API. Requires a valid agent ID.

getInfo(): Promise<SessionInfo>

`on(event, handler)`

Subscribes to a session event. Register handlers before calling start() to avoid missing the started event.

on<T>(event: AgentSessionEvent, handler: AgentSessionEventHandler<T>): void

`off(event, handler)`

Unsubscribes a previously registered event handler.

off<T>(event: AgentSessionEvent, handler: AgentSessionEventHandler<T>): void

Events

The session emits the following events. See AgentSessionEvent and AgentSessionEventHandler for type details.

Event	Payload type	Description
`"started"`	`{ agentId: string }`	Agent successfully joined the channel
`"stopped"`	`{ agentId: string }`	Agent left the channel
`"error"`	`Error`	An unrecoverable error occurred
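The sketch below shows one way to wire all three events and clean them up again. It assumes only the on() and off() signatures documented above; `SessionLike` and `trackLifecycle` are hypothetical names, and the two type aliases are repeated locally so the block stands alone.

```typescript
type AgentSessionEvent = "started" | "stopped" | "error";
type AgentSessionEventHandler<T> = (data: T) => void;

// Structural stand-in for the two AgentSession methods this sketch uses.
interface SessionLike {
  on<T>(event: AgentSessionEvent, handler: AgentSessionEventHandler<T>): void;
  off<T>(event: AgentSessionEvent, handler: AgentSessionEventHandler<T>): void;
}

// Hypothetical helper: append lifecycle events to `log` and return an unsubscribe function.
function trackLifecycle(session: SessionLike, log: string[]): () => void {
  const onStarted = (data: { agentId: string }): void => {
    log.push("started:" + data.agentId);
  };
  const onStopped = (data: { agentId: string }): void => {
    log.push("stopped:" + data.agentId);
  };
  const onError = (err: Error): void => {
    log.push("error:" + err.message);
  };

  // Register before start() so the "started" event is not missed.
  session.on("started", onStarted);
  session.on("stopped", onStopped);
  session.on("error", onError);

  // off() needs the same function references that were passed to on().
  return () => {
    session.off("started", onStarted);
    session.off("stopped", onStopped);
    session.off("error", onError);
  };
}
```

Keeping the handler references in named constants is what makes the later off() calls effective; an inline arrow function could not be unsubscribed.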

Properties

The following read-only properties are available on any AgentSession instance.

Property	Type	Description
`status`	`string`	Current session state. One of `"idle"`, `"starting"`, `"running"`, `"stopping"`, `"stopped"`, `"error"`
`id`	`string \| null`	Agent ID, populated after `start()` resolves
`agent`	`Agent`	The agent configuration this session was created from
`appId`	`string`	The Agora App ID for this session
`raw`	`AgentsClient`	Direct access to the Fern-generated `AgentsClient` for advanced operations

Using `session.raw`

Use session.raw to call REST API endpoints not yet exposed by the AgentKit layer. You must pass appid and agentId manually.

await session.raw.someNewEndpoint({
  appid: session.appId,
  agentId: session.id!,
});

Presets and BYOK

preset lives on the session because Agora applies presets when the agent joins a channel.

AgentKit supports both explicit presets and BYOK:

Pass preset directly on agent.createSession(...) when you want to choose the base reseller configuration yourself.
Provide vendor credentials for preset-capable models when you want full BYOK behavior.
Omit credentials for supported reseller models when you want AgentKit to infer the matching preset automatically.

Supported inferred preset models:

Deepgram STT: nova-2, nova-3
OpenAI LLM: gpt-4o-mini, gpt-4.1-mini, gpt-5-nano, gpt-5-mini
OpenAI TTS: tts-1
MiniMax TTS: speech-2.6-turbo, speech-2.8-turbo
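The three cases above can be summarized as a decision function. This is a simplified sketch, not AgentKit's logic: `credentialMode`, the vendor keys, and the `"missing-credentials"` label are hypothetical, and it ignores extra conditions such as the default-endpoint requirement for OpenAI.

```typescript
// Model lists transcribed from the list above; the vendor keys are illustrative.
const RESELLER_MODELS: Record<string, readonly string[]> = {
  "deepgram-stt": ["nova-2", "nova-3"],
  "openai-llm": ["gpt-4o-mini", "gpt-4.1-mini", "gpt-5-nano", "gpt-5-mini"],
  "openai-tts": ["tts-1"],
  "minimax-tts": ["speech-2.6-turbo", "speech-2.8-turbo"],
};

type CredentialMode = "inferred-preset" | "byok" | "missing-credentials";

function credentialMode(vendor: string, model: string, apiKey?: string): CredentialMode {
  // A supplied key always keeps the request in BYOK mode.
  if (apiKey) return "byok";
  const models = RESELLER_MODELS[vendor] ?? [];
  // Without a key, only the listed reseller models can fall back to a preset.
  return models.indexOf(model) !== -1 ? "inferred-preset" : "missing-credentials";
}
```

For example, credentialMode("openai-llm", "gpt-4o-mini") yields "inferred-preset", while the same model with a key yields "byok".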

Vendors

All vendor classes are imported from agora-agent-sdk. Pass vendor instances to the Agent builder methods.

LLM vendors

Use with withLlm().

OpenAI

new OpenAI(options: OpenAIOptions)

Option	Type	Required	Description
`apiKey`	`string`	Usually	OpenAI API key
`model`	`string`	Yes	Model name, for example `'gpt-4o-mini'`
`url`	`string`	No	API endpoint URL. Default: `https://api.openai.com/v1/chat/completions`
`maxHistory`	`number`	No	Maximum conversation history to cache
`systemMessages`	`Record<string, unknown>[]`	No	Additional system messages
`greetingMessage`	`string`	No	Agent greeting message
`failureMessage`	`string`	No	Message spoken when the LLM call fails
`inputModalities`	`string[]`	No	Input modalities. Default: `["text"]`
`outputModalities`	`string[]`	No	Output modalities
`params`	`Record<string, unknown>`	No	Additional LLM parameters passed to the model
`headers`	`Record<string, string>`	No	Custom HTTP headers forwarded to the LLM provider
`greetingConfigs`	`LlmGreetingConfigs`	No	Greeting playback configuration
`templateVariables`	`Record<string, string>`	No	Template variables for messages

apiKey is optional for the following reseller preset models: gpt-4o-mini, gpt-4.1-mini, gpt-5-nano, gpt-5-mini. If apiKey is omitted for one of those models, AgentKit infers the matching session preset. This no-key branch is only available with the default OpenAI endpoint and without a custom vendor hint. If apiKey is provided, AgentKit uses standard BYOK behavior instead.

AzureOpenAI

new AzureOpenAI(options: AzureOpenAIOptions)

Option	Type	Required	Description
`apiKey`	`string`	Yes	Azure OpenAI API key
`model`	`string`	Yes	Model or deployment name
`resourceName`	`string`	Yes	Azure resource name
`deploymentName`	`string`	Yes	Deployment name in Azure
`apiVersion`	`string`	No	Azure API version. Default: `'2023-05-15'`
`maxHistory`	`number`	No	Maximum conversation history to cache
`systemMessages`	`Record<string, unknown>[]`	No	Additional system messages
`greetingMessage`	`string`	No	Agent greeting message
`failureMessage`	`string`	No	Message spoken when the LLM call fails
`inputModalities`	`string[]`	No	Input modalities. Default: `["text"]`
`outputModalities`	`string[]`	No	Output modalities
`params`	`Record<string, unknown>`	No	Additional LLM parameters
`headers`	`Record<string, string>`	No	Custom HTTP headers forwarded to the LLM provider
`greetingConfigs`	`LlmGreetingConfigs`	No	Greeting playback configuration
`templateVariables`	`Record<string, string>`	No	Template variables for messages

Anthropic

new Anthropic(options: AnthropicOptions)

Option	Type	Required	Description
`apiKey`	`string`	Yes	Anthropic API key
`model`	`string`	Yes	Model name, for example `'claude-3-5-sonnet-20241022'`
`url`	`string`	No	API endpoint URL. Default: `https://api.anthropic.com/v1/messages`
`maxHistory`	`number`	No	Maximum conversation history to cache
`systemMessages`	`Record<string, unknown>[]`	No	Additional system messages
`greetingMessage`	`string`	No	Agent greeting message
`failureMessage`	`string`	No	Message spoken when the LLM call fails
`inputModalities`	`string[]`	No	Input modalities. Default: `["text"]`
`outputModalities`	`string[]`	No	Output modalities
`params`	`Record<string, unknown>`	No	Additional LLM parameters
`headers`	`Record<string, string>`	No	Custom HTTP headers forwarded to the LLM provider
`greetingConfigs`	`LlmGreetingConfigs`	No	Greeting playback configuration
`templateVariables`	`Record<string, string>`	No	Template variables for messages

Gemini

new Gemini(options: GeminiOptions)

Option	Type	Required	Description
`apiKey`	`string`	Yes	Google API key
`model`	`string`	Yes	Model name, for example `'gemini-pro'`
`url`	`string`	No	API endpoint URL. Default: `https://generativelanguage.googleapis.com/v1beta/models`
`maxHistory`	`number`	No	Maximum conversation history to cache
`systemMessages`	`Record<string, unknown>[]`	No	Additional system messages
`greetingMessage`	`string`	No	Agent greeting message
`failureMessage`	`string`	No	Message spoken when the LLM call fails
`inputModalities`	`string[]`	No	Input modalities. Default: `["text"]`
`outputModalities`	`string[]`	No	Output modalities
`params`	`Record<string, unknown>`	No	Additional LLM parameters
`headers`	`Record<string, string>`	No	Custom HTTP headers forwarded to the LLM provider
`greetingConfigs`	`LlmGreetingConfigs`	No	Greeting playback configuration
`templateVariables`	`Record<string, string>`	No	Template variables for messages

TTS vendors

Use with withTts(). The sampleRate option determines avatar compatibility — see withAvatar().

ElevenLabsTTS

new ElevenLabsTTS<SR extends ElevenLabsSampleRate>(options: ElevenLabsTTSOptions<SR>)

Option	Type	Required	Description
`key`	`string`	Yes	ElevenLabs API key
`modelId`	`string`	Yes	Model ID, for example `'eleven_flash_v2_5'`
`voiceId`	`string`	Yes	Voice ID
`sampleRate`	`16000 \| 22050 \| 24000 \| 44100`	No	Audio sample rate in Hz
`baseUrl`	`string`	No	WebSocket base URL
`skipPatterns`	`number[]`	No	Skip patterns for bracketed content

MicrosoftTTS

new MicrosoftTTS<SR extends MicrosoftSampleRate>(options: MicrosoftTTSOptions<SR>)

Option	Type	Required	Description
`key`	`string`	Yes	Azure Speech API key
`region`	`string`	Yes	Azure region, for example `'eastus'`
`voiceName`	`string`	Yes	Voice name, for example `'en-US-JennyNeural'`
`sampleRate`	`16000 \| 24000 \| 48000`	No	Audio sample rate in Hz
`skipPatterns`	`number[]`	No	Skip patterns for bracketed content

OpenAITTS

OpenAITTS output is fixed at 24,000 Hz; the sample rate is not configurable.

new OpenAITTS(options: OpenAITTSOptions)

Option	Type	Required	Description
`apiKey`	`string`	Usually	OpenAI API key
`voice`	`string`	Yes	Voice name: `'alloy'`, `'echo'`, `'fable'`, `'onyx'`, `'nova'`, or `'shimmer'`
`model`	`string`	No	Model name, for example `'tts-1'` or `'tts-1-hd'`
`responseFormat`	`string`	No	Audio format, for example `'pcm'`
`speed`	`number`	No	Speech speed multiplier
`skipPatterns`	`number[]`	No	Skip patterns for bracketed content

apiKey is optional only for the reseller-backed tts-1 preset path. If apiKey is omitted and model is 'tts-1' or unset, AgentKit infers the openai_tts_1 preset. If apiKey is provided, the request stays in BYOK mode.

CartesiaTTS

new CartesiaTTS<SR extends CartesiaSampleRate>(options: CartesiaTTSOptions<SR>)

Option	Type	Required	Description
`apiKey`	`string`	Yes	Cartesia API key
`voiceId`	`string`	Yes	Voice ID (serialized as `{"mode": "id", "id": "..."}`)
`modelId`	`string`	No	Model ID
`sampleRate`	`8000 \| 16000 \| 22050 \| 24000 \| 44100 \| 48000`	No	Audio sample rate in Hz
`skipPatterns`	`number[]`	No	Skip patterns for bracketed content

Other TTS vendors

Class	Key parameters
`GoogleTTS`	`key`, `voiceName`, `languageCode?`
`AmazonTTS`	`accessKey`, `secretKey`, `region`, `voiceId`
`DeepgramTTS`	`apiKey`, `model`, `baseUrl?`, `sampleRate?`, `params?`
`HumeAITTS`	`key`, `configId?`
`RimeTTS`	`key`, `speaker`, `modelId?`, `lang?`, `samplingRate?`, `speedAlpha?`
`FishAudioTTS`	`key`, `referenceId`
`MiniMaxTTS`	`key?`, `groupId?`, `model`, `voiceId?`, `url?`
`MurfTTS`	`key`, `voiceId`, `style?`
`SarvamTTS`	`key`, `speaker`, `targetLanguageCode`

For MiniMaxTTS, key is optional only for reseller-backed models: speech-2.6-turbo, speech-2.8-turbo. If key is omitted for one of those models, AgentKit infers the matching session preset. In that preset-backed path, groupId, voiceId, and url are optional overrides rather than required fields. If key is provided, AgentKit uses BYOK.

STT vendors

Use with withStt().

DeepgramSTT

new DeepgramSTT(options?: DeepgramSTTOptions)

All options are optional, except that apiKey is required for models other than nova-2 and nova-3.

Option	Type	Description
`apiKey`	`string`	Deepgram API key. Optional for `nova-2` and `nova-3` reseller preset usage
`model`	`string`	Model name, for example `'nova-2'` or `'enhanced'`
`language`	`string`	Language code, for example `'en-US'`
`smartFormat`	`boolean`	Enable smart formatting
`punctuation`	`boolean`	Enable punctuation
`additionalParams`	`Record<string, unknown>`	Additional vendor parameters

If apiKey is omitted for nova-2 or nova-3, AgentKit infers the matching Deepgram reseller preset. For all other Deepgram models, TypeScript requires apiKey.

Other STT vendors

Class	Key parameters
`SpeechmaticsSTT`	`apiKey`, `language`
`MicrosoftSTT`	`key`, `region`, `language?`
`OpenAISTT`	`apiKey`, `model?`, `language?`
`GoogleSTT`	`apiKey`, `language?`
`AmazonSTT`	`accessKey`, `secretKey`, `region`, `language?`
`AssemblyAISTT`	`apiKey`, `language?`
`AresSTT`	`language?`
`SarvamSTT`	`apiKey`, `language`

MLLM vendors

Use with withMllm() for multimodal end-to-end audio processing without separate STT or TTS steps. Requires advancedFeatures: { enable_mllm: true } in the Agent constructor.

OpenAIRealtime

new OpenAIRealtime(options: OpenAIRealtimeOptions)

Option	Type	Required	Description
`apiKey`	`string`	Yes	OpenAI API key
`model`	`string`	No	Model name, for example `'gpt-4o-realtime-preview'`
`url`	`string`	No	WebSocket URL
`greetingMessage`	`string`	No	Agent greeting message
`failureMessage`	`string`	No	Message played when the model call fails
`maxHistory`	`number`	No	Maximum conversation history length
`predefinedTools`	`string[]`	No	Predefined tools, for example `['_publish_message']`
`inputModalities`	`string[]`	No	Input modalities, for example `['audio']`
`outputModalities`	`string[]`	No	Output modalities, for example `['text', 'audio']`
`messages`	`Record<string, unknown>[]`	No	Conversation messages for short-term memory
`params`	`Record<string, unknown>`	No	Additional MLLM parameters
`turnDetection`	`MllmTurnDetectionConfig`	No	MLLM turn detection configuration; overrides top-level `turnDetection`

GeminiLive

new GeminiLive(options: GeminiLiveOptions)

Option	Type	Required	Description
`apiKey`	`string`	Yes	Google API key
`model`	`string`	Yes	Model name, for example `'gemini-live-2.5-flash'`
`url`	`string`	No	WebSocket URL
`instructions`	`string`	No	System instructions for the model
`voice`	`string`	No	Voice name, for example `'Aoede'` or `'Charon'`
`greetingMessage`	`string`	No	Agent greeting message
`failureMessage`	`string`	No	Message played when the model call fails
`maxHistory`	`number`	No	Maximum conversation history length
`predefinedTools`	`string[]`	No	Predefined tools, for example `['_publish_message']`
`inputModalities`	`string[]`	No	Input modalities
`outputModalities`	`string[]`	No	Output modalities
`messages`	`Record<string, unknown>[]`	No	Conversation messages for short-term memory
`additionalParams`	`Record<string, unknown>`	No	Additional parameters
`turnDetection`	`MllmTurnDetectionConfig`	No	MLLM turn detection configuration; overrides top-level `turnDetection`

VertexAI

new VertexAI(options: VertexAIOptions)

Option	Type	Required	Description
`model`	`string`	Yes	Model name, for example `'gemini-live-2.5-flash-preview-native-audio-09-2025'`
`projectId`	`string`	Yes	Google Cloud project ID
`location`	`string`	Yes	Google Cloud location or region
`adcCredentialsString`	`string`	Yes	Application Default Credentials JSON string
`url`	`string`	No	WebSocket URL
`instructions`	`string`	No	System instructions for the model
`voice`	`string`	No	Voice name, for example `'Aoede'` or `'Charon'`
`greetingMessage`	`string`	No	Agent greeting message
`failureMessage`	`string`	No	Message played when the model call fails
`maxHistory`	`number`	No	Maximum conversation history length
`predefinedTools`	`string[]`	No	Predefined tools, for example `['_publish_message']`
`inputModalities`	`string[]`	No	Input modalities
`outputModalities`	`string[]`	No	Output modalities
`messages`	`Record<string, unknown>[]`	No	Conversation messages for short-term memory
`additionalParams`	`Record<string, unknown>`	No	Additional parameters
`turnDetection`	`MllmTurnDetectionConfig`	No	MLLM turn detection configuration; overrides top-level `turnDetection`

Avatar vendors

Use with withAvatar(). Each avatar vendor requires a specific TTS sample rate enforced at compile time and runtime.

HeyGenAvatar

Requires TTS at 24,000 Hz.

new HeyGenAvatar(options: HeyGenAvatarOptions)

Option	Type	Required	Description
`apiKey`	`string`	Yes	HeyGen API key
`quality`	`'low' \| 'medium' \| 'high'`	Yes	Video quality: 360p, 480p, or 720p
`agoraUid`	`string`	Yes	RTC UID for the avatar stream
`agoraToken`	`string`	No	RTC token for avatar authentication
`avatarId`	`string`	No	HeyGen avatar ID
`disableIdleTimeout`	`boolean`	No	Disable idle timeout. Default: `false`
`activityIdleTimeout`	`number`	No	Idle timeout in seconds. Default: `120`
`enable`	`boolean`	No	Enable or disable the avatar. Default: `true`

AkoolAvatar

Requires TTS at 16,000 Hz.

new AkoolAvatar(options: AkoolAvatarOptions)

Option	Type	Required	Description
`apiKey`	`string`	Yes	Akool API key
`avatarId`	`string`	No	Akool avatar ID
`enable`	`boolean`	No	Enable or disable the avatar. Default: `true`
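The sample-rate requirements stated for the two avatar vendors can be expressed as a runtime guard. This is a hypothetical sketch: `assertAvatarCompatible` and `REQUIRED_TTS_SAMPLE_RATE` are illustrative names, and the SDK performs its own check when start() is called.

```typescript
// Required TTS sample rates in Hz, as stated for each avatar vendor in this section.
const REQUIRED_TTS_SAMPLE_RATE = {
  HeyGenAvatar: 24000,
  AkoolAvatar: 16000,
} as const;

type AvatarVendorName = keyof typeof REQUIRED_TTS_SAMPLE_RATE;

// Hypothetical guard: throws when the TTS rate does not match the avatar's requirement.
function assertAvatarCompatible(avatar: AvatarVendorName, ttsSampleRate: number): void {
  const required: number = REQUIRED_TTS_SAMPLE_RATE[avatar];
  if (ttsSampleRate !== required) {
    throw new Error(
      avatar + " requires TTS at " + required + " Hz, got " + ttsSampleRate + " Hz",
    );
  }
}
```

Under these rules OpenAITTS, which is fixed at 24,000 Hz, would pass the HeyGenAvatar check but fail the AkoolAvatar one.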

Token utilities

Helper functions and classes for generating and managing tokens. Use these when you need control over token lifetime, or when generating tokens outside of a session.

import { generateConvoAIToken, ExpiresIn } from 'agora-agent-sdk';

`generateConvoAIToken(options)`

Generates a Conversational AI token combining RTC and RTM privileges. This is the same token the SDK generates automatically in app-credentials mode. Use this when you need a token outside of a session, or when passing a pre-built token to SessionOptions.token.

generateConvoAIToken(options: GenerateConvoAITokenOptions): string

Option	Type	Required	Description
`appId`	`string`	Yes	Agora App ID
`appCertificate`	`string`	Yes	Agora App Certificate
`channelName`	`string`	Yes	The channel the token grants access to
`account`	`string`	Yes	The UID this token is issued for, as a string
`tokenExpire`	`number`	No	Token lifetime in seconds. Default: `86400`. Valid range: 1–86400

Returns: string — the generated token.

const token = generateConvoAIToken({
  appId: 'your-app-id',
  appCertificate: 'your-app-certificate',
  channelName: 'support-room-123',
  account: '1',
  tokenExpire: ExpiresIn.hours(12),
});

`ExpiresIn`

Helper for specifying token lifetimes. Use with SessionOptions.expiresIn or generateConvoAIToken. Values are validated and capped at the Agora maximum of 86400 seconds (24 hours).

Value or method	Returns	Description
`ExpiresIn.DAY`	`86400`	24 hours — the Agora maximum and default
`ExpiresIn.hours(n)`	`number`	`n` hours in seconds. Throws if `n` ≤ 0, caps at 24 h with a warning
`ExpiresIn.minutes(n)`	`number`	`n` minutes in seconds. Throws if `n` ≤ 0, caps at 24 h with a warning
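The documented rules (throw for non-positive input, cap at 24 hours) can be reproduced in a few lines. This is a hypothetical re-implementation for illustration only: `SketchExpiresIn` and `toCappedSeconds` are not SDK names, and the sketch omits the warning the SDK logs when it caps a value.

```typescript
const MAX_TOKEN_SECONDS = 86400; // Agora maximum: 24 hours

// Convert a duration to seconds, rejecting non-positive input and capping at the maximum.
function toCappedSeconds(value: number, secondsPerUnit: number): number {
  if (!(value > 0)) throw new RangeError("Duration must be greater than 0");
  return Math.min(value * secondsPerUnit, MAX_TOKEN_SECONDS);
}

const SketchExpiresIn = {
  DAY: MAX_TOKEN_SECONDS,
  hours: (n: number): number => toCappedSeconds(n, 3600),
  minutes: (n: number): number => toCappedSeconds(n, 60),
};
```

With these rules, 12 hours maps to 43200 seconds, 30 minutes to 1800 seconds, and 48 hours is capped at 86400 seconds.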

Types and enums

Shared types and enums used across AgoraClient, Agent, AgentSession, and vendor classes.

`Area`

Region used for API routing. Pass to AgoraClient via the area option.

import { Area } from 'agora-agent-sdk';

Value	Region
`Area.US`	United States
`Area.EU`	Europe
`Area.AP`	Asia-Pacific
`Area.CN`	China mainland

`AgoraAuthMode`

The resolved authentication mode on an AgoraClient instance. Read via client.authMode.

type AgoraAuthMode = "app-credentials" | "token" | "basic";

Value	Description
`"app-credentials"`	App ID and App Certificate provided. SDK auto-generates tokens
`"token"`	Pre-built `authToken` provided
`"basic"`	`customerId` and `customerSecret` provided

`AgentSessionEvent`

Union type of all valid event names for session.on() and session.off().

type AgentSessionEvent = "started" | "stopped" | "error";

`AgentSessionEventHandler`

Generic handler type for session event callbacks.

type AgentSessionEventHandler<T> = (data: T) => void;

Event	`T`
`"started"`	`{ agentId: string }`
`"stopped"`	`{ agentId: string }`
`"error"`	`Error`

`SpeakPriority`

Controls how the agent handles a say() call relative to its current activity.

type SpeakPriority = "INTERRUPT" | "APPEND" | "IGNORE";

Value	Description
`"INTERRUPT"`	Agent immediately stops current speech and delivers the message
`"APPEND"`	Message is queued and delivered after current speech ends
`"IGNORE"`	Message is discarded if the agent is currently speaking
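The three priorities can be modeled as a pure state update. This is a conceptual sketch of the behavior described in the table, not how the agent is implemented: `SpeechState` and `applySay` are hypothetical, and two details are assumptions (every priority speaks immediately when the agent is silent, and INTERRUPT leaves queued messages in place).

```typescript
type SpeakPriority = "INTERRUPT" | "APPEND" | "IGNORE";

interface SpeechState {
  current: string | null; // message being spoken, if any
  queue: string[]; // messages waiting to be spoken
}

function applySay(state: SpeechState, text: string, priority: SpeakPriority): SpeechState {
  if (state.current === null) {
    // Assumption: nothing is playing, so every priority speaks immediately.
    return { current: text, queue: state.queue };
  }
  switch (priority) {
    case "INTERRUPT":
      // Replace the current speech; queued messages are kept (assumption).
      return { current: text, queue: state.queue };
    case "APPEND":
      // Keep speaking and deliver the new message afterwards.
      return { current: state.current, queue: state.queue.concat(text) };
    case "IGNORE":
      // The agent is busy, so the message is dropped.
      return state;
  }
}
```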

`AgoraError`

Thrown when the API returns a 4xx or 5xx response. Catch this to inspect the status code and response body.

import { AgoraError } from 'agora-agent-sdk';

try {
  const agentId = await session.start();
} catch (err) {
  if (err instanceof AgoraError) {
    // The API answered with a 4xx or 5xx status
    console.error('Status:', err.statusCode);
    console.error('Message:', err.message);
    console.error('Body:', err.body);
  } else {
    // Not an API response error; let the caller handle it
    throw err;
  }
}

Property	Type	Description
`statusCode`	`number`	HTTP status code returned by the API
`message`	`string`	Human-readable error message
`body`	`unknown`	Raw response body from the API
`rawResponse`	`Response`	The full HTTP response object