Python SDK API reference
Full API reference for the Agora Conversational AI Python SDK.
Agora / AsyncAgora Client
Agora (sync) and AsyncAgora (async) extend the Fern-generated base client with regional domain pool support and three authentication modes. Use Agora for synchronous applications and AsyncAgora for asyncio-based applications.
See Sync vs. Async to choose the right client for your application.
Constructor
Provide either app_id + app_certificate for app-credentials mode, or username + password for Basic Auth.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
area | Area | Yes | — | Region for API routing |
app_id | str | Yes* | — | Agora App ID |
app_certificate | str | Yes* | — | Agora App Certificate. Keep this secret and never expose it client-side |
username | str | Yes* | — | Customer ID for Basic Auth |
password | str | Yes* | — | Customer Secret for Basic Auth |
auth_token | str | No | — | Pre-built agora token=<value> string |
headers | Dict[str, str] | No | None | Additional headers sent with every request |
timeout | float | No | 60 | Request timeout in seconds |
follow_redirects | bool | No | True | Whether to follow HTTP redirects |
httpx_client | httpx.Client | No | None | Custom httpx client instance |
* Provide either app_id + app_certificate, or username + password.
Authentication mode is resolved from the parameters you provide:
| Parameters provided | Resolved mode |
|---|---|
| app_id + app_certificate | "app-credentials" |
| auth_token | "token" |
| username + password | "basic" |
AsyncAgora has the same constructor signature, except httpx_client accepts httpx.AsyncClient instead of httpx.Client.
See Authentication for details on each mode.
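The resolution table above can be read as a small decision function. The sketch below is a stand-in written for illustration, not the SDK's code: the name `resolve_auth_mode` is hypothetical, and the order in which credential sets are checked when several are supplied is an assumption.

```python
from typing import Optional


def resolve_auth_mode(
    app_id: Optional[str] = None,
    app_certificate: Optional[str] = None,
    auth_token: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the resolution table (not an SDK function)."""
    # Assumption: credentials are checked in the same order as the table rows.
    if app_id and app_certificate:
        return "app-credentials"
    if auth_token:
        return "token"
    if username and password:
        return "basic"
    raise ValueError("no complete set of credentials was provided")


print(resolve_auth_mode(app_id="my-app-id", app_certificate="my-certificate"))
print(resolve_auth_mode(username="customer-id", password="customer-secret"))
```

A half-supplied pair, such as app_id without app_certificate, falls through to the error branch in this sketch.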
Properties
The following read-only properties are available on both Agora and AsyncAgora instances.
| Property | Type | Description |
|---|---|---|
| pool | Pool | The underlying domain pool instance used for regional routing |
Methods
The following methods are available in addition to the Fern-generated sub-client methods.
next_region()
Cycles to the next region prefix in the domain pool. Call this after a request failure to try a different regional endpoint. Synchronous on both Agora and AsyncAgora.
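A retry loop built around next_region() might look like the sketch below. `StubClient` is a stand-in that models only next_region() and get_current_url(); the URLs are placeholders, and using ConnectionError as the failure signal is an assumption, since this reference does not say which exception a failed request raises.

```python
from typing import Callable, List


class StubClient:
    """Stand-in exposing only next_region() and get_current_url()."""

    def __init__(self, urls: List[str]) -> None:
        self._urls = urls
        self._index = 0

    def next_region(self) -> None:
        # Cycle to the next entry, wrapping around at the end.
        self._index = (self._index + 1) % len(self._urls)

    def get_current_url(self) -> str:
        return self._urls[self._index]


def call_with_failover(
    client: StubClient, request: Callable[[str], str], attempts: int
) -> str:
    """Try the request, switching region after each connection failure."""
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(attempts):
        try:
            return request(client.get_current_url())
        except ConnectionError as exc:
            last_error = exc
            client.next_region()
    raise last_error


def flaky_request(url: str) -> str:
    # Simulate an outage on the first endpoint only.
    if "region-a" in url:
        raise ConnectionError("region-a unreachable")
    return f"ok via {url}"


client = StubClient(
    ["https://region-a.example.invalid", "https://region-b.example.invalid"]
)
result = call_with_failover(client, flaky_request, attempts=3)
print(result)
```

Bounding the number of attempts keeps a total outage from turning into an endless loop.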
select_best_domain()
Triggers DNS-based domain selection to find the fastest-responding domain suffix. Results are cached for 30 seconds.
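The 30-second caching behavior can be pictured as a time-to-live cache. This is a generic sketch under stated assumptions, not the SDK's implementation: `CachedSelection`, `probe`, and the injectable clock are hypothetical names introduced so the example runs without waiting.

```python
import time
from typing import Callable, List, Optional


class CachedSelection:
    """Caches the result of an expensive selection for ttl seconds."""

    def __init__(
        self,
        select: Callable[[], str],
        ttl: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._select = select
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Re-run the selection only when nothing is cached or the entry expired.
        if self._value is None or now >= self._expires_at:
            self._value = self._select()
            self._expires_at = now + self._ttl
        return self._value


fake_now = [0.0]          # controllable clock for the demonstration
probe_calls: List[float] = []


def probe() -> str:
    probe_calls.append(fake_now[0])
    return f"domain-{len(probe_calls)}"


cache = CachedSelection(probe, ttl=30.0, clock=lambda: fake_now[0])
first = cache.get()
fake_now[0] = 29.0
second = cache.get()      # still inside the 30-second window
fake_now[0] = 30.0
third = cache.get()       # window elapsed, selection runs again
print(first, second, third)
```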
get_current_url()
Returns the full API URL currently in use. Synchronous on both Agora and AsyncAgora.
Sub-clients
Both Agora and AsyncAgora expose Fern-generated sub-clients for direct REST API access. You typically do not need these when using the agentkit layer.
| Property | Sync type | Async type | Description |
|---|---|---|---|
| client.agents | AgentsClient | AsyncAgentsClient | Start, stop, update, speak, interrupt, get, list agents |
| client.telephony | TelephonyClient | AsyncTelephonyClient | Telephony operations |
| client.phone_numbers | PhoneNumbersClient | AsyncPhoneNumbersClient | Phone number management |
For full method signatures and request parameters, see the REST API reference.
Agent
Agent is an immutable configuration object. Each builder method returns a new Agent instance — the original is never modified. Define one Agent at startup and call create_session() on it for each user conversation.
Constructor
All parameters are optional. Use the builder methods to set vendor configuration after construction.
| Parameter | Type | Default | Description |
|---|---|---|---|
name | Optional[str] | None | Agent name, used as the default session name |
instructions | Optional[str] | None | LLM system prompt |
greeting | Optional[str] | None | First message spoken when the session starts |
failure_message | Optional[str] | None | Message spoken when an LLM call fails |
max_history | Optional[int] | None | Maximum conversation turns kept in LLM context |
turn_detection | Optional[TurnDetectionConfig] | None | Voice activity detection settings |
sal | Optional[SalConfig] | None | Selective Attention Locking configuration |
advanced_features | Optional[Dict[str, Any]] | None | Advanced features, for example {'enable_mllm': True} |
parameters | Optional[SessionParams] | None | Session parameters including silence and farewell config |
geofence | Optional[GeofenceConfig] | None | Regional access restriction |
labels | Optional[Dict[str, str]] | None | Custom key-value labels returned in notification callbacks |
rtc | Optional[RtcConfig] | None | RTC media encryption |
filler_words | Optional[FillerWordsConfig] | None | Filler words played while waiting for the LLM response |
Builder methods
All builder methods return a new Agent instance. The original is never modified.
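The copy-on-write behavior can be illustrated with a frozen dataclass. `AgentSketch` below is a stand-in that models three fields only; it shows the pattern, and is not how the SDK's Agent class is necessarily implemented.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AgentSketch:
    """Stand-in for Agent showing only the copy-on-write builder pattern."""

    name: Optional[str] = None
    instructions: Optional[str] = None
    greeting: Optional[str] = None

    def with_instructions(self, instructions: str) -> "AgentSketch":
        # replace() builds a new frozen instance; self is left untouched.
        return replace(self, instructions=instructions)

    def with_greeting(self, greeting: str) -> "AgentSketch":
        return replace(self, greeting=greeting)


base = AgentSketch(name="support", instructions="Be concise.")
variant = base.with_instructions("Answer in Spanish.").with_greeting("Hola!")

print(base.instructions)     # the original keeps its value
print(variant.instructions)
print(variant is base)
```

Because each call returns a fresh object, one base configuration defined at startup can be shared safely and specialized per conversation.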
with_llm(vendor)
Sets the LLM vendor for the cascading flow. Pass an instance of OpenAI, AzureOpenAI, Anthropic, or Gemini.
with_tts(vendor)
Sets the TTS vendor. Records the vendor's sample_rate for avatar validation.
with_stt(vendor)
Sets the STT vendor. Pass an instance of any STT vendor class.
with_mllm(vendor)
Sets the MLLM vendor for multimodal mode. Pass OpenAIRealtime or VertexAI. Requires advanced_features={'enable_mllm': True} in the constructor.
with_avatar(vendor)
Sets the avatar vendor.
Raises: ValueError if the TTS sample rate does not match the avatar's required_sample_rate.
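The sample-rate check can be sketched as follows. The two required rates come from the avatar vendor sections of this reference (24,000 Hz for HeyGenAvatar, 16,000 Hz for AkoolAvatar); the function name and the handling of an unset TTS sample rate are assumptions.

```python
from typing import Optional

# Required TTS rates taken from the avatar vendor sections of this reference.
REQUIRED_SAMPLE_RATE = {"HeyGenAvatar": 24000, "AkoolAvatar": 16000}


def check_avatar_sample_rate(avatar_vendor: str, tts_sample_rate: Optional[int]) -> None:
    """Hypothetical check; raises ValueError on a known mismatch."""
    required = REQUIRED_SAMPLE_RATE[avatar_vendor]
    # Assumption: an unset TTS sample rate is not treated as a mismatch here.
    if tts_sample_rate is not None and tts_sample_rate != required:
        raise ValueError(
            f"{avatar_vendor} requires TTS at {required} Hz, got {tts_sample_rate} Hz"
        )


check_avatar_sample_rate("HeyGenAvatar", 24000)   # matching rate, no error

try:
    check_avatar_sample_rate("AkoolAvatar", 24000)
except ValueError as exc:
    mismatch_message = str(exc)
    print(mismatch_message)
```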
with_turn_detection(config)
Configures voice activity detection. Use config.start_of_speech and config.end_of_speech for the SOS/EOS model.
with_instructions(instructions)
Overrides the LLM system prompt on a new Agent instance.
with_greeting(greeting)
Overrides the greeting message on a new Agent instance.
with_name(name)
Overrides the agent name on a new Agent instance.
Other builder methods
The following methods follow the same pattern — each returns a new Agent instance with the updated configuration.
| Method | Parameter type | Description |
|---|---|---|
with_sal(config) | SalConfig | Set Selective Attention Locking configuration |
with_advanced_features(features) | Dict[str, Any] | Set advanced features |
with_parameters(parameters) | SessionParams | Set session parameters |
with_failure_message(message) | str | Set the message spoken when the LLM fails |
with_max_history(n) | int | Set the maximum conversation history length |
with_geofence(geofence) | GeofenceConfig | Set geofence configuration |
with_labels(labels) | Dict[str, str] | Set custom labels |
with_rtc(rtc) | RtcConfig | Set RTC configuration |
with_filler_words(filler_words) | FillerWordsConfig | Set filler words configuration |
create_session()
Creates an AgentSession bound to a specific client and channel. Does not start the agent — call session.start() to join the channel.
create_session() fields:
| Parameter | Type | Required | Description |
|---|---|---|---|
client | Agora or AsyncAgora | Yes | Authenticated client. Pass AsyncAgora to receive an AsyncAgentSession |
channel | str | Yes | Channel name to join |
agent_uid | str | Yes | The agent's RTC UID |
remote_uids | List[str] | Yes | Remote user UIDs the agent listens and responds to |
name | Optional[str] | No | Session name. Defaults to agent name |
token | Optional[str] | No | Pre-built RTC+RTM token. Omit to auto-generate from app credentials |
expires_in | Optional[int] | No | Token lifetime in seconds. Only applies when the token is auto-generated. Valid range: 1–86400. Use expires_in_hours() for clarity |
idle_timeout | Optional[int] | No | Seconds before the agent auto-exits when no audio is detected |
enable_string_uid | Optional[bool] | No | Use string UIDs instead of numeric UIDs |
Properties
Read-only properties available on any Agent instance.
| Property | Type | Description |
|---|---|---|
name | Optional[str] | Agent name |
instructions | Optional[str] | LLM system prompt |
greeting | Optional[str] | Greeting message |
failure_message | Optional[str] | Message spoken when LLM fails |
max_history | Optional[int] | Maximum conversation history length |
llm | Optional[Dict[str, Any]] | LLM configuration |
tts | Optional[Dict[str, Any]] | TTS configuration |
stt | Optional[Dict[str, Any]] | STT configuration |
mllm | Optional[Dict[str, Any]] | MLLM configuration |
avatar | Optional[Dict[str, Any]] | Avatar configuration |
turn_detection | Optional[TurnDetectionConfig] | Turn detection configuration |
sal | Optional[SalConfig] | SAL configuration |
advanced_features | Optional[Dict[str, Any]] | Advanced features |
parameters | Optional[SessionParams] | Session parameters |
geofence | Optional[GeofenceConfig] | Geofence configuration |
labels | Optional[Dict[str, str]] | Custom labels |
rtc | Optional[RtcConfig] | RTC configuration |
filler_words | Optional[FillerWordsConfig] | Filler words configuration |
config | Dict[str, Any] | Full configuration snapshot |
AgentSession / AsyncAgentSession
AgentSession (sync) and AsyncAgentSession (async) manage the full lifecycle of a running agent. Obtain a session by calling agent.create_session() — direct construction is available for advanced use cases.
State machine
A session progresses through the following states:
| Transition | Trigger |
|---|---|
| idle → starting | start() called |
| starting → running | API responds with agent ID |
| starting → error | API request fails |
| running → stopping | stop() called |
| stopping → stopped | API confirms agent stopped |
| stopping → error | Stop request fails and agent was not already stopped |
| running → error | Unrecoverable error during interaction |
start() can also be called from stopped or error state to restart the session.
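The transitions above, together with the restart rule, can be encoded as a lookup table. This is an illustrative model for reasoning about session state, with hypothetical names (`TRANSITIONS`, `can_transition`, `walk`); it is not part of the SDK.

```python
from typing import Dict, List, Set

# Allowed transitions, copied from the table plus the restart rule.
TRANSITIONS: Dict[str, Set[str]] = {
    "idle": {"starting"},
    "starting": {"running", "error"},
    "running": {"stopping", "error"},
    "stopping": {"stopped", "error"},
    "stopped": {"starting"},   # start() may be called again
    "error": {"starting"},     # start() may be called again
}


def can_transition(current: str, target: str) -> bool:
    return target in TRANSITIONS.get(current, set())


def walk(states: List[str]) -> str:
    """Return the final state, or raise ValueError at the first illegal step."""
    for current, target in zip(states, states[1:]):
        if not can_transition(current, target):
            raise ValueError(f"illegal transition: {current} -> {target}")
    return states[-1]


final_state = walk(["idle", "starting", "running", "stopping", "stopped", "starting"])
print(final_state)
print(can_transition("idle", "running"))
```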
Methods
The following methods are available on both AgentSession and AsyncAgentSession. Methods that make API calls require await on AsyncAgentSession.
start()
Starts the agent session. Generates an RTC token if not provided, validates avatar/TTS configuration, and calls the Agora API. Returns the agent ID.
| | Sync | Async |
|---|---|---|
| Signature | start() -> str | async start() -> str |
| Raises | RuntimeError if not in idle, stopped, or error state | Same |
| Raises | ValueError if avatar/TTS sample rate mismatch | Same |
stop()
Stops the agent session and removes the agent from the channel. If the agent has already stopped (for example due to idle timeout), transitions to stopped without raising.
| | Sync | Async |
|---|---|---|
| Signature | stop() -> None | async stop() -> None |
| Raises | RuntimeError if not in running state | Same |
say(text, priority=None, interruptable=None)
Instructs the agent to speak the given text.
| | Sync | Async |
|---|---|---|
| Signature | say(text, priority=None, interruptable=None) -> None | async say(...) -> None |
| Raises | RuntimeError if not in running state | Same |
| Parameter | Type | Required | Description |
|---|---|---|---|
text | str | Yes | The text for the agent to speak |
priority | str | No | Message priority: 'INTERRUPT', 'APPEND', or 'IGNORE'. See SpeakPriority |
interruptable | bool | No | Whether this message can be interrupted by the user |
interrupt()
Interrupts the agent's current speech.
| | Sync | Async |
|---|---|---|
| Signature | interrupt() -> None | async interrupt() -> None |
| Raises | RuntimeError if not in running state | Same |
update(properties)
Updates the agent configuration mid-session without restarting. Accepts a partial properties object in REST API format.
| | Sync | Async |
|---|---|---|
| Signature | update(properties: Any) -> None | async update(properties: Any) -> None |
| Raises | RuntimeError if not in running state | Same |
get_history()
Fetches the conversation history for this session. Requires a valid agent ID — start() must have been called successfully.
| | Sync | Async |
|---|---|---|
| Signature | get_history() -> Any | async get_history() -> Any |
| Raises | RuntimeError if no agent ID available | Same |
get_info()
Fetches current agent metadata from the API. Requires a valid agent ID.
| | Sync | Async |
|---|---|---|
| Signature | get_info() -> Any | async get_info() -> Any |
| Raises | RuntimeError if no agent ID available | Same |
get_turns()
Fetches turn-by-turn analytics for the session. Requires a valid agent ID.
| | Sync | Async |
|---|---|---|
| Signature | get_turns() -> Any | async get_turns() -> Any |
| Raises | RuntimeError if no agent ID available | Same |
on(event, handler)
Registers an event handler. Synchronous on both AgentSession and AsyncAgentSession. Register handlers before calling start() to avoid missing the started event.
| Parameter | Type | Description |
|---|---|---|
event | str | Event name: 'started', 'stopped', or 'error' |
handler | Callable | Callback function |
off(event, handler)
Removes a previously registered event handler. Synchronous on both AgentSession and AsyncAgentSession.
Events
The session emits the following events.
| Event | Payload type | Description |
|---|---|---|
| 'started' | dict with agent_id: str | Agent successfully joined the channel |
| 'stopped' | dict with agent_id: str | Agent left the channel |
| 'error' | Exception | An unrecoverable error occurred |
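The on()/off() registration model and the 'started' payload shape can be sketched with a minimal event hub. `EventHub` and its `emit()` method are stand-ins introduced for illustration, and the agent IDs are made-up values; the session's internal dispatch is not documented here.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventHub:
    """Stand-in showing on()/off() registration; not the SDK's implementation."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def off(self, event: str, handler: Handler) -> None:
        if handler in self._handlers[event]:
            self._handlers[event].remove(handler)

    def emit(self, event: str, payload: Any) -> None:
        # Iterate over a copy so handlers may call off() while running.
        for handler in list(self._handlers[event]):
            handler(payload)


seen: List[str] = []


def on_started(payload: dict) -> None:
    seen.append(payload["agent_id"])


hub = EventHub()
hub.on("started", on_started)                    # register before the event fires
hub.emit("started", {"agent_id": "agent-123"})
hub.off("started", on_started)
hub.emit("started", {"agent_id": "agent-456"})   # no handler left to receive it
print(seen)
```

The second emit shows why registration order matters: an event fired while no handler is attached is simply not observed.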
Properties
The following read-only properties are available on both AgentSession and AsyncAgentSession instances.
| Property | Type | Description |
|---|---|---|
id | Optional[str] | Agent ID, populated after start() resolves |
status | str | Current session state: 'idle', 'starting', 'running', 'stopping', 'stopped', 'error' |
agent | Agent | The agent configuration this session was created from |
app_id | str | The Agora App ID for this session |
raw | AgentsClient or AsyncAgentsClient | Direct access to the Fern-generated agents client for advanced operations |
Using session.raw
Use session.raw to call REST API endpoints not yet exposed by the agentkit layer.
Vendors
All vendor classes are imported from agora_agent.agentkit.vendors.
LLM vendors
Use with with_llm().
OpenAI
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | No | None | OpenAI API key. Omit to use Agora-managed credentials for supported models |
model | str | No | 'gpt-4o-mini' | Model name |
base_url | str | No | None | Custom base URL |
temperature | float | No | None | Sampling temperature (0.0–2.0) |
top_p | float | No | None | Nucleus sampling (0.0–1.0) |
max_tokens | int | No | None | Maximum tokens to generate |
max_history | int | No | None | Maximum conversation turns kept in context |
system_messages | List[Dict] | No | None | Additional system messages |
greeting_message | str | No | None | Agent greeting message |
failure_message | str | No | None | Message spoken when the LLM call fails |
input_modalities | List[str] | No | None | Input modalities |
output_modalities | List[str] | No | None | Output modalities |
greeting_configs | Dict[str, Any] | No | None | Greeting configuration |
template_variables | Dict[str, str] | No | None | Template variables for system prompt interpolation |
vendor | str | No | None | Override the vendor identifier |
mcp_servers | List[Dict] | No | None | MCP server configurations |
params | Dict[str, Any] | No | None | Additional model parameters |
AzureOpenAI
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | Azure OpenAI API key |
endpoint | str | Yes | — | Azure endpoint URL |
deployment_name | str | Yes | — | Azure deployment name |
api_version | str | No | '2024-08-01-preview' | Azure API version |
temperature | float | No | None | Sampling temperature (0.0–2.0) |
top_p | float | No | None | Nucleus sampling (0.0–1.0) |
max_tokens | int | No | None | Maximum tokens to generate |
system_messages | List[Dict] | No | None | Additional system messages |
greeting_message | str | No | None | Agent greeting message |
failure_message | str | No | None | Message spoken when the LLM call fails |
input_modalities | List[str] | No | None | Input modalities |
Anthropic
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | Anthropic API key |
model | str | No | 'claude-3-5-sonnet-20241022' | Model name |
max_tokens | int | No | None | Maximum tokens to generate |
temperature | float | No | None | Sampling temperature (0.0–1.0) |
top_p | float | No | None | Nucleus sampling (0.0–1.0) |
system_messages | List[Dict] | No | None | Additional system messages |
greeting_message | str | No | None | Agent greeting message |
failure_message | str | No | None | Message spoken when the LLM call fails |
input_modalities | List[str] | No | None | Input modalities |
Gemini
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | Google AI API key |
model | str | No | 'gemini-2.0-flash-exp' | Model name |
temperature | float | No | None | Sampling temperature (0.0–2.0) |
top_p | float | No | None | Nucleus sampling (0.0–1.0) |
top_k | int | No | None | Top-k sampling |
max_output_tokens | int | No | None | Maximum output tokens |
system_messages | List[Dict] | No | None | Additional system messages |
greeting_message | str | No | None | Agent greeting message |
failure_message | str | No | None | Message spoken when the LLM call fails |
input_modalities | List[str] | No | None | Input modalities |
TTS vendors
Use with with_tts(). The sample_rate option determines avatar compatibility — see with_avatar().
ElevenLabsTTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | Yes | — | ElevenLabs API key |
model_id | str | Yes | — | Model ID, for example 'eleven_flash_v2_5' |
voice_id | str | Yes | — | Voice ID |
sample_rate | int | No | None | Sample rate in Hz: 16000, 22050, 24000, or 44100 |
base_url | str | No | None | Custom WebSocket base URL |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
optimize_streaming_latency | int | No | None | Latency optimization level (0–4) |
stability | float | No | None | Voice stability (0.0–1.0) |
similarity_boost | float | No | None | Similarity boost (0.0–1.0) |
style | float | No | None | Style exaggeration (0.0–1.0) |
use_speaker_boost | bool | No | None | Enable speaker boost |
MicrosoftTTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | Yes | — | Azure subscription key |
region | str | Yes | — | Azure region, for example 'eastus' |
voice_name | str | Yes | — | Voice name, for example 'en-US-JennyNeural' |
sample_rate | int | No | None | Sample rate in Hz: 8000, 16000, 24000, or 48000 |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
OpenAITTS
Fixed sample rate: 24000 Hz.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | No | None | OpenAI API key. Omit to use Agora-managed credentials for supported models |
voice | str | Yes | — | Voice name: 'alloy', 'echo', 'fable', 'onyx', 'nova', or 'shimmer' |
model | str | No | None | Model: 'tts-1' or 'tts-1-hd' |
response_format | str | No | None | Audio format, for example 'pcm' |
speed | float | No | None | Speech speed multiplier |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
CartesiaTTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | Yes | — | Cartesia API key |
voice_id | str | Yes | — | Voice ID |
model_id | str | No | None | Model ID |
sample_rate | int | No | None | Sample rate in Hz: 8000–48000 |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
GoogleTTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | Yes | — | Google Cloud API key |
voice_name | str | Yes | — | Voice name |
language_code | str | No | None | Language code, for example 'en-US' |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
AmazonTTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
access_key | str | Yes | — | AWS access key |
secret_key | str | Yes | — | AWS secret key |
region | str | Yes | — | AWS region, for example 'us-east-1' |
voice_id | str | Yes | — | Amazon Polly voice ID |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
HumeAITTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | Yes | — | Hume AI API key |
config_id | str | No | None | Configuration ID |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
RimeTTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | Yes | — | Rime API key |
speaker | str | Yes | — | Speaker ID |
model_id | str | No | None | Model ID |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
FishAudioTTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | Yes | — | Fish Audio API key |
reference_id | str | Yes | — | Reference ID |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
MiniMaxTTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | No | None | MiniMax API key. Omit to use Agora-managed credentials for supported models |
model | str | Yes | — | Model name, for example 'speech-2.6-turbo' |
voice_id | str | No | None | Voice style identifier |
group_id | str | No | None | MiniMax group ID |
url | str | No | None | WebSocket endpoint |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
MurfTTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | Yes | — | Murf API key |
voice_id | str | Yes | — | Voice ID, for example 'Ariana' or 'Natalie' |
style | str | No | None | Voice style, for example 'Conversational' |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
SarvamTTS
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | Yes | — | Sarvam API key |
speaker | str | Yes | — | Speaker name |
target_language_code | str | Yes | — | Target language code |
skip_patterns | List[int] | No | None | Skip patterns for bracketed content |
STT vendors
Use with with_stt().
DeepgramSTT
All parameters are optional.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | No | None | Deepgram API key. Omit to use Agora-managed credentials for supported models |
model | str | No | None | Model name, for example 'nova-2' |
language | str | No | None | Language code, for example 'en-US' |
smart_format | bool | No | None | Enable smart formatting |
punctuation | bool | No | None | Enable punctuation |
additional_params | Dict[str, Any] | No | None | Additional vendor parameters |
SpeechmaticsSTT
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | Speechmatics API key |
language | str | Yes | — | Language code, for example 'en' |
additional_params | Dict[str, Any] | No | None | Additional parameters |
MicrosoftSTT
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
key | str | Yes | — | Azure subscription key |
region | str | Yes | — | Azure region, for example 'eastus' |
language | str | No | None | Language code, for example 'en-US' |
additional_params | Dict[str, Any] | No | None | Additional parameters |
OpenAISTT
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | OpenAI API key |
model | str | No | None | Model name. Default: 'whisper-1' |
language | str | No | None | Language code |
additional_params | Dict[str, Any] | No | None | Additional parameters |
GoogleSTT
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | Google Cloud API key |
language | str | No | None | Language code, for example 'en-US' |
additional_params | Dict[str, Any] | No | None | Additional parameters |
AmazonSTT
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
access_key | str | Yes | — | AWS access key ID |
secret_key | str | Yes | — | AWS secret access key |
region | str | Yes | — | AWS region, for example 'us-east-1' |
language | str | No | None | Language code |
additional_params | Dict[str, Any] | No | None | Additional parameters |
AssemblyAISTT
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | AssemblyAI API key |
language | str | No | None | Language code |
additional_params | Dict[str, Any] | No | None | Additional parameters |
AresSTT
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
language | str | No | None | Language code |
additional_params | Dict[str, Any] | No | None | Additional parameters |
SonioxSTT
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | Soniox API key |
language | str | Yes | — | Language code, for example 'en' |
additional_params | Dict[str, Any] | No | None | Additional parameters |
SarvamSTT
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | Sarvam API key |
language | str | Yes | — | Language code, for example 'en' or 'hi' |
additional_params | Dict[str, Any] | No | None | Additional parameters |
MLLM vendors
Use with with_mllm() for multimodal end-to-end audio processing without separate STT or TTS steps. Requires advanced_features={'enable_mllm': True} in the Agent constructor.
OpenAIRealtime
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | OpenAI API key |
model | str | No | None | Model name, for example 'gpt-4o-realtime-preview' |
url | str | No | None | Custom WebSocket URL |
greeting_message | str | No | None | Agent greeting message |
input_modalities | List[str] | No | None | Input modalities, for example ['audio'] |
output_modalities | List[str] | No | None | Output modalities, for example ['text', 'audio'] |
messages | List[Dict] | No | None | Conversation messages for short-term memory |
params | Dict[str, Any] | No | None | Additional parameters |
VertexAI
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
model | str | Yes | — | Model name, for example 'gemini-2.0-flash-exp' |
project_id | str | Yes | — | Google Cloud project ID |
location | str | Yes | — | Google Cloud location, for example 'us-central1' |
adc_credentials_string | str | Yes | — | Application Default Credentials JSON string |
instructions | str | No | None | System instructions for the model |
voice | str | No | None | Voice name, for example 'Aoede' or 'Charon' |
greeting_message | str | No | None | Agent greeting message |
input_modalities | List[str] | No | None | Input modalities |
output_modalities | List[str] | No | None | Output modalities |
messages | List[Dict] | No | None | Conversation messages for short-term memory |
additional_params | Dict[str, Any] | No | None | Additional parameters |
Avatar vendors
Use with with_avatar(). Each avatar vendor requires a specific TTS sample rate enforced at runtime.
HeyGenAvatar
Requires TTS at 24,000 Hz.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | HeyGen API key |
quality | str | Yes | — | Video quality: 'low', 'medium', or 'high' |
agora_uid | str | Yes | — | Agora UID for the avatar video stream |
agora_token | str | No | None | RTC token for avatar authentication |
avatar_id | str | No | None | HeyGen avatar ID |
enable | bool | No | True | Enable or disable the avatar |
disable_idle_timeout | bool | No | None | Disable the idle timeout |
activity_idle_timeout | int | No | None | Idle timeout in seconds. Default: 120 |
AkoolAvatar
Requires TTS at 16,000 Hz.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
api_key | str | Yes | — | Akool API key |
agora_uid | str | Yes | — | Agora UID for the avatar video stream |
avatar_id | str | No | None | Avatar ID |
enable | bool | No | None | Enable or disable the avatar |
Token utilities
Helper functions for generating and managing tokens. Use these when you need control over token lifetime or when generating tokens outside of a session.
generate_convo_ai_token()
Generates a Conversational AI token combining RTC and RTM privileges. This is the same token the SDK generates automatically in app-credentials mode. Use this when passing a pre-built token to create_session().
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
app_id | str | Yes | — | Agora App ID |
app_certificate | str | Yes | — | Agora App Certificate |
channel_name | str | Yes | — | The channel the token grants access to |
account | str | Yes | — | The UID this token is issued for, as a string |
token_expire | int | No | 86400 | Token lifetime in seconds. Valid range: 1–86400 |
privilege_expire | int | No | 0 | Seconds until privileges expire. 0 means same as token_expire |
Returns: str — the generated token.
generate_rtc_token()
Generates an RTC-only token for channel join. Use generate_convo_ai_token() instead for most Conversational AI use cases.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
app_id | str | Yes | — | Agora App ID |
app_certificate | str | Yes | — | Agora App Certificate |
channel | str | Yes | — | Channel name |
uid | int | Yes | — | User ID. Use 0 for any user |
role | int | No | 1 | RTC role: 1 for publisher, 2 for subscriber |
expiry_seconds | int | No | 86400 | Token lifetime in seconds. Valid range: 1–86400 |
Returns: str — the generated token.
expires_in_hours() / expires_in_minutes()
Helper functions for specifying token lifetimes. Use with create_session() or token generation functions. Values are validated and capped at the Agora maximum of 86400 seconds (24 hours).
| Function | Returns | Behavior |
|---|---|---|
| expires_in_hours(n) | int (seconds) | Raises ValueError if n ≤ 0. Warns and caps at 86400 if the result exceeds 24 h |
| expires_in_minutes(n) | int (seconds) | Raises ValueError if n ≤ 0. Warns and caps at 86400 if the result exceeds 24 h |
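The documented validate-and-cap behavior amounts to the following logic. `expires_in_hours_sketch` is an illustrative re-implementation built from the table above, assuming an integer argument; the warning text is made up, and real code should call the SDK's expires_in_hours().

```python
import warnings

MAX_TOKEN_SECONDS = 86400  # 24 hours, the maximum stated in this reference


def expires_in_hours_sketch(hours: int) -> int:
    """Illustrative re-implementation of the documented behavior."""
    if hours <= 0:
        raise ValueError("hours must be greater than 0")
    seconds = hours * 3600
    if seconds > MAX_TOKEN_SECONDS:
        # Warn rather than fail, then clamp to the maximum lifetime.
        warnings.warn("token lifetime capped at 86400 seconds (24 h)")
        return MAX_TOKEN_SECONDS
    return seconds


print(expires_in_hours_sketch(2))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    capped = expires_in_hours_sketch(48)

print(capped, len(caught))
```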
Types and enums
Shared types and enums used across Agora, AsyncAgora, Agent, AgentSession, and vendor classes.
Area
Region used for API routing. Pass to Agora or AsyncAgora via the area parameter.
| Value | Region |
|---|---|
| Area.US | United States |
| Area.EU | Europe |
| Area.AP | Asia-Pacific |
| Area.CN | China mainland |
AgentSessionEvent
Valid event names for session.on() and session.off().
| Value | Payload | Description |
|---|---|---|
| 'started' | dict with agent_id: str | Agent successfully joined the channel |
| 'stopped' | dict with agent_id: str | Agent left the channel |
| 'error' | Exception | An unrecoverable error occurred |
SpeakPriority
Controls how the agent handles a say() call relative to its current activity. Pass as a string to session.say().
| Value | Description |
|---|---|
| 'INTERRUPT' | Agent immediately stops current speech and delivers the message |
| 'APPEND' | Message is queued and delivered after current speech ends |
| 'IGNORE' | Message is discarded if the agent is currently speaking |
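A toy model can make the three priorities concrete. The real scheduling happens on the agent side, so `SpeechQueueSketch` below is only a mental model under stated assumptions; in particular, leaving already-queued messages in place after an 'INTERRUPT' is a guess, not documented behavior.

```python
from collections import deque
from typing import Deque, Optional

VALID_PRIORITIES = ("INTERRUPT", "APPEND", "IGNORE")


class SpeechQueueSketch:
    """Toy model of the three priorities; not the agent's actual scheduler."""

    def __init__(self) -> None:
        self.current: Optional[str] = None   # message being spoken, if any
        self.pending: Deque[str] = deque()   # messages waiting their turn

    def say(self, text: str, priority: str) -> None:
        if priority not in VALID_PRIORITIES:
            raise ValueError(f"unknown priority: {priority}")
        if self.current is None or priority == "INTERRUPT":
            self.current = text              # speak now, replacing current speech
        elif priority == "APPEND":
            self.pending.append(text)        # wait until current speech ends
        # "IGNORE" while the agent is speaking: the message is dropped.

    def finish_current(self) -> None:
        self.current = self.pending.popleft() if self.pending else None


queue = SpeechQueueSketch()
queue.say("Welcome.", "APPEND")              # agent idle, spoken immediately
queue.say("One moment.", "APPEND")           # queued behind "Welcome."
queue.say("Are you there?", "IGNORE")        # dropped, agent is busy
queue.say("Please hold.", "INTERRUPT")       # replaces "Welcome."
spoken_now = queue.current
queue.finish_current()
spoken_next = queue.current
print(spoken_now, "|", spoken_next)
```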
ApiError
Raised when the API returns a 4xx or 5xx response. Catch this to inspect the status code and response body.
| Property | Type | Description |
|---|---|---|
| status_code | int | HTTP status code returned by the API |
| body | Any | Raw response body from the API |
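Handling code typically branches on status_code. The sketch below uses `ApiErrorSketch`, a stand-in exception carrying the two documented properties, so that it runs without the SDK; the response body shown is invented sample data, not a real API payload.

```python
from typing import Any


class ApiErrorSketch(Exception):
    """Stand-in with the two documented properties: status_code and body."""

    def __init__(self, status_code: int, body: Any) -> None:
        super().__init__(f"status_code={status_code}")
        self.status_code = status_code
        self.body = body


def describe(error: ApiErrorSketch) -> str:
    # 4xx: the request itself needs fixing; 5xx: the server failed.
    if 400 <= error.status_code < 500:
        return f"client error {error.status_code}: {error.body}"
    return f"server error {error.status_code}: {error.body}"


try:
    raise ApiErrorSketch(404, {"reason": "agent not found"})
except ApiErrorSketch as err:
    message = describe(err)
    print(message)
```

Separating 4xx from 5xx is a common convention: the former usually calls for fixing the request, the latter for a retry or a region switch.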
Sync vs. Async
The Python SDK provides two parallel client and session hierarchies — synchronous and asynchronous. Choose based on your application's runtime model.
| | Sync | Async |
|---|---|---|
| Client | Agora | AsyncAgora |
| Session | AgentSession | AsyncAgentSession |
| HTTP backend | httpx.Client | httpx.AsyncClient |
Use Agora (sync) when:
- You are writing scripts, CLI tools, or batch jobs
- Your web framework is synchronous, for example Flask or Django without async views
- You want the simplest possible code
Use AsyncAgora (async) when:
- Your application uses asyncio, for example FastAPI, Starlette, or aiohttp
- You need to manage multiple concurrent agent sessions efficiently
- You want non-blocking I/O
The Agent builder class is the same for both — it does not make HTTP calls, so it has no async variant. Pass an AsyncAgora client to agent.create_session() to receive an AsyncAgentSession.
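The concurrency benefit of the async hierarchy can be sketched with asyncio.gather. `StubAsyncSession` is a stand-in whose start() only simulates latency; the channel names and the agent ID format are made up for the example.

```python
import asyncio
from typing import List


class StubAsyncSession:
    """Stand-in for an async session; start() only simulates network latency."""

    def __init__(self, channel: str) -> None:
        self.channel = channel

    async def start(self) -> str:
        await asyncio.sleep(0.01)            # placeholder for the API round trip
        return f"agent-for-{self.channel}"   # made-up agent ID format


async def start_all(channels: List[str]) -> List[str]:
    sessions = [StubAsyncSession(name) for name in channels]
    # gather() runs the start() coroutines concurrently and keeps input order.
    return await asyncio.gather(*(session.start() for session in sessions))


agent_ids = asyncio.run(start_all(["room-1", "room-2", "room-3"]))
print(agent_ids)
```

With a synchronous client the same three starts would run one after another, so total latency grows with the number of sessions.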