Release notes
This document tracks important changes and improvements to the Conversational AI Engine.
Releases
v2.0
Released on November 15, 2025.
New features
Included in this release:
- Selective attention locking (Beta)

  This version adds the selective attention lock feature. Register voiceprints to enable the agent to identify specific speakers and suppress background voices and environmental noise, ensuring clearer, more focused conversations.

- Graceful exit

  This version adds a new `farewell_config` field to the Start a conversational AI agent API to configure the graceful exit feature. When enabled, calling the Stop a conversational AI agent API causes the agent to enter an `IDLE` state before leaving the channel.

- Keyword interruption mode

  This release adds a new option to the `turn_detection.interrupt_mode` field in the Start a conversational AI agent API. Set this field to `"keyword"` to enable keyword interruption mode. When this mode is enabled, the agent stops its current behavior after detecting any of the keywords specified in the `turn_detection.interrupt_keywords` field.

- Adaptive interruption mode

  This release adds a new option to the `turn_detection.interrupt_mode` field in the Start a conversational AI agent API. Set this field to `"adaptive"` to enable adaptive interruption mode. When this mode is enabled, the agent dynamically increases the voice continuity threshold while speaking to reduce accidental interruptions.

- New ASR, LLM, MLLM, and TTS providers

- New webhook notification events

  This release adds three new webhook notification event types to support metrics reporting and call-state monitoring:

  - `111` (agent metrics): Notifies real-time performance metrics for each dialogue turn, including ASR, LLM, and TTS latency measurements.
  - `201` (inbound call state): Reports state changes for incoming calls, such as when a call starts, is answered, transferred, or hung up.
  - `202` (outbound call state): Reports state changes for outbound calls initiated by the agent, including call start, dialing, ringing, answer, and hang-up events.
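As a rough sketch, the new v2.0 fields might be combined in a Start a conversational AI agent request body as follows. Only the field names (`farewell_config`, `turn_detection.interrupt_mode`, `turn_detection.interrupt_keywords`) come from these notes; the surrounding structure, the agent name field, and the `farewell_config` sub-fields are assumptions, so check the API reference for the exact schema:

```python
# Hypothetical request-body fragment for the v2.0 features described above.
# Field names follow the release notes; everything else is an assumption.
start_body = {
    "name": "support-agent",  # assumed agent identifier field
    "properties": {
        "parameters": {
            # Graceful exit: agent enters an IDLE state before leaving the
            # channel when the stop API is called (sub-fields assumed).
            "farewell_config": {"enable": True},
        },
        "turn_detection": {
            # Keyword interruption: the agent stops its current behavior
            # only when one of these keywords is detected.
            "interrupt_mode": "keyword",
            "interrupt_keywords": ["stop", "wait"],
        },
    },
}
```

To try adaptive interruption instead, the same sketch would set `interrupt_mode` to `"adaptive"` and omit the keyword list.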
Improvements
This release includes the following enhancements:
- Support for avatars with MLLMs

  Added support for using avatars with MLLMs.
API changes
This release introduces the following changes to the RESTful API.
- Changes to Start a conversational AI agent

  - New parameters added:

    - `properties.parameters.farewell_config`
    - `properties.advanced_features.enable_sal`
    - `properties.sal`
    - `properties.sal.sal_mode`
    - `properties.sal.sample_urls`
    - `properties.turn_detection.interrupt_mode` (supports `adaptive` and `keyword` values)
    - `properties.turn_detection.interrupt_keywords`
    - `properties.turn_detection.interrupt_duration_ms` (migrated from `vad.interrupt_duration_ms`)
    - `properties.turn_detection.prefix_padding_ms` (migrated from `vad.prefix_padding_ms`)
    - `properties.turn_detection.silence_duration_ms` (migrated from `vad.silence_duration_ms`)
    - `properties.turn_detection.threshold` (migrated from `vad.threshold`)

  - Deprecated:

    - The `vad` interface is deprecated. All configuration items have been moved to the `turn_detection` field.
Toolkit API
This release renames all APIs and parameters containing the word "transcription" in the client-side subtitle API to use "transcript", as shown below:

- Android

  - `onTranscriptionUpdated` renamed to `onTranscriptUpdated`
  - `TranscriptionRenderMode` renamed to `TranscriptRenderMode`
  - `TranscriptionType` renamed to `TranscriptType`
  - `TranscriptionStatus` renamed to `TranscriptStatus`
  - `Transcription` renamed to `Transcript`

- iOS

  - `onTranscriptionUpdated` renamed to `onTranscriptUpdated`
  - `TranscriptionRenderMode` renamed to `TranscriptRenderMode`
  - `TranscriptionType` renamed to `TranscriptType`
  - `TranscriptionStatus` renamed to `TranscriptStatus`
  - `Transcription` renamed to `Transcript`

- Web

  - `TRANSCRIPTION_UPDATED` renamed to `TRANSCRIPT_UPDATED`
v1.7
Released on July 31, 2025.
New features
- AI avatars

  Create visual avatar representations for your conversational agents using third-party avatar providers. AI avatars provide a visual presence during voice interactions, making conversations feel more natural and engaging. Enable AI avatars by setting `avatar.enable` to `true` and configuring the `avatar.vendor` and `avatar.params` fields when calling Start a conversational AI agent to create your agent.

  > AI avatars require video streaming and incur additional charges. See video calling pricing for details.
- Selective attention locking (Beta)

  This version introduces the selective attention locking feature, which uses voiceprint recognition technology to lock onto a specific speaker's voice while suppressing background voices and noise. This improves the efficiency of conversational AI, particularly speech recognition accuracy. To try this feature, contact technical support.
- Send picture messages (Beta)

  The toolkit now includes an API for sending picture messages. You can send image URLs to the main model, which automatically references the image in future interactions to generate more relevant responses. A new callback is available to receive image message receipt details after successful transmission.

  > - The picture messaging feature is currently in Beta and free for a limited time.
  > - Image processing depends on the capabilities of the integrated LLM. Ensure the LLM you connect to the Conversational AI Engine supports image input.
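As an illustration of the avatar fields described above, a start request might carry a fragment like the following. Only `avatar.enable`, `avatar.vendor`, and `avatar.params` come from these notes; the vendor string and `params` keys shown are placeholders, not real provider identifiers:

```python
# Hypothetical avatar fragment for Start a conversational AI agent (v1.7).
# The values shown are placeholders; consult the API reference for real ones.
avatar_config = {
    "avatar": {
        "enable": True,                   # turn the AI avatar feature on
        "vendor": "example_vendor",       # placeholder third-party provider
        "params": {"avatar_id": "demo"},  # provider-specific options (assumed)
    }
}
```

Remember that enabling the avatar starts video streaming and incurs additional charges.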
API changes
This release introduces the following modifications to the RESTful API.
- Start a conversational AI agent

  - New parameters added:

    - `avatar.enable`
    - `avatar.vendor`
    - `avatar.params`
Toolkit API

The following APIs and types were added to support picture messaging:

- Android: `chat`, `ImageMessage`, `onMessageReceiptUpdated`, `MessageReceipt`
- iOS: `chat`, `ChatMessage`, `ChatMessageType`, `ImageMessage`, `onMessageReceiptUpdated`, `MessageReceipt`
- Web: `chat`, `TMessageReceipt`, `EChatMessagePriority`, `EChatMessageType`, `IChatMessageBase`, `IChatMessageImage`
v1.6
Released on July 15, 2025.
New features
- Support for OpenAI Realtime API

  Integrate Multimodal Large Language Models (MLLMs) with Conversational AI Engine to enable end-to-end real-time audio and text interactions. See OpenAI Realtime API for integration details.
- Support for more TTS vendors

  Conversational AI Engine now supports additional TTS vendors.
- Custom ASR provider support

  To improve flexibility in configuring conversational agents, this release allows you to select a custom automatic speech recognition (ASR) provider. The Start a conversational AI agent API now includes the following new parameters:

  - `asr.vendor`: Specify the ASR provider
  - `asr.params`: Configure ASR parameters

  The following ASR providers are supported:

  - ARES (default)
  - Microsoft Azure
  - Deepgram

  Billing update: In earlier versions, the service fee included the cost of the ARES ASR provider. Starting in v1.6, pricing is restructured as follows:

  - If you use ARES ASR, the total price remains unchanged: Total cost = Conversational AI Engine Audio Basic Task + ARES ASR Task.
  - If you use a different ASR provider, you are charged only the new Conversational AI Engine Audio Basic Task fee.

  For further details, see Pricing.
- Multi-platform toolkit

  Agora now offers a toolkit to help you quickly build conversational agent apps. The toolkit is available for iOS, Android, and Web, and includes APIs for common scenarios. Call these APIs to combine the capabilities of the Agora Voice SDK and Signaling SDK to achieve the following functions:

  - Display live transcripts: Display real-time text output of user–agent conversations. The transcript component is now more robust, with better error handling, session management, and extensibility.
  - Interrupt the agent: Stop the agent from speaking or thinking mid-conversation.
  - Receive event notifications: Track changes in conversation state, performance metrics, and error events.
  - Optimize audio settings: Quickly apply best-practice audio configurations to improve agent responsiveness and clarity.
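The custom ASR parameters above could be sketched as follows. Only `asr.vendor` and `asr.params` come from these notes; the vendor string and the `params` keys are assumptions, so consult the API reference for the supported values:

```python
# Hypothetical ASR fragment for Start a conversational AI agent (v1.6).
# The values are placeholders; ARES is the default provider if asr is omitted.
asr_config = {
    "asr": {
        "vendor": "microsoft",            # placeholder provider identifier
        "params": {"language": "en-US"},  # provider-specific settings (assumed)
    }
}
```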
API changes
REST API
This release introduces several important modifications to the RESTful API.
- Start a conversational AI agent

  - New parameters added:

    - `asr.vendor`
    - `asr.params`
    - `advanced_features.enable_mllm`
    - `properties.mllm`
    - `turn_detection.type`
    - `turn_detection.interrupt_duration_ms`
    - `turn_detection.prefix_padding_ms`
    - `turn_detection.silence_duration_ms`
    - `turn_detection.threshold`
    - `turn_detection.create_response`
    - `turn_detection.interrupt_response`
    - `turn_detection.eagerness`
    - `parameters.enable_metrics`
    - `parameters.data_channel`
    - `parameters.enable_error_message`
Toolkit APIs
v1.5
Released on June 9, 2025.
New features
- Voice interruption mode

  This release adds the `turn_detection.interrupt_mode` parameter to the Start a conversational AI agent API, allowing you to control how the agent handles human voice interruptions. The following modes are supported:

  - `interrupt`: (Default) The human voice immediately interrupts the agent. The agent terminates the current interaction and processes the new human voice input.
  - `append`: The human voice does not interrupt the agent. The agent processes the newly received human voice request after the current interaction ends.
  - `ignore`: The agent ignores human voice requests received during speaking or thinking. These requests are discarded and not stored in the context.
- TTS filtering

  This release adds the `tts.skip_patterns` parameter to the Start a conversational AI agent API. This parameter controls whether the TTS module skips bracketed content when reading LLM response text. This prevents the agent from vocalizing structural prompt information, such as tone indicators, action descriptions, and system prompts, creating a more natural and immersive listening experience.
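To illustrate what skipping bracketed content means, the sketch below reproduces the effect locally with a regular expression. This is a minimal approximation of the behavior `tts.skip_patterns` enables, not the engine's implementation, and the bracket styles handled are assumptions:

```python
import re

def skip_bracketed(text: str) -> str:
    """Remove (...), [...], and {...} segments, mimicking how a TTS module
    might skip tone indicators and action descriptions before speaking."""
    stripped = re.sub(r"[\(\[\{][^\)\]\}]*[\)\]\}]", "", text)
    # Collapse the double spaces left behind and trim the ends.
    return re.sub(r"\s{2,}", " ", stripped).strip()

print(skip_bracketed("(cheerfully) Sure, I can help! [smiles]"))
# → Sure, I can help!
```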
API changes
This release introduces several important modifications to the RESTful API.
- Start a conversational AI agent

  - New parameters added:

    - `turn_detection.interrupt_mode`
    - `parameters.silence_config`
    - `tts.skip_patterns`
v1.4
Released on May 29, 2025.
New features
- Metadata support for LLM requests

  This release adds the `llm.vendor` parameter to the Start a conversational AI agent API. When set to `"custom"`, the agent includes additional metadata when calling the LLM, such as `turn_id` and `timestamp`.

- Support for Anthropic

  Conversational AI Engine now supports `anthropic` as a request style for chat completion. Refer to the `llm.style` parameter in Start a conversational AI agent.
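The two v1.4 additions might be combined as in the hypothetical fragment below. The `url` field is a placeholder for a custom LLM endpoint and is an assumption; note that `turn_id` and `timestamp` are added by the engine, not by the caller:

```python
# Hypothetical llm fragment for Start a conversational AI agent (v1.4).
# llm.vendor and llm.style come from the notes; the url field is assumed.
llm_config = {
    "llm": {
        "vendor": "custom",    # engine adds metadata such as turn_id, timestamp
        "style": "anthropic",  # use the Anthropic chat-completion request style
        "url": "https://example.com/v1/messages",  # placeholder endpoint
    }
}
```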
Improvements
This release includes the following enhancements:
- Advanced LLM configuration: The Update agent configuration API now supports:

  - `llm.system_messages` for updating system prompts
  - `llm.params` for modifying configuration parameters used when calling the large language model
API changes
This release introduces several important modifications to the RESTful API.
- Start a conversational AI agent

  - New parameters added:

    - `llm.vendor`

  - Removed parameters:

    - `agent_rtm_uid`

- Update agent configuration

  - New parameters added:

    - `llm.system_messages`
    - `llm.params`
v1.3
Released on April 16, 2025.
New features
- Agent conversation history: This version adds two methods to retrieve an agent's history. The history includes messages exchanged between the user and the agent and the timestamps of agent creation and exit.

  - Call the RESTful API `history` endpoint to Retrieve agent history.
  - Subscribe to the agent history event through the Agora message notification service. When the agent stops, Agora automatically sends the agent's history to your business server through a webhook callback.
Improvements
- Customize the priority of broadcast messages: This version upgrades the Broadcast a message using TTS interface and adds two new configuration parameters related to broadcast interruption logic:

  - `priority`: Sets the priority of the message broadcast. The following priorities are supported:

    - `INTERRUPT`: High priority
    - `APPEND`: Medium priority
    - `IGNORE`: Low priority

  - `interruptable`: Configures whether human voice is allowed to interrupt the agent's broadcast.
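A sketch of a Broadcast a message using TTS body with the two new parameters. Only `priority` and `interruptable` come from these notes; the text field and overall shape are assumptions:

```python
# Hypothetical broadcast body (v1.3). priority and interruptable come from
# the release notes; the remaining structure is assumed.
broadcast_body = {
    "text": "The store closes in ten minutes.",  # message for the TTS module
    "priority": "INTERRUPT",   # INTERRUPT (high), APPEND (medium), IGNORE (low)
    "interruptable": False,    # do not let human voice cut off this broadcast
}
```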
API changes
- Adds the Retrieve agent history method.
- Adds the Retrieve agent history method.
- Adds the `priority` and `interruptable` parameters to the Broadcast a message using TTS method.
v1.2
Released on April 10, 2025.
New features
- Broadcast a message using TTS: A new message broadcast interface enables a specified agent to deliver a custom message. When a user is interacting with an agent, calling this interface interrupts the agent's speech and thinking process, allowing the TTS module to immediately broadcast the custom message.

- Interrupt the agent: The interrupt agent endpoint allows you to stop the specified agent's speech and thinking process.
API changes

This version adds the Broadcast a message using TTS and Interrupt the agent APIs.
v1.1
Released on March 27, 2025.
New features
The Start a conversational AI agent API adds the `enable_rtm` and `agent_rtm_uid` parameters to enable Signaling integration with the Conversational AI agent. When this feature is enabled, the agent can leverage the Signaling SDK to obtain a user's custom context information, such as speaking status, selected text, signature, and score, and pass this data to the agent to generate more relevant content. For details, see Transmit custom information.
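As a sketch, the Signaling integration described above might be enabled with a fragment like the one below; the exact placement of the two parameters inside the request body is an assumption:

```python
# Hypothetical fragment enabling Signaling integration (v1.1).
# enable_rtm and agent_rtm_uid come from the notes; placement is assumed.
signaling_config = {
    "parameters": {"enable_rtm": True},  # let the agent use the Signaling SDK
    "agent_rtm_uid": "agent-001",        # placeholder Signaling UID for the agent
}
```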
Improvements
To help you quickly integrate a custom large language model (LLM), this version adds documentation for Custom LLMs. Refer to the sample code in the documentation to integrate your custom model into the Conversational AI Engine and enable advanced capabilities such as Retrieval-Augmented Generation (RAG), multi-modal processing, and tool invocation.
API changes
The POST method to Start a conversational AI agent now includes the `enable_rtm` and `agent_rtm_uid` parameters.
v1.0 (Public Beta)
This version, released on March 4, 2025, adds pricing information for the Agora Conversational AI Engine. For more information, see Pricing.
Integration guide
To achieve the best conversation experience, use Agora Conversational AI Engine with the following Agora SDKs:
- Agora RTC Native SDK, v4.5.1 or later.
- Agora RTC Web SDK, version 4.23.2 or later.
New features
- Live transcripts: Supports real-time text output of conversations between users and the AI agent for transcript display in your app's UI. Agora provides an open-source transcript processing module. Simply integrate the module and call its API to implement live transcripts. For details, see Display live transcripts.

- Message Notification Service: Introduces a new Conversational AI Engine message notification service. Configure it in the Agora console and subscribe to agent creation, stop, and error events. When a subscribed event occurs, Agora sends the details to your specified callback address. See Receive event notifications.

- Keywords: Enhances recognition accuracy of Conversational AI Engine for proprietary words by adding keywords. This feature is currently in Beta. For details, contact technical support.
v1.0 (Private Beta)
Released on February 18, 2025. The first beta release of the Conversational AI Engine brings natural, smooth, low-latency, and highly reliable real-time voice conversations with AI agents to Agora channels. It enables you to efficiently build intelligent and immersive interactive experiences. See Product overview for details.
Core Features
- Real-time voice conversation

  Supports natural and smooth real-time voice conversations with AI. It delivers a low-latency, ultra-responsive interactive experience, as if the user is communicating with a real person.

- Intelligent noise suppression

  Intelligently identifies and suppresses background noise, ensuring clear sound transmission even in noisy environments to provide users with a high-quality audio experience.

- Background human voice suppression

  Suppresses background voices and noise while accurately preserving the primary speaker's voice. This ensures a clear and focused interactive experience in multi-speaker environments.

- Intelligent interruption handling

  Allows users to interrupt the AI at any time to ensure quick and natural responses. This feature enables smooth transitions and avoids mechanical interactions.

- Intelligent transmission

  An AI-optimized transmission algorithm ensures stable voice data delivery even in weak network conditions where packet loss reaches 80%. This guarantees conversation continuity and reliability across diverse network environments.

- Flexible orchestration

  Supports multiple Large Language Model (LLM) and Text-to-Speech (TTS) providers, enabling flexible orchestration to meet diverse business needs and deliver highly customizable AI dialogue solutions.

- Multi-platform support

  Compatible with iOS, Android, Web, and various embedded hardware platforms, providing a seamless and consistent cross-platform experience.
Integration guide
- For the best conversational experience, Agora recommends using Conversational AI Engine with specific Agora Video/Voice SDK versions. For details, contact technical support.

- The number of Peak Concurrent Users (PCU) allowed to call the server API under a single App ID is limited to 20. If you need to increase this limit, please contact technical support.