Display live subtitles
When users interact with conversational AI in real time, you can display the conversation content as live subtitles. This page explains how to implement real-time subtitles in your app.
Understand the tech
Agora provides a flexible, scalable, and standardized conversational AI engine toolkit. The toolkit supports iOS, Android, and Web platforms, and encapsulates scenario-based APIs. You can use these APIs to integrate the capabilities of the Agora Signaling SDK and Video SDK to enable the following features:
- Interrupt agents
- Display live subtitles
- Receive event notifications
- Set optimal audio parameters (iOS and Android only)
- Send picture messages
The toolkit receives subtitle transcription content through the `onTranscriptionUpdated` callback and supports monitoring the following types of subtitle data:

- Agent captions: Transcribes the agent's speech. Includes real-time updates and final results.
- User captions: Transcribes the user's speech. Supports real-time display and status management.
- Transcription status: Reports status updates such as in progress, completed, or interrupted.
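For example, a subtitle renderer can branch on the reported status each time the callback fires. The following is a minimal Kotlin sketch; the enum values and helper functions (`TranscriptionStatus`, `renderPartial`, `renderFinal`, `markInterrupted`) are illustrative assumptions, and the actual data structures are defined in the toolkit source.

```kotlin
// Illustrative sketch only: route each onTranscriptionUpdated delivery by its
// status. The enum values and helper functions named here are assumptions;
// see the toolkit's type definitions for the real API.
fun handleTranscription(agentUserId: String, transcription: Transcription) {
    when (transcription.status) {
        TranscriptionStatus.IN_PROGRESS -> renderPartial(agentUserId, transcription.text) // live update
        TranscriptionStatus.END -> renderFinal(agentUserId, transcription.text)           // final result
        TranscriptionStatus.INTERRUPTED -> markInterrupted(agentUserId)                   // agent was cut off
    }
}
```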
The following diagram outlines the step-by-step process to integrate live subtitle functionality into your application:
Subtitles rendering workflow
Prerequisites
Before you begin, ensure the following:
- You have implemented the Conversational AI Engine REST quickstart.
- Your app integrates Video SDK v4.5.1 or later and includes the video quickstart implementation.
- You have enabled Signaling in the Agora Console and completed the Signaling quickstart for basic messaging.
- You maintain active and authenticated RTC and Signaling instances that persist beyond the component's lifecycle. The toolkit does not manage the initialization, lifecycle, or authentication of RTC or Signaling.
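For example, on Android you might create the engine instances once at startup and keep them alive for the whole session. The following is a minimal Kotlin sketch, assuming Video SDK 4.x and Signaling (RTM) 2.x; the app ID and user ID are placeholders, and you still need to log the Signaling client in with a valid token before use.

```kotlin
import android.content.Context
import io.agora.rtc2.IRtcEngineEventHandler
import io.agora.rtc2.RtcEngine
import io.agora.rtc2.RtcEngineConfig
import io.agora.rtm.RtmClient
import io.agora.rtm.RtmConfig

// Create the engine instances once and keep them alive for the session;
// the toolkit does not manage their lifecycle or authentication.
// "yourAppId" and "userId" are placeholders for your own values.
fun createEngines(context: Context): Pair<RtcEngine, RtmClient> {
    val rtcEngine = RtcEngine.create(RtcEngineConfig().apply {
        mContext = context
        mAppId = "yourAppId"
        mEventHandler = object : IRtcEngineEventHandler() {}
    })
    // Remember to authenticate (login) the Signaling client before the
    // toolkit subscribes to channel messages.
    val rtmClient = RtmClient.create(RtmConfig.Builder("yourAppId", "userId").build())
    return Pair(rtcEngine, rtmClient)
}
```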
Implementation
This section describes how to receive subtitle content from the subtitle processing module and display it on your app UI.
Android
- **Integrate the toolkit**

  Copy the `convoaiApi` folder to your project and import the toolkit before calling the toolkit API. Refer to Folder structure to understand the role of each file.

- **Create a toolkit instance**

  Create a configuration object with the Video SDK and Signaling engine instances. Set the subtitle rendering mode, then use the configuration to create a toolkit instance.

  ```kotlin
  // Create a configuration object for the RTC and RTM instances
  val config = ConversationalAIAPIConfig(
      rtcEngine = rtcEngineInstance,
      rtmClient = rtmClientInstance,
      // Set the transcription subtitle rendering mode. Options:
      // - TranscriptionRenderMode.Word: Renders subtitles word by word.
      // - TranscriptionRenderMode.Text: Renders the full sentence at once.
      renderMode = TranscriptionRenderMode.Word,
      enableLog = true
  )

  // Create the component instance
  val api = ConversationalAIAPIImpl(config)
  ```
- **Subscribe to the channel**

  Subtitles are delivered through Signaling channel messages. To receive subtitle data, call `subscribeMessage` before starting the agent session.

  ```kotlin
  api.subscribeMessage("channelName") { error ->
      if (error != null) {
          // Handle the error
      }
  }
  ```
- **Receive subtitles**

  Call the `addHandler` method to register your implementation of the subtitle transcription callback.

  ```kotlin
  api.addHandler(covEventHandler)
  ```
- **Implement subtitle UI rendering logic**

  Implement the `IConversationalAIAPIEventHandler` interface in your subtitle UI module. Implement its `onTranscriptionUpdated` method to handle the logic for rendering subtitles to the UI.

  ```kotlin
  private val covEventHandler = object : IConversationalAIAPIEventHandler {
      override fun onTranscriptionUpdated(agentUserId: String, transcription: Transcription) {
          // Handle subtitle data and update the UI here
      }
  }
  ```
- **Add a Conversational AI agent to the channel**

  To start a Conversational AI agent, configure the following parameters in your `POST` request:

  | Parameter | Description | Required |
  | --- | --- | --- |
  | `advanced_features.enable_rtm: true` | Starts the Signaling service | Yes |
  | `parameters.data_channel: "rtm"` | Enables Signaling as the data transmission channel | Yes |
  | `parameters.enable_metrics: true` | Enables agent performance data collection | Optional |
  | `parameters.enable_error_message: true` | Enables reporting of agent error events | Optional |

  After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user. For a sketch of the full request, see the example after these steps.
- **Unsubscribe from the channel**

  After an agent session ends, unsubscribe from channel messages to release subtitle-related resources:

  ```kotlin
  api.unsubscribeMessage("channelName") { error ->
      if (error != null) {
          // Handle the error
      }
  }
  ```
- **Release resources**

  At the end of each call, use the `destroy` method to clean up the cache.

  ```kotlin
  api.destroy()
  ```
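The agent-start request referenced in the steps above is the same REST call for Android, iOS, and Web. The following Kotlin sketch shows one way to send it with the parameters from the table; the endpoint path, agent name, channel name, and credential handling are assumptions based on the REST quickstart, so verify them against the Conversational AI Engine REST reference.

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Sketch: start an agent with Signaling enabled as the data channel.
// Run this off the main thread; the values below are placeholders.
fun startAgentWithSubtitles(appId: String, basicAuth: String) {
    val body = """
        {
          "name": "subtitle_demo_agent",
          "properties": {
            "channel": "channelName",
            "advanced_features": { "enable_rtm": true },
            "parameters": {
              "data_channel": "rtm",
              "enable_metrics": true,
              "enable_error_message": true
            }
          }
        }
    """.trimIndent()

    // Endpoint path is an assumption; check the REST reference for the exact URL.
    val conn = URL("https://api.agora.io/api/conversational-ai-agent/v2/projects/$appId/join")
        .openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    conn.doOutput = true
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setRequestProperty("Authorization", "Basic $basicAuth")
    conn.outputStream.use { it.write(body.toByteArray()) }
    println("Agent start returned HTTP ${conn.responseCode}")
}
```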
iOS

- **Integrate the toolkit**

  Copy the `ConversationalAIAPI` folder to your project and import the toolkit before calling the toolkit APIs. Refer to Folder structure to understand the role of each file.

- **Create a toolkit instance**

  Create a configuration object with the Video SDK and Signaling engine instances. Set the subtitle rendering mode, then use the configuration to create a toolkit instance.

  ```swift
  // Create a configuration object for the RTC and RTM instances
  let config = ConversationalAIAPIConfig(
      rtcEngine: rtcEngine,
      rtmEngine: rtmEngine,
      /**
       * Set the subtitle rendering mode. Available options:
       * - .words: Word-by-word rendering mode. The subtitle content received
       *   from the callback is rendered to the UI one word at a time.
       * - .text: Sentence-by-sentence rendering mode. The full subtitle content
       *   from the callback is rendered to the UI at once.
       */
      renderMode: .words,
      enableLog: true
  )

  // Create the component instance
  convoAIAPI = ConversationalAIAPIImpl(config: config)
  ```
- **Subscribe to the channel**

  Subtitles are delivered through Signaling channel messages. To receive subtitle data, call `subscribeMessage` before starting the agent session.

  ```swift
  convoAIAPI.subscribeMessage(channelName: channelName) { error in
      if let error = error {
          print("Subscription failed: \(error.message)")
      } else {
          print("Subscription successful")
      }
  }
  ```
- **Receive subtitles**

  Call the `addHandler` method to register your implementation of the subtitle transcription callback:

  ```swift
  convoAIAPI.addHandler(handler: self)
  ```
- **Implement subtitle UI rendering logic**

  Implement the `ConversationalAIAPIEventHandler` protocol in your subtitle UI module, and use the `onTranscriptionUpdated` method to handle and render subtitles to the UI.

  ```swift
  extension ChatViewController: ConversationalAIAPIEventHandler {
      public func onTranscriptionUpdated(agentUserId: String, transcription: Transcription) {
          // Handle subtitle data and update the UI here
      }
  }
  ```
- **Add a Conversational AI agent to the channel**

  To start a Conversational AI agent, configure the following parameters in your `POST` request:

  | Parameter | Description | Required |
  | --- | --- | --- |
  | `advanced_features.enable_rtm: true` | Starts the Signaling service | Yes |
  | `parameters.data_channel: "rtm"` | Enables Signaling as the data transmission channel | Yes |
  | `parameters.enable_metrics: true` | Enables agent performance data collection | Optional |
  | `parameters.enable_error_message: true` | Enables reporting of agent error events | Optional |

  After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
- **Unsubscribe from the channel**

  After each agent session ends, unsubscribe from channel messages to release subtitle-related resources.

  ```swift
  // Unsubscribe from channel messages
  convoAIAPI.unsubscribeMessage(channelName: channelName) { error in
      if let error = error {
          print("Unsubscription failed: \(error.message)")
      } else {
          print("Unsubscribed successfully")
      }
  }
  ```
- **Release resources**

  At the end of each call, use the `destroy` method to clean up the cache.

  ```swift
  convoAIAPI.destroy()
  ```
Web

- **Integrate the toolkit**

  Copy the `conversational-ai-api` folder to your project and import the toolkit before calling its API. Refer to Folder structure to understand the role of each file.

- **Create a toolkit instance**

  Before joining a Video SDK channel, create the video and Signaling engine instances and pass them to the toolkit.

  ```typescript
  // Initialize the component
  ConversationalAIAPI.init({
    rtcEngine,
    rtmEngine,
    /**
     * Set the rendering mode for transcription subtitles. Available options:
     * - ESubtitleHelperMode.WORD: Word-by-word rendering mode. The subtitle content
     *   received from the callback is rendered to the UI one word at a time.
     * - ESubtitleHelperMode.TEXT: Sentence-by-sentence rendering mode. The full
     *   subtitle content from the callback is rendered to the UI at once.
     *
     * If not specified, the mode is determined automatically based on the message,
     * or it can be set manually.
     */
    renderMode: ESubtitleHelperMode.WORD,
  })

  // Get the API instance (singleton)
  const conversationalAIAPI = ConversationalAIAPI.getInstance()
  ```
- **Subscribe to the channel**

  Agent-related events are delivered through Signaling messages. Before starting an agent session, call `subscribeMessage` to receive these events:

  ```typescript
  conversationalAIAPI.subscribeMessage(channel_name)
  ```
- **Receive subtitles**

  Register an event listener to receive subtitle transcription updates:

  ```tsx
  import * as React from "react"

  import {
    type IUserTranscription,
    type IAgentTranscription,
    type ISubtitleHelperItem,
    EConversationalAIAPIEvents,
  } from "@/conversational-ai-api/type"
  import { ConversationalAIAPI } from "@/conversational-ai-api"

  // Listen for transcription content updates to display subtitles in real time
  export const ChatHistory = () => {
    const [chatHistory, setChatHistory] = React.useState<
      ISubtitleHelperItem<Partial<IUserTranscription | IAgentTranscription>>[]
    >([])

    const conversationalAIAPI = ConversationalAIAPI.getInstance()

    conversationalAIAPI.on(
      EConversationalAIAPIEvents.TRANSCRIPTION_UPDATED,
      setChatHistory
    )

    return (
      <>
        {chatHistory.map((message) => (
          <div key={`${message.uid}-${message.turn_id}`}>
            {message.uid}: {message.text}
          </div>
        ))}
      </>
    )
  }
  ```
- **Add a Conversational AI agent to the channel**

  To start a Conversational AI agent, configure the following parameters in your `POST` request:

  | Parameter | Description | Required |
  | --- | --- | --- |
  | `advanced_features.enable_rtm: true` | Starts the Signaling service | Yes |
  | `parameters.data_channel: "rtm"` | Enables Signaling as the data transmission channel | Yes |
  | `parameters.enable_metrics: true` | Enables agent performance data collection | Optional |
  | `parameters.enable_error_message: true` | Enables reporting of agent error events | Optional |

  After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
- **Unsubscribe from the channel**

  After each agent session ends, unsubscribe from channel messages to release resources associated with callback events:

  ```typescript
  conversationalAIAPI.unsubscribeMessage(channel_name)
  ```
- **Release resources**

  At the end of each call, use the `destroy` method to clean up the cache.

  ```typescript
  conversationalAIAPI.destroy()
  ```
Reference
This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.
Folder structure
Android

- `IConversationalAIAPI.kt`: API interface and related data structures and enumerations
- `ConversationalAIAPIImpl.kt`: Main implementation logic of the ConversationalAI API
- `ConversationalAIUtils.kt`: Utility functions and event callback management
- `subRender/`
  - `v3/`: Subtitle module
    - `TranscriptionController.kt`: Subtitle controller
    - `MessageParser.kt`: Message parser

iOS

- `ConversationalAIAPI.swift`: API interface and related data structures and enumerations
- `ConversationalAIAPIImpl.swift`: Main implementation logic of the ConversationalAI API
- `Transcription/`
  - `TranscriptionController.swift`: Subtitle controller

Web

- `index.ts`: API class
- `type.ts`: API interface and related data structures and enumerations
- `utils/`
  - `index.ts`: API utility functions
  - `events.ts`: Event management class, which can be extended to implement event listening and broadcasting
  - `sub-render.ts`: Subtitle module
API Reference
This section provides API reference documentation for the subtitles module.