Display live subtitles
When interacting with conversational AI in real time, you can enable real-time subtitles to display the conversation content. This page explains how to implement real-time subtitles in your app.
Understand the tech
To simplify subtitle integration, Agora provides an open-source subtitle processing module. By integrating this module into your project and calling its APIs, you can quickly enable real-time subtitles. The following figure illustrates how the subtitle module interacts with your app and Agora SD-RTN™.
Subtitles module workflow
Prerequisites
Before you begin, make sure you have implemented the Conversational AI Engine REST quickstart.
Implementation
This section describes how to receive subtitle content from the subtitle processing module and display it on your app UI.
- Android
- iOS/macOS
- Web
- Integrate the subtitle processing module

  Copy the `ConversationSubtitleController.kt` and `MessageParser.kt` files to your project and import the module before calling its API.

- Implement subtitle UI rendering logic

  Inherit your subtitle UI module from the `IConversationSubtitleCallback` interface and implement the `onSubtitleUpdated` method to handle the message rendering logic:

  ```kotlin
  class CovMessageListView @JvmOverloads constructor(
      context: Context,
      attrs: AttributeSet? = null,
      defStyleAttr: Int = 0
  ) : LinearLayout(context, attrs, defStyleAttr), IConversationSubtitleCallback {

      override fun onSubtitleUpdated(subtitle: SubtitleMessage) {
          // Implement your UI rendering logic here
      }
  }
  ```
- Create a subtitle processing module instance

  When entering the call page, create a `ConversationSubtitleController` instance. It monitors the subtitle message callback internally and passes the subtitle information to your UI through the `onSubtitleUpdated` callback of `IConversationSubtitleCallback`:

  ```kotlin
  override fun onCreate(savedInstanceState: Bundle?) {
      val subRenderController = ConversationSubtitleController(
          SubtitleRenderConfig(
              rtcEngine = rtcEngine,
              SubtitleRenderMode.Word,
              mBinding?.messageListView
          )
      )
  }
  ```
- Release resources

  Call the `reset` method at the end of each call to clean up the cache. When leaving the call page, call `release` to release resources:

  ```kotlin
  subRenderController.reset()
  subRenderController.release()
  ```
- Integrate the subtitle processing module

  Copy the `ConversationSubtitleController.swift` and `MessageParser.swift` files to your project and import the module before calling its API.

- Implement subtitle UI rendering logic

  To render subtitles in your UI, implement the `ConversationSubtitleDelegate` protocol in your subtitle UI module. Then, define the `onSubtitleUpdated` method to handle subtitle message rendering:

  ```swift
  extension ChatViewController: ConversationSubtitleDelegate {
      func onSubtitleUpdated(subtitle: SubtitleMessage) {
          // Implement your UI rendering logic here
      }
  }
  ```
- Create a subtitle processing module instance

  When entering the call page, create a `ConversationSubtitleController` instance. This instance monitors subtitle message callbacks internally and passes the subtitle information to your UI through the `onSubtitleUpdated` callback of `ConversationSubtitleDelegate`:

  ```swift
  let subRenderConfig = SubtitleRenderConfig(
      rtcEngine: rtcEngine,
      renderMode: .words,
      delegate: self
  )
  subRenderController.setupWithConfig(subRenderConfig)
  ```
- Release resources

  At the end of each call, call the `reset` method to clean up the cache:

  ```swift
  subRenderController.reset()
  ```
- Integrate the subtitle processing module

  Copy the `message.ts` file to your project and import the module before calling its API. The required dependencies are available in the `lib` folder.
- Implement subtitle UI rendering logic

  The subtitle UI module you implement processes the `MessageEngine` subtitle message list. The following simple component displays these messages:

  ```typescript
  const ChatHistory = () => {
    const [chatHistory, setChatHistory] = useState<IMessageListItem[]>([]);

    useEffect(() => {
      const getChatHistoryFromEvent = (event: MessageEvent) => {
        const { data } = event;
        if (data?.type === "message") {
          setChatHistory(data?.chatHistory || []);
        }
      };
      window.addEventListener("message", getChatHistoryFromEvent);
      return () => {
        window.removeEventListener("message", getChatHistoryFromEvent);
      };
    }, []);

    return (
      <>
        {chatHistory.map((message) => (
          <div key={`${message.uid}-${message.turn_id}`}>
            {message.uid}: {message.text}
          </div>
        ))}
      </>
    );
  };
  ```
  info: The sample code uses `window.addEventListener("message")` to listen for subtitle data sent by `MessageEngine` through `window.postMessage`. For complex applications, Agora recommends using Redux or another state management tool to manage these messages more efficiently.
- Create a subtitle processing module instance

  Before joining an RTC channel, create a `MessageEngine` instance and pass in the `AgoraRTC` client, mode, and callback function:

  ```typescript
  import AgoraRTC, { IAgoraRTCClient } from "agora-rtc-sdk-ng";

  class RtcEngine {
    private client: IAgoraRTCClient;
    private messageEngine: MessageEngine | null = null;

    constructor() {
      // Create an AgoraRTC client with RTC mode and the VP8 codec
      this.client = AgoraRTC.createClient({ mode: "rtc", codec: "vp8" });
    }

    public joinChannel() {
      // Create a MessageEngine instance, passing in the AgoraRTC client,
      // mode, and callback function
      this.messageEngine = new MessageEngine(
        this.client,
        EMessageEngineMode.AUTO,
        (chatHistory) => {
          // Log chatHistory to the console
          console.log("chatHistory", chatHistory);
          // Send chatHistory to the web page. Using Redux or another state
          // management tool is recommended; window.postMessage is used here
          // as an example
          window.postMessage({
            type: "message",
            chatHistory,
          });
        }
      );
      this.client.join("***", "****", "****", "****");
    }
  }
  ```
- Release resources

  When leaving the call page or ending the conversation, call the `cleanup` method to release resources:

  ```typescript
  this.messageEngine.cleanup();
  ```
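The info note earlier recommends a state management layer instead of raw `window.postMessage`. A minimal subscription store invoked from the `MessageEngine` callback could look like the sketch below; `ChatHistoryStore` and the inlined `IMessageListItem` shape are illustrative helpers, not part of the Agora module.

```typescript
// Hypothetical subscription store for subtitle messages (not part of the
// Agora module). IMessageListItem mirrors the shape documented in the
// API reference below.
interface IMessageListItem {
  uid: number;
  turn_id: number;
  text: string;
  status: number;
}

type Listener = (history: IMessageListItem[]) => void;

class ChatHistoryStore {
  private history: IMessageListItem[] = [];
  private listeners = new Set<Listener>();

  // Call this from the MessageEngine callback instead of window.postMessage
  publish(history: IMessageListItem[]): void {
    this.history = history;
    this.listeners.forEach((listener) => listener(this.history));
  }

  // UI components subscribe here; the returned function unsubscribes
  subscribe(listener: Listener): () => void {
    listener(this.history); // deliver the current state immediately
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}
```

A component would call `subscribe` on mount and invoke the returned function on unmount, mirroring the `addEventListener`/`removeEventListener` pair in the sample above.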
Reference
This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.
API Reference
This section provides API reference documentation for the subtitles module.
- Android
- iOS/macOS
- Web
ConversationSubtitleController

```kotlin
class ConversationSubtitleController(
    private val config: SubtitleRenderConfig
)
```

- `config`: Subtitle rendering configuration. See `SubtitleRenderConfig` for details.
SubtitleRenderConfig

```kotlin
data class SubtitleRenderConfig(
    val rtcEngine: RtcEngine,
    val renderMode: SubtitleRenderMode?,
    val callback: IConversationSubtitleCallback?
)
```

- `rtcEngine`: `RtcEngine` instance.
- `renderMode`: Subtitle rendering mode. See `SubtitleRenderMode` for details.
- `callback`: The callback interface for receiving subtitle content updates. See `IConversationSubtitleCallback` for details.
SubtitleRenderMode

```kotlin
enum class SubtitleRenderMode {
    Text,
    Word
}
```

- `Text`: Sentence-by-sentence rendering mode. The subtitle content received by the callback is fully rendered on the UI.
- `Word`: Word-by-word rendering mode. The subtitle content received by the callback is rendered word by word on the UI.

Using the word-by-word rendering mode (`Word`) requires that your chosen TTS vendor supports word-by-word output; otherwise, the module automatically falls back to sentence-by-sentence rendering mode (`Text`).
IConversationSubtitleCallback

The callback interface for subtitle content update events.

```kotlin
interface IConversationSubtitleCallback {
    fun onSubtitleUpdated(subtitle: SubtitleMessage)
}
```

- `onSubtitleUpdated`: Subtitle update callback.
  - `subtitle`: The updated subtitle message. See `SubtitleMessage` for details.
SubtitleMessage

```kotlin
data class SubtitleMessage(
    val turnId: Long,
    val userId: Int,
    val text: String,
    var status: SubtitleStatus
)
```

- `turnId`: The identifier of the conversation turn. One conversation turn between the user and the agent corresponds to one `turnId`, with the following rules:
  - `turnId = 0`: The agent's welcome message. There is no user subtitle for this turn.
  - `turnId ≥ 1`: The subtitles for the user or agent in that turn. Use `userId` to display the user's subtitles before the agent's subtitles, then repeat for the next turn.

  caution: Callbacks are not guaranteed to arrive in strictly increasing `turnId` order. If you encounter out-of-order delivery, implement the sorting logic yourself.

- `userId`: The user ID associated with this subtitle message. In the current version, `0` represents the user; a non-zero value represents the agent ID.
- `text`: Subtitle text content.
- `status`: The current status of the subtitles. See `SubtitleStatus` for details.
SubtitleStatus

Use `SubtitleStatus` for status-based UI processing, such as displaying an interruption mark at the end of the subtitle.

```kotlin
enum class SubtitleStatus {
    Progress,
    End,
    Interrupted
}
```

- `Progress`: The subtitles are still being generated; the user or agent has not finished speaking.
- `End`: Subtitle generation is complete; the user or agent has finished speaking.
- `Interrupted`: The subtitles were interrupted before completion; the user actively stopped the agent's response.
ConversationSubtitleController

```swift
class ConversationSubtitleController {
    func setupWithConfig(_ config: SubtitleRenderConfig)
    func reset()
}
```

- `setupWithConfig(_ config:)`: Sets the subtitle rendering configuration.
  - `config`: Subtitle rendering configuration. See `SubtitleRenderConfig` for details.
- `reset()`: Clears the cache.
SubtitleRenderConfig

```swift
struct SubtitleRenderConfig {
    let rtcEngine: AgoraRtcEngineKit
    let renderMode: SubtitleRenderMode
    let delegate: ConversationSubtitleDelegate?
}
```

- `rtcEngine`: `AgoraRtcEngineKit` instance.
- `renderMode`: Subtitle rendering mode. See `SubtitleRenderMode` for details.
- `delegate`: Callback protocol for receiving subtitle content update events. See `ConversationSubtitleDelegate` for details.
SubtitleRenderMode

```swift
enum SubtitleRenderMode {
    case words
    case text
}
```

- `words`: Word-by-word rendering mode. The subtitle content received by the callback is rendered word by word on the UI.
- `text`: Sentence-by-sentence rendering mode. The subtitle content received by the callback is fully rendered on the UI.

Using the word-by-word rendering mode (`words`) requires that your chosen TTS vendor supports word-by-word output; otherwise, the module automatically falls back to sentence-by-sentence rendering mode (`text`).
ConversationSubtitleDelegate

Callback protocol for subtitle content update events.

```swift
protocol ConversationSubtitleDelegate: AnyObject {
    func onSubtitleUpdated(subtitle: SubtitleMessage)
}
```

- `onSubtitleUpdated`: Subtitle update callback.
  - `subtitle`: The updated subtitle message. See `SubtitleMessage` for details.
SubtitleMessage

```swift
struct SubtitleMessage {
    let turnId: Int
    let userId: UInt
    let text: String
    var status: SubtitleStatus
}
```

- `turnId`: The identifier of the conversation turn. One conversation turn between the user and the agent corresponds to one `turnId`, with the following rules:
  - `turnId = 0`: The agent's welcome message. There is no user subtitle for this turn.
  - `turnId ≥ 1`: The subtitles for the user or agent in that turn. Use `userId` to display the user's subtitles before the agent's subtitles, then repeat for the next turn.

  caution: Callbacks are not guaranteed to arrive in strictly increasing `turnId` order. If you encounter out-of-order delivery, implement the sorting logic yourself.

- `userId`: The user ID associated with this subtitle message. In the current version, `0` represents the user; a non-zero value represents the agent ID.
- `text`: Subtitle text content.
- `status`: The current status of the subtitles. See `SubtitleStatus` for details.
SubtitleStatus

```swift
enum SubtitleStatus: Int {
    case inprogress = 0
    case end = 1
    case interrupt = 2
}
```

- `inprogress`: The subtitles are still being generated; the user or agent has not finished speaking.
- `end`: Subtitle generation is complete; the user or agent has finished speaking.
- `interrupt`: The subtitles were interrupted before completion; the user actively stopped the agent's response.
MessageEngine

Subtitle processing engine.

```typescript
class MessageEngine {
    constructor(
        rtcEngine: IAgoraRTCClient,
        renderMode?: EMessageEngineMode,
        callback?: (messageList: IMessageListItem[]) => void
    )
}
```

- `rtcEngine`: Agora RTC engine instance.
- `renderMode`: Subtitle rendering mode. See `EMessageEngineMode` for details. Default is `EMessageEngineMode.AUTO`.
- `callback`: Callback function for receiving subtitle content updates. `IMessageListItem[]` is a list of messages. See `IMessageListItem` for details.
EMessageEngineMode

```typescript
enum EMessageEngineMode {
    TEXT = 'text',
    WORD = 'word',
    AUTO = 'auto',
}
```

- `TEXT`: Sentence-by-sentence rendering mode. The subtitle content received by the callback is fully rendered on the UI.
- `WORD`: Word-by-word rendering mode. The subtitle content received by the callback is rendered word by word on the UI.
- `AUTO`: Automatic mode. The rendering mode is selected automatically according to the modes the TTS provider supports.

Using the word-by-word rendering mode (`WORD`) requires that your chosen TTS vendor supports word-by-word output; otherwise, the module automatically falls back to sentence-by-sentence rendering mode (`TEXT`).
IMessageListItem

```typescript
interface IMessageListItem {
    uid: number;
    turn_id: number;
    text: string;
    status: EMessageStatus;
}
```

- `uid`: The user ID associated with this subtitle message. In the current version, `0` represents the user; a non-zero value represents the agent ID.
- `turn_id`: The identifier of the conversation turn. One conversation turn between the user and the agent corresponds to one `turn_id`, with the following rules:
  - `turn_id = 0`: The agent's welcome message. There is no user subtitle for this turn.
  - `turn_id ≥ 1`: The subtitles for the user or agent in that turn. Use `uid` to display the user's subtitles before the agent's subtitles, then repeat for the next turn.

  caution: Callbacks are not guaranteed to arrive in strictly increasing `turn_id` order. If you encounter out-of-order delivery, implement the sorting logic yourself.

- `text`: Subtitle text content.
- `status`: The current status of the subtitles. See `EMessageStatus` for details.
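The caution above leaves the sorting logic to you. One minimal approach, sketched here as a hypothetical helper (`sortChatHistory` is not part of the module), is to order by `turn_id` and, within a turn, show the user's subtitle (`uid === 0`) before the agent's:

```typescript
// EMessageStatus values are kept as plain numbers here so the sketch
// stays self-contained.
interface IMessageListItem {
  uid: number;
  turn_id: number;
  text: string;
  status: number;
}

// Order messages by turn_id; within a turn, the user's subtitle
// (uid === 0) comes before the agent's.
function sortChatHistory(history: IMessageListItem[]): IMessageListItem[] {
  return [...history].sort((a, b) => {
    if (a.turn_id !== b.turn_id) return a.turn_id - b.turn_id;
    return (a.uid === 0 ? 0 : 1) - (b.uid === 0 ? 0 : 1);
  });
}
```

Applying such a comparator just before rendering keeps out-of-order callback delivery from reordering the displayed transcript.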
EMessageStatus

```typescript
enum EMessageStatus {
    IN_PROGRESS = 0,
    END = 1,
    INTERRUPTED = 2,
}
```

- `IN_PROGRESS`: The subtitles are still being generated; the user or agent has not finished speaking.
- `END`: Subtitle generation is complete; the user or agent has finished speaking.
- `INTERRUPTED`: The subtitles were interrupted before completion; the user actively stopped the agent's response.
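The Android reference above suggests using the subtitle status for UI touches such as an interruption mark. A sketch of the same idea for the Web statuses; `displayText` and the specific marker strings are illustrative choices, not part of the module:

```typescript
enum EMessageStatus {
  IN_PROGRESS = 0,
  END = 1,
  INTERRUPTED = 2,
}

// Decorate the subtitle text according to its status: an ellipsis while
// the speaker is still talking, a marker when the agent was cut off.
function displayText(text: string, status: EMessageStatus): string {
  switch (status) {
    case EMessageStatus.IN_PROGRESS:
      return `${text}…`;
    case EMessageStatus.INTERRUPTED:
      return `${text} [interrupted]`;
    default:
      return text;
  }
}
```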