Display live transcripts
When interacting with conversational AI in real time, you can enable live transcripts to display the conversation content as text. This page explains how to implement live transcripts in your app.
Understand the tech
Agora provides a flexible, scalable, and standardized conversational AI engine toolkit. The toolkit supports iOS, Android, and Web platforms, and encapsulates scenario-based APIs. You can use these APIs to integrate the capabilities of the Agora Signaling SDK and Video SDK to enable the following features:
- Interrupt agents
- Display live transcripts
- Receive event notifications
- Set optimal audio parameters (iOS and Android only)
- Send picture messages
The toolkit receives transcript content through the `onTranscriptUpdated` callback and supports monitoring the following types of transcript data:

- Agent transcript: Transcribes the agent's speech. Includes real-time updates and final results.
- User transcript: Transcribes the user's speech. Supports real-time display and status management.
- Transcript status: Reports status updates such as in progress, completed, or interrupted.
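To make these shapes concrete, here is a hypothetical Kotlin sketch of branching on transcript status. The `Transcription` and `TranscriptionStatus` definitions below are illustrative assumptions, not the toolkit's exact types; the platform-specific callbacks under Implementation show the real entry point.

```kotlin
// Illustrative only: the real types come from the toolkit (see Implementation)
enum class TranscriptionStatus { IN_PROGRESS, END, INTERRUPTED }
data class Transcription(val userId: String, val text: String, val status: TranscriptionStatus)

fun render(t: Transcription) = when (t.status) {
    TranscriptionStatus.IN_PROGRESS -> println("partial (${t.userId}): ${t.text}") // live update
    TranscriptionStatus.END -> println("final (${t.userId}): ${t.text}")           // final result
    TranscriptionStatus.INTERRUPTED -> println("cut off (${t.userId}): ${t.text}") // interrupted
}
```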
The following diagram outlines the step-by-step process to integrate live transcript functionality into your application:
Transcript rendering workflow
Prerequisites
Before you begin, ensure the following:
- You have implemented the Conversational AI Engine REST quickstart.
- Your app integrates Video SDK v4.5.1 or later and includes the video quickstart implementation.
- You have enabled Signaling in the Agora Console and completed the Signaling quickstart for basic messaging.
- You maintain active and authenticated RTC and Signaling instances that persist beyond the component's lifecycle; the toolkit does not manage the initialization, lifecycle, or authentication of RTC or Signaling. A setup sketch follows this list.
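A minimal sketch of that last prerequisite, assuming the Android Video SDK 4.x and Signaling SDK 2.x APIs (other platforms are analogous). Keep both instances in a long-lived scope, such as your `Application` class, and authenticate Signaling before creating the toolkit instance; the helper name and parameters here are illustrative:

```kotlin
import android.content.Context
import io.agora.rtc2.IRtcEngineEventHandler
import io.agora.rtc2.RtcEngine
import io.agora.rtc2.RtcEngineConfig
import io.agora.rtm.RtmClient
import io.agora.rtm.RtmConfig

// Illustrative helper: create the RTC engine and Signaling client once and reuse them
fun createEngines(appContext: Context, appId: String, userId: String): Pair<RtcEngine, RtmClient> {
    val rtcEngine = RtcEngine.create(RtcEngineConfig().apply {
        mContext = appContext
        mAppId = appId
        mEventHandler = object : IRtcEngineEventHandler() {}
    })
    val rtmClient = RtmClient.create(RtmConfig.Builder(appId, userId).build())
    // Log in to Signaling (rtmClient.login) with a token from your server before use
    return rtcEngine to rtmClient
}
```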
Implementation
This section describes how to receive transcript content from the transcript processing module and display it on your app UI.
- Android
- iOS
- Web
- Integrate the toolkit

  Copy the `convoaiApi` folder to your project and import the toolkit before calling the toolkit API. Refer to Folder structure to understand the role of each file.

- Create a toolkit instance

  Create a configuration object with the Video SDK and Signaling engine instances. Set the transcript rendering mode, then use the configuration to create a toolkit instance.

  ```kotlin
  // Create configuration objects for the RTC and RTM instances
  val config = ConversationalAIAPIConfig(
      rtcEngine = rtcEngineInstance,
      rtmClient = rtmClientInstance,
      // Set the transcript rendering mode. Options:
      // - TranscriptRenderMode.Word: Renders transcript word by word.
      // - TranscriptRenderMode.Text: Renders the full sentence at once.
      renderMode = TranscriptRenderMode.Word,
      enableLog = true
  )

  // Create component instance
  val api = ConversationalAIAPIImpl(config)
  ```
- Subscribe to the channel

  Transcript data is delivered through Signaling channel messages. To receive transcript data, call `subscribeMessage` before starting the agent session.

  ```kotlin
  api.subscribeMessage("channelName") { error ->
      if (error != null) {
          // Handle error
      }
  }
  ```
- Receive transcripts

  Call the `addHandler` method to register your implementation of the transcript callback.

  ```kotlin
  api.addHandler(covEventHandler)
  ```
- Implement UI rendering logic

  Implement the `IConversationalAIAPIEventHandler` interface in your UI module. Override the `onTranscriptUpdated` method to handle the logic for rendering transcripts to the UI, as in the sketch that follows this example.

  ```kotlin
  private val covEventHandler = object : IConversationalAIAPIEventHandler {
      override fun onTranscriptUpdated(agentUserId: String, transcription: Transcription) {
          // Handle transcript data and update the UI here
      }
  }
  ```
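  If callbacks arrive off the main thread, post UI updates to it before touching views. A minimal sketch, assuming `android.os.Handler` and `android.os.Looper` imports plus a hypothetical `transcriptView` TextView and a `text` field on `Transcription`:

  ```kotlin
  private val covEventHandler = object : IConversationalAIAPIEventHandler {
      override fun onTranscriptUpdated(agentUserId: String, transcription: Transcription) {
          // Marshal onto the main thread before updating views
          Handler(Looper.getMainLooper()).post {
              transcriptView.text = transcription.text // assumed field and view
          }
      }
  }
  ```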
- Add a Conversational AI agent to the channel

  To start a Conversational AI agent, configure the following parameters in your `POST` request:

  | Parameter | Description | Required |
  | --- | --- | --- |
  | `advanced_features.enable_rtm: true` | Starts the Signaling service | Yes |
  | `parameters.data_channel: "rtm"` | Enables Signaling as the data transmission channel | Yes |
  | `parameters.enable_metrics: true` | Enables agent performance data collection | Optional |
  | `parameters.enable_error_message: true` | Enables reporting of agent error events | Optional |

  After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
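  As a sketch of how these fields nest in the request body (transcript-related fields only; merge with the required fields from the Conversational AI Engine REST quickstart before sending):

  ```json
  {
    "advanced_features": {
      "enable_rtm": true
    },
    "parameters": {
      "data_channel": "rtm",
      "enable_metrics": true,
      "enable_error_message": true
    }
  }
  ```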
- Unsubscribe from the channel

  After an agent session ends, unsubscribe from channel messages to release transcription resources:

  ```kotlin
  api.unsubscribeMessage("channelName") { error ->
      if (error != null) {
          // Handle the error
      }
  }
  ```
- Release resources

  At the end of each call, use the `destroy` method to clean up the cache.

  ```kotlin
  api.destroy()
  ```
- Integrate the toolkit

  Copy the `ConversationalAIAPI` folder to your project and import the toolkit before calling the toolkit APIs. Refer to Folder structure to understand the role of each file.

- Create a toolkit instance

  Create a configuration object with the Video SDK and Signaling engine instances. Set the transcript rendering mode, then use the configuration to create a toolkit instance.

  ```swift
  // Create a configuration object for the RTC and RTM instances
  let config = ConversationalAIAPIConfig(
      rtcEngine: rtcEngine,
      rtmEngine: rtmEngine,
      /**
       * Set the transcript rendering mode. Available options:
       * - .words: Word-by-word rendering mode. The transcript content received from the callback
       *   is rendered to the UI one word at a time.
       * - .text: Sentence-by-sentence rendering mode. The full transcript content from the callback
       *   is rendered to the UI at once.
       */
      renderMode: .words,
      enableLog: true
  )

  // Create the component instance
  convoAIAPI = ConversationalAIAPIImpl(config: config)
  ```
- Subscribe to the channel

  Transcript data is delivered through Signaling channel messages. To receive transcript data, call `subscribeMessage` before starting the agent session.

  ```swift
  convoAIAPI.subscribeMessage(channelName: channelName) { error in
      if let error = error {
          print("Subscription failed: \(error.message)")
      } else {
          print("Subscription successful")
      }
  }
  ```
- Receive transcripts

  Call the `addHandler` method to register your implementation of the transcript callback:

  ```swift
  convoAIAPI.addHandler(handler: self)
  ```
- Implement UI rendering logic

  Implement the `ConversationalAIAPIEventHandler` protocol in your UI module, and use the `onTranscriptUpdated` method to handle and render transcripts to the UI. A main-thread sketch follows this example.

  ```swift
  extension ChatViewController: ConversationalAIAPIEventHandler {
      public func onTranscriptUpdated(agentUserId: String, transcript: Transcript) {
          // Handle transcript data and update the UI here
      }
  }
  ```
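  If callbacks arrive off the main thread, hop to the main queue before updating UIKit views. A minimal sketch, assuming a hypothetical `transcriptLabel` outlet and a `text` property on `Transcript`:

  ```swift
  extension ChatViewController {
      // Call this from onTranscriptUpdated
      func renderTranscript(_ transcript: Transcript) {
          DispatchQueue.main.async { [weak self] in
              self?.transcriptLabel.text = transcript.text // assumed label and property
          }
      }
  }
  ```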
- Add a Conversational AI agent to the channel

  To start a Conversational AI agent, configure the following parameters in your `POST` request:

  | Parameter | Description | Required |
  | --- | --- | --- |
  | `advanced_features.enable_rtm: true` | Starts the Signaling service | Yes |
  | `parameters.data_channel: "rtm"` | Enables Signaling as the data transmission channel | Yes |
  | `parameters.enable_metrics: true` | Enables agent performance data collection | Optional |
  | `parameters.enable_error_message: true` | Enables reporting of agent error events | Optional |

  After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
- Unsubscribe from the channel

  After each agent session ends, unsubscribe from channel messages to release transcript-related resources.

  ```swift
  // Unsubscribe from channel messages
  convoAIAPI.unsubscribeMessage(channelName: channelName) { error in
      if let error = error {
          print("Unsubscription failed: \(error.message)")
      } else {
          print("Unsubscribed successfully")
      }
  }
  ```
- Release resources

  At the end of each call, use the `destroy` method to clean up the cache.

  ```swift
  convoAIAPI.destroy()
  ```
- Integrate the toolkit

  Copy the `conversational-ai-api` folder to your project and import the toolkit before calling its API. Refer to Folder structure to understand the role of each file.

- Create a toolkit instance

  Before joining a Video SDK channel, create the video and Signaling engine instances and pass them to the toolkit instance.

  ```typescript
  // Initialize the component
  ConversationalAIAPI.init({
    rtcEngine,
    rtmEngine,
    /**
     * Set the rendering mode for transcripts. Available options:
     * - ESubtitleHelperMode.WORD: Word-by-word rendering mode. The transcript content received
     *   from the callback is rendered to the UI one word at a time.
     * - ESubtitleHelperMode.TEXT: Sentence-by-sentence rendering mode. The full transcript
     *   content from the callback is rendered to the UI at once.
     *
     * If not specified, the mode is determined automatically based on the message, or it can
     * be set manually.
     */
    renderMode: ESubtitleHelperMode.WORD,
  })

  // Get the API instance (singleton)
  const conversationalAIAPI = ConversationalAIAPI.getInstance()
  ```
- Subscribe to the channel

  Agent-related events are delivered through Signaling messages. Before starting an agent session, call `subscribeMessage` to receive these events:

  ```typescript
  conversationalAIAPI.subscribeMessage(channel_name)
  ```
- Receive transcripts

  Register an event listener to receive transcript updates:

  ```tsx
  import * as React from "react"

  import {
    type IUserTranscription,
    type IAgentTranscription,
    type ISubtitleHelperItem,
    EConversationalAIAPIEvents,
  } from "@/conversational-ai-api/type"
  import { ConversationalAIAPI } from "@/conversational-ai-api"

  // Listen for transcript content updates to display the transcript in real time
  export const ChatHistory = () => {
    const [chatHistory, setChatHistory] = React.useState<
      ISubtitleHelperItem<Partial<IUserTranscription | IAgentTranscription>>[]
    >([])

    const conversationalAIAPI = ConversationalAIAPI.getInstance()
    conversationalAIAPI.on(
      EConversationalAIAPIEvents.TRANSCRIPTION_UPDATED,
      setChatHistory
    )

    return (
      <>
        {chatHistory.map((message) => (
          <div key={`${message.uid}-${message.turn_id}`}>
            {message.uid}: {message.text}
          </div>
        ))}
      </>
    )
  }
  ```
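  Note that the example above attaches the listener on every render. In production, register once and clean up on unmount. The sketch below assumes the toolkit's event emitter exposes a matching `off` method; verify this against `events.ts`:

  ```tsx
  React.useEffect(() => {
    const api = ConversationalAIAPI.getInstance()
    api.on(EConversationalAIAPIEvents.TRANSCRIPTION_UPDATED, setChatHistory)
    return () => {
      // Assumed cleanup API; check that the emitter provides `off`
      api.off(EConversationalAIAPIEvents.TRANSCRIPTION_UPDATED, setChatHistory)
    }
  }, [])
  ```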
- Add a Conversational AI agent to the channel

  To start a Conversational AI agent, configure the following parameters in your `POST` request:

  | Parameter | Description | Required |
  | --- | --- | --- |
  | `advanced_features.enable_rtm: true` | Starts the Signaling service | Yes |
  | `parameters.data_channel: "rtm"` | Enables Signaling as the data transmission channel | Yes |
  | `parameters.enable_metrics: true` | Enables agent performance data collection | Optional |
  | `parameters.enable_error_message: true` | Enables reporting of agent error events | Optional |

  After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
- Unsubscribe from the channel

  After each agent session ends, unsubscribe from channel messages to release resources associated with callback events:

  ```typescript
  conversationalAIAPI.unsubscribeMessage(channel_name)
  ```
- Release resources

  At the end of each call, use the `destroy` method to clean up the cache.

  ```typescript
  conversationalAIAPI.destroy()
  ```
Reference
This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.
Folder structure
- Android
- iOS
- Web
- `IConversationalAIAPI.kt`: API interface and related data structures and enumerations
- `ConversationalAIAPIImpl.kt`: Main implementation logic of the ConversationalAI API
- `ConversationalAIUtils.kt`: Utility functions and event callback management
- `subRender/v3/`: Transcription module
  - `TranscriptionController.kt`: Transcription controller
  - `MessageParser.kt`: Message parser
- `ConversationalAIAPI.swift`: API interface and related data structures and enumerations
- `ConversationalAIAPIImpl.swift`: Main implementation logic of the ConversationalAI API
- `Transcription/TranscriptionController.swift`: Transcription controller
- `index.ts`: API class
- `type.ts`: API interface and related data structures and enumerations
- `utils/index.ts`: API utility functions
- `events.ts`: Event management class, which can be extended to easily implement event listening and broadcasting
- `sub-render.ts`: Transcription module
API Reference
This section provides API reference documentation for the transcript module.