Display live subtitles

When interacting with conversational AI in real time, you can enable real-time subtitles to display the conversation content. This page explains how to implement real-time subtitles in your app.

Understand the tech

Agora provides a flexible, scalable, and standardized conversational AI engine toolkit. The toolkit supports iOS, Android, and Web platforms and encapsulates scenario-based APIs. You can use these APIs to integrate the capabilities of the Agora Signaling SDK and Video SDK, including real-time subtitles.

The toolkit receives subtitle transcription content through the onTranscriptionUpdated callback and supports monitoring the following types of subtitle data:

  • Agent captions: Transcribes the agent’s speech. Includes real-time updates and final results.

  • User captions: Transcribes the user’s speech. Supports real-time display and status management.

  • Transcription status: Reports status updates such as in progress, completed, or interrupted.

The following diagram outlines the step-by-step process to integrate live subtitle functionality into your application:

Subtitle rendering workflow

Prerequisites

Before you begin, ensure the following:

  • You have implemented the Conversational AI Engine REST quickstart.
  • Your app integrates Video SDK v4.5.1 or later and includes the video quickstart implementation.
  • You have enabled Signaling in the Agora Console and completed Signaling quickstart for basic messaging.
  • You maintain active and authenticated RTC and Signaling instances that persist beyond the component's lifecycle. The toolkit does not manage the initialization, lifecycle, or authentication of RTC or Signaling; a setup sketch follows this list.
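
The following is a minimal setup sketch for these two instances, assuming the Android Video SDK (4.x) and Signaling SDK (2.x). YOUR_APP_ID, the user ID, the empty event handler, and token handling are placeholders; follow the respective quickstarts for a production setup.

    // Minimal sketch: create the RTC and Signaling instances the toolkit
    // depends on. "YOUR_APP_ID" and "userId" are placeholders.
    val rtcEngineInstance: RtcEngine = RtcEngine.create(RtcEngineConfig().apply {
        mContext = applicationContext
        mAppId = "YOUR_APP_ID"
        mEventHandler = object : IRtcEngineEventHandler() {}
    })

    val rtmClientInstance: RtmClient = RtmClient.create(
        RtmConfig.Builder("YOUR_APP_ID", "userId").build()
    )

    // Log in to Signaling and join the RTC channel before starting the agent;
    // the toolkit assumes both instances are already authenticated.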

Implementation

This section describes how to receive subtitle content from the subtitle processing module and display it on your app UI.

  1. Integrate the toolkit

    Copy the convoaiApi folder to your project and import the toolkit before calling the toolkit API. Refer to Folder structure to understand the role of each file.

  2. Create a toolkit instance

    Create a configuration object with the Video SDK and Signaling engine instances. Set the subtitle rendering mode, then use the configuration to create a toolkit instance.

    // Create a configuration object with the RTC and RTM instances
    val config = ConversationalAIAPIConfig(
        rtcEngine = rtcEngineInstance,
        rtmClient = rtmClientInstance,
        // Set the transcription subtitle rendering mode. Options:
        // - TranscriptionRenderMode.Word: Renders subtitles word by word.
        // - TranscriptionRenderMode.Text: Renders the full sentence at once.
        renderMode = TranscriptionRenderMode.Word,
        enableLog = true
    )
    // Create the component instance
    val api = ConversationalAIAPIImpl(config)
  3. Subscribe to the channel

    Subtitles are delivered through Signaling channel messages. To receive subtitle data, call subscribeMessage before starting the agent session.

    api.subscribeMessage("channelName") { error ->
        if (error != null) {
            // Handle the error
        }
    }
  4. Receive subtitles

    Call the addHandler method to register your implementation of the subtitle transcription callback.

    api.addHandler(covEventHandler)
  5. Implement subtitle UI rendering logic

    Implement the IConversationalAIAPIEventHandler interface in your subtitle UI module and override the onTranscriptionUpdated method to handle the logic for rendering subtitles to the UI.

    private val covEventHandler = object : IConversationalAIAPIEventHandler {
        override fun onTranscriptionUpdated(agentUserId: String, transcription: Transcription) {
            // Handle subtitle data and update the UI here
        }
    }
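
    Expanding on the stub above, the routing inside onTranscriptionUpdated might look like the following sketch. It covers the three data types listed under Understand the tech. The Transcription field names (userId, text, status), the TranscriptionStatus values, and the three render* helpers are assumptions for illustration; check IConversationalAIAPI.kt for the actual definitions.

    override fun onTranscriptionUpdated(agentUserId: String, transcription: Transcription) {
        // Assumed fields: userId identifies the speaker, text carries the
        // caption, status reports in progress / completed / interrupted.
        val isAgent = transcription.userId == agentUserId
        when (transcription.status) {
            TranscriptionStatus.IN_PROGRESS -> {
                // Real-time update: redraw the current caption line in place
                renderPartialCaption(isAgent, transcription.text)
            }
            TranscriptionStatus.END -> {
                // Final result: commit the caption to the conversation view
                renderFinalCaption(isAgent, transcription.text)
            }
            TranscriptionStatus.INTERRUPTED -> {
                // The agent was interrupted: mark the caption as truncated
                renderInterruptedCaption(isAgent)
            }
        }
        // If the callback arrives on a worker thread, switch to the main
        // thread before touching views.
    }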
  6. Add a Conversational AI agent to the channel

    To start a Conversational AI agent, configure the following parameters in your POST request:

    Parameter                             | Description                                        | Required
    ------------------------------------- | -------------------------------------------------- | --------
    advanced_features.enable_rtm: true    | Starts the Signaling service                       | Yes
    parameters.data_channel: "rtm"        | Enables Signaling as the data transmission channel | Yes
    parameters.enable_metrics: true       | Enables agent performance data collection          | Optional
    parameters.enable_error_message: true | Enables reporting of agent error events            | Optional

    After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
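
    For reference, the dotted parameter paths above map onto the JSON body of the POST request. The fragment below sketches those four fields only; the remaining required fields (channel, token, agent, LLM, and TTS settings) follow the Conversational AI Engine REST quickstart, and depending on the API version these objects may be nested inside a top-level properties object.

    {
        "advanced_features": {
            "enable_rtm": true
        },
        "parameters": {
            "data_channel": "rtm",
            "enable_metrics": true,
            "enable_error_message": true
        }
    }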

  7. Unsubscribe from the channel

    After an agent session ends, unsubscribe from channel messages to release subtitle-related resources:

    api.unsubscribeMessage("channelName") { error ->
        if (error != null) {
            // Handle the error
        }
    }
  8. Release resources

    At the end of each call, use the destroy method to clean up the cache.

    api.destroy()
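
    Steps 7 and 8 together form the teardown path for each session. The following sketch combines them in order; endSession is a hypothetical wrapper, not part of the toolkit API.

    // End-of-session teardown: unsubscribe from channel messages, then
    // destroy the toolkit instance. The RTC and Signaling instances are not
    // released here; per the prerequisites, your app manages their lifecycle.
    fun endSession(channelName: String) {
        api.unsubscribeMessage(channelName) { error ->
            if (error != null) {
                // Handle the error
            }
        }
        api.destroy()
    }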

Reference

This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.

Folder structure

  • IConversationalAIAPI.kt: API interface with its related data structures and enumerations
  • ConversationalAIAPIImpl.kt: Main implementation logic of the ConversationalAI API
  • ConversationalAIUtils.kt: Utility functions and event callback management
  • subRender/
    • v3/: Subtitle module
      • TranscriptionController.kt: Subtitle controller
      • MessageParser.kt: Message parser

API Reference

This section provides API reference documentation for the subtitles module.