
Parse transcription data

Agora uses Protocol Buffers (protobuf) to serialize transcription data. Protobuf, developed by Google, is a language-neutral, platform-independent way to serialize structured data. It enables efficient, consistent data handling across platforms by generating source code in multiple programming languages. Learn more at protobuf.dev.

Understand the tech

Agora Real-Time STT provides the SttMessage.proto file, which defines the message format for speech-to-text results. Transcribed text is serialized according to this format into a compact representation, such as binary or JSON, and sent through the data stream. This guide explains how to generate code in your target language with the Protobuf compiler protoc, deserialize the received data stream, and extract specific text fields from the deserialized data structure.
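To make the binary form less opaque, the following is a minimal sketch of what deserialization does under the hood. It walks the standard Protobuf wire format by hand and pulls the text of each Word (field 10 of Text, field 1 of Word, as numbered in SttMessage.proto). The class name WireFormatSketch and its methods are hypothetical and exist only for illustration; the sketch has no error handling for truncated input, and in a real app you would use the parseFrom method of the generated class instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hand-rolled reader for the Protobuf wire format.
// WireFormatSketch is a hypothetical name, not part of any Agora or Protobuf API.
class WireFormatSketch {
    private final byte[] buf;
    private int pos;

    WireFormatSketch(byte[] buf) {
        this.buf = buf;
    }

    boolean hasMore() {
        return pos < buf.length;
    }

    // Reads a base-128 varint: 7 payload bits per byte, least significant group first.
    long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Reads a length-delimited payload (used for strings and nested messages).
    byte[] readBytes() {
        int len = (int) readVarint();
        byte[] out = Arrays.copyOfRange(buf, pos, pos + len);
        pos += len;
        return out;
    }

    // Skips the value of a field this sketch does not care about.
    void skip(int wireType) {
        switch (wireType) {
            case 0: readVarint(); break;   // varint
            case 1: pos += 8; break;       // 64-bit fixed
            case 2: readBytes(); break;    // length-delimited
            case 5: pos += 4; break;       // 32-bit fixed
            default: throw new IllegalArgumentException("Unsupported wire type " + wireType);
        }
    }

    // Returns the text of every Word in a serialized Text message.
    static List<String> wordTexts(byte[] textMessage) {
        List<String> texts = new ArrayList<>();
        WireFormatSketch in = new WireFormatSketch(textMessage);
        while (in.hasMore()) {
            int tag = (int) in.readVarint();
            int field = tag >>> 3;   // upper bits: field number
            int wireType = tag & 7;  // lower 3 bits: wire type
            if (field == 10 && wireType == 2) {            // Text.words
                WireFormatSketch word = new WireFormatSketch(in.readBytes());
                while (word.hasMore()) {
                    int wordTag = (int) word.readVarint();
                    if ((wordTag >>> 3) == 1 && (wordTag & 7) == 2) {  // Word.text
                        texts.add(new String(word.readBytes(), StandardCharsets.UTF_8));
                    } else {
                        word.skip(wordTag & 7);
                    }
                }
            } else {
                in.skip(wireType);
            }
        }
        return texts;
    }
}
```

Each field on the wire starts with a tag that combines the field number and a wire type, which is why the field numbers in the .proto file must match between sender and receiver.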

Prerequisites

To follow this procedure, you must:

  • Enable Real-Time STT for your app.

  • Install the Protobuf compiler to generate code classes that process transcription text.

    info

    Because Protobuf output can vary across versions, use the same Protobuf version for generating the code and for deserializing data on the client.

Parse transcription data using Protobuf

Follow these steps to write a script that calls the protoc compiler to generate code in different languages.

Create a Protobuf definition file

Protobuf allows you to generate source code in your preferred language based on the structure defined in the .proto file. Agora provides the following Protobuf definition for parsing data. To use the file for generating code:

  1. Copy the following Protobuf definition to a local SttMessage.proto file:


    syntax = "proto3";

    package Agora.SpeechToText;
    option objc_class_prefix = "Stt";
    option csharp_namespace = "AgoraSTTSample.Protobuf";
    option java_package = "io.agora.rtc.speech2text";
    option java_outer_classname = "AgoraSpeech2TextProtobuffer";

    message Text {
      reserved 1 to 3, 5, 7 to 9, 11, 17;
      int64 uid = 4;
      int64 time = 6;
      repeated Word words = 10;
      int32 duration_ms = 12;
      string data_type = 13;
      repeated Translation trans = 14;
      string culture = 15;
      int64 text_ts = 16;
      OriginalTranscript original_transcript = 18;
    }

    message Word {
      reserved 2, 3, 5;
      string text = 1;
      bool is_final = 4;
    }

    message Translation {
      bool is_final = 1;
      string lang = 2;
      repeated string texts = 3;
    }

    message OriginalTranscript {
      string culture = 1;
      repeated Word words = 2;
    }

    For a description of each field in the SttMessage.proto file, see the Reference section.

  2. Edit the following properties in your .proto file to match your project:

    • package: The source code package namespace.
    • option: The desired language options.

Generate source code script

Create a shell script named generate_code.sh with the following content:

#!/bin/sh

# Specify the path to the protoc compiler. This example uses Protobuf version 21.12.
# Replace it according to your actual needs.
PROTOC_PATH=./protoc-21.12-osx-aarch_64/bin/protoc

# Specify the path to the .proto file.
# The detailed description of the data structure can be found in the Reference section.
PROTO_FILE=./SttMessage.proto

# Specify the output directory.
JAVA_OUT_DIR=$(pwd)/code/java

# Create the output directory (if it doesn't exist).
mkdir -p "$JAVA_OUT_DIR"

# Generate Java code.
"$PROTOC_PATH" --java_out="$JAVA_OUT_DIR" "$PROTO_FILE"

# Output a message once code generation is finished.
echo "Code generation completed."

Run the script

To generate a Protobuf class, run these commands in your terminal:


# Make the script executable
chmod +x generate_code.sh

# Run the script
./generate_code.sh

Deserialize transcription data

When transcription text is available, your Video SDK event handler receives the stream message callback. Use the generated Protobuf class to deserialize the received data and convert it back into a data structure or object.

// Join a channel and add callback events
rtcManager.joinChannel(roomName, localUid, agora_token, roleType.equals(ROLE_TYPE_BROADCAST), new RtcManager.OnChannelListener() {
    ...
    // Callback for receiving a stream message
    @Override
    public void onStreamMessage(int uid, int streamId, byte[] data) {
        // Check if the remote user ID matches the specified streaming bot ID.
        // If so, decode the stream data into a text object.
        if (String.valueOf(uid).equalsIgnoreCase(RTC_UID_STT_STREAM)) {
            AgoraSpeech2TextProtobuffer.Text text = STTManager.getInstance().parseTextByte(roomName, data);
            // Convert the parsed text object to JSON format and print it to the log
            LogUtil.d(originLogName, mGson.toJson(text));
        }
    }
    ...
});

public AgoraSpeech2TextProtobuffer.Text parseTextByte(String channel, byte[] data) {
    // Declare a variable of type AgoraSpeech2TextProtobuffer.Text to store the deserialized object
    AgoraSpeech2TextProtobuffer.Text textStream;
    try {
        // Deserialize the byte array data into an AgoraSpeech2TextProtobuffer.Text object
        textStream = AgoraSpeech2TextProtobuffer.Text.parseFrom(data);
    } catch (Exception ex) {
        notifyErrorHandler(new ErrorInfo("parseTextByte", "-1", "parseTextByte parseFrom error >> " + ex.toString()));
        return null;
    }
    ...
}
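Once the data is deserialized, extracting the transcribed text is a matter of walking the words list. The following is a minimal sketch of that step. Because the generated AgoraSpeech2TextProtobuffer classes are not reproduced here, the sketch uses small hand-written stand-ins (the nested Word and Text classes, both hypothetical) that mirror the accessor names the Protobuf Java generator produces for these fields: getDataType(), getWordsList(), getText(), and getIsFinal(). Concatenating the word texts without a separator is also an assumption; adjust it to how your app displays text.

```java
import java.util.List;

// Sketch only: TranscriptExtractor and its nested classes are hypothetical stand-ins.
// With the real generated code, replace Word and Text with
// AgoraSpeech2TextProtobuffer.Word and AgoraSpeech2TextProtobuffer.Text.
class TranscriptExtractor {

    // Stand-in for the generated Word message (text and is_final fields).
    static final class Word {
        private final String text;
        private final boolean isFinal;

        Word(String text, boolean isFinal) {
            this.text = text;
            this.isFinal = isFinal;
        }

        String getText() { return text; }
        boolean getIsFinal() { return isFinal; }
    }

    // Stand-in for the generated Text message (data_type and words fields only).
    static final class Text {
        private final String dataType;
        private final List<Word> words;

        Text(String dataType, List<Word> words) {
            this.dataType = dataType;
            this.words = words;
        }

        String getDataType() { return dataType; }
        List<Word> getWordsList() { return words; }
    }

    // Returns the concatenated text of all final words in a transcription message.
    // Returns an empty string for messages whose data_type is not "transcribe".
    static String finalTranscript(Text text) {
        if (!"transcribe".equals(text.getDataType())) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (Word word : text.getWordsList()) {
            if (word.getIsFinal()) {
                sb.append(word.getText());
            }
        }
        return sb.toString();
    }
}
```

In the onStreamMessage callback, a helper like finalTranscript could be called on the object returned by parseTextByte, after checking that it is not null.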

Reference

This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.

Sample project

Agora provides an open-source Agora-RTT-Demo sample project. You can download it or view its source code.

SttMessage.proto fields

The following tables describe the fields in the SttMessage.proto file.

Text message fields

Field name | Type | Description
uid | int64 | The user ID associated with the text.
time | int64 | The start time of the transcription segment. Has a value only when is_final is true; otherwise, the value is 0.
words | repeated Word | An array of transcription results. See Word message fields for details.
duration_ms | int32 | The duration of the transcribed text in milliseconds.
data_type | string | The type of data:
  • transcribe: Transcription
  • translate: Text translation
trans | repeated Translation | An array of translation results. See Translation message fields for details.
culture | string | The source language of the transcription.
text_ts | int64 | The continuously incremented timestamp of the transcription result, used to align source and target text during real-time translation.
original_transcript | OriginalTranscript | The transcribed text used for translation.

Word message fields

Field name | Type | Description
text | string | The transcription result.
is_final | bool | Indicates whether this sentence is the final transcription result.
  • true: The transcription engine has determined the result for this sentence, and no further modifications are expected. This does not mean the sentence is semantically complete.
  • false: The result is not yet final and may change.
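The is_final flag typically drives how captions are displayed: interim results are shown provisionally and final results are committed. The following is a minimal sketch of that pattern. The CaptionBuffer class is hypothetical, and it assumes that each non-final result supersedes the previous non-final result for the same sentence rather than extending it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: CaptionBuffer is a hypothetical helper, not part of any Agora SDK.
// Assumption: a non-final result replaces the previous non-final result.
class CaptionBuffer {
    private final List<String> committed = new ArrayList<>();
    private String pending = "";

    // Feed in the text and is_final values of each received Word.
    void onResult(String text, boolean isFinal) {
        if (isFinal) {
            committed.add(text);  // The engine will not modify this sentence again.
            pending = "";
        } else {
            pending = text;       // Provisional text that may still change.
        }
    }

    // Returns the committed sentences followed by the current provisional text.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (String sentence : committed) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(sentence);
        }
        if (!pending.isEmpty()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(pending);
        }
        return sb.toString();
    }
}
```

Because a final result does not guarantee a semantically complete sentence, a display layer may still want to merge consecutive committed entries before showing them.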

Translation message fields

Field name | Type | Description
is_final | bool | Indicates whether this sentence is the final translation result.
  • true: The translation engine has determined that the translation result is final and no further modification is required. This does not mean the sentence is semantically complete.
  • false: The translation result is not yet final and may be updated.
lang | string | The target language of the translation.
texts | repeated string | The translated text results.

Original transcript message fields

Field name | Type | Description
culture | string | The source language of the transcription.
words | repeated Word | An array of transcription results.

Translate and transcribe examples

This section shows sample output from the STT service for transcribed and translated sentences.

Transcribe

time: 1753359518654
words {
  text: "Hello, how are you?"
  is_final: true
}
duration_ms: 770
data_type: "transcribe"
culture: "en-US"
text_ts: 1753359520754

Translate

time: 1753359518654
duration_ms: 770
data_type: "translate"
trans {
  is_final: true
  lang: "es-ES"
  texts: "Hola, ¿cómo estás? "
}
text_ts: 1753359520754
original_transcript {
  culture: "en-US"
  words {
    text: "Hello, how are you?"
    is_final: true
  }
}
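In the two samples above, the transcribe and translate messages carry the same text_ts value. The following is a minimal sketch of using that value to pair source and translated text. The TranslationAligner class is hypothetical, and it assumes that a translation always carries the text_ts of the transcription it belongs to, as in these samples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: TranslationAligner is a hypothetical helper, not part of any Agora SDK.
// Assumption: matching transcribe and translate messages share the same text_ts.
class TranslationAligner {

    // One caption row: the source text plus its translations keyed by target language.
    static final class Row {
        String source = "";
        final Map<String, String> translations = new LinkedHashMap<>();
    }

    // Rows keyed by text_ts, kept in arrival order.
    private final Map<Long, Row> rows = new LinkedHashMap<>();

    // Call for messages with data_type "transcribe".
    void onTranscription(long textTs, String sourceText) {
        row(textTs).source = sourceText;
    }

    // Call for each Translation entry of a message with data_type "translate".
    void onTranslation(long textTs, String lang, String translatedText) {
        row(textTs).translations.put(lang, translatedText);
    }

    // Returns the row for a text_ts, or null if nothing has been received for it.
    Row get(long textTs) {
        return rows.get(textTs);
    }

    private Row row(long textTs) {
        return rows.computeIfAbsent(textTs, key -> new Row());
    }
}
```

A long-running app would also need to evict old rows; that is omitted here to keep the sketch short.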