
Record captions

Real-Time STT lets you record caption text as a .vtt file and store it in your configured cloud storage. Recorded captions are designed for playback alongside audio and video recordings. You can also use the .vtt file to generate meeting minutes and topic summaries, or to perform sentiment analysis and content moderation. Caption recording does not incur additional fees.

This page explains how to record captions and synchronize them with the audio or video files generated by Cloud Recording.

info

Since Cloud Recording and transcription tasks operate on different servers, you must enable the NTP timestamp for Cloud Recording to synchronize the timestamps with the transcription.

Prerequisites

To follow this procedure, you must:

  • Make sure Real-Time STT is enabled for your app.

  • Set the enableNTPtimestamp parameter to true in the body of the Cloud Recording start request to synchronize the timestamps with the transcription.

    Sample Cloud Recording start request body

    "recordingConfig": {
      "maxIdleTime": 30,
      "streamTypes": 2,
      "channelType": 1,
      "extensionParams": {
        "enableNTPtimestamp": true
      },
      "transcodingConfig": {
        "width": 1280,
        "height": 720,
        "fps": 15,
        "bitrate": 2400,
        "mixedVideoLayout": 3,
        "layoutConfig": [
          {
            "alpha": 1,
            "height": 1,
            "render_mode": 1,
            "uid": "22228",
            "width": 1,
            "x_axis": 0,
            "y_axis": 0
          }
        ]
      }
    },
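As a minimal sketch, the flag can also be set programmatically before the start request is sent. The helper name `with_ntp_timestamp` is hypothetical, and the placement of `extensionParams` inside `recordingConfig` simply follows the sample body above.

```python
import copy
import json


def with_ntp_timestamp(recording_config: dict) -> dict:
    """Return a copy of recording_config with enableNTPtimestamp set to true.

    Hypothetical helper: the key placement mirrors the sample request body.
    """
    config = copy.deepcopy(recording_config)
    # Create extensionParams if it is missing, then switch the flag on.
    config.setdefault("extensionParams", {})["enableNTPtimestamp"] = True
    return config


base = {"maxIdleTime": 30, "streamTypes": 2, "channelType": 1}
print(json.dumps({"recordingConfig": with_ntp_timestamp(base)}, indent=2))
```

The helper copies its input, so an existing configuration dictionary is left unchanged.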

Implementation

Follow these steps to record captions and synchronize them with the corresponding recordings for seamless playback.

Record the captions

To record the captions, follow the API call sequence from the REST Quickstart and modify the start request to include caption recording parameters as follows:


curl --location --request POST 'https://api.agora.io/api/speech-to-text/v1/projects/{appId}/join' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic <credentials>' \
--data '{
  "name": "unique-agent-id",
  "languages": [
    "en-US"
  ],
  "maxIdleTime": 50,
  "rtcConfig": {
    "channelName": "<YourChannelName>",
    "subBotUid": "<YourSubscribeUid>",
    "subBotToken": "<YourSubscribeToken>",
    "pubBotUid": "<YourPublishUid>",
    "pubBotToken": "<PublishToken-IfRequired>"
  },
  "captionConfig": {
    "storage": {
      "accessKey": "<YourOssAccessKey>",
      "secretKey": "<YourOssSecretKey>",
      "bucket": "<YourOssBucketName>",
      "vendor": <YourOssVendorCode>,
      "region": <YourOssRegionCode>,
      "fileNamePrefix": ["folder","sub-folder"]
    },
    "extensionParams": {
      "endpoint": "<YourEndpointUrl>",
      "type": "s3",
      "provider": "<YourProviderName>"
    }
  }
}'
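The same request can be assembled in Python. The following is a sketch only: `build_join_request` is a hypothetical helper, the credentials string is treated as an opaque value you already have, and the request is built but not sent.

```python
import json
import urllib.request


def build_join_request(app_id, credentials, rtc_config, storage, extension_params=None):
    """Build (but do not send) the start request with caption recording enabled.

    Hypothetical helper: field names are copied from the curl example.
    """
    caption_config = {"storage": storage}
    if extension_params:
        caption_config["extensionParams"] = extension_params
    body = {
        "name": "unique-agent-id",
        "languages": ["en-US"],
        "maxIdleTime": 50,
        "rtcConfig": rtc_config,
        "captionConfig": caption_config,
    }
    url = f"https://api.agora.io/api/speech-to-text/v1/projects/{app_id}/join"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


request = build_join_request(
    app_id="<appId>",
    credentials="<credentials>",
    rtc_config={
        "channelName": "<YourChannelName>",
        "subBotUid": "<YourSubscribeUid>",
        "pubBotUid": "<YourPublishUid>",
    },
    storage={
        "accessKey": "<YourOssAccessKey>",
        "secretKey": "<YourOssSecretKey>",
        "bucket": "<YourOssBucketName>",
        "vendor": 1,  # Amazon S3
        "region": 0,  # US_EAST_1
        "fileNamePrefix": ["folder", "sub-folder"],
    },
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that call is left out so the sketch runs without network access or real credentials.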

info

When captionConfig.storage.vendor is 11, you must provide S3-compatible storage access information in captionConfig.extensionParams.

extensionParams fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | Required when vendor is 11 | The access URL for the S3-compatible storage service. Provide the full URL, for example http://host:9002. If you use a self-signed HTTPS certificate, you must trust it in your runtime environment first, otherwise the connection may fail. |
| provider | string | Recommended | The name of the S3-compatible storage provider, for example Minio for MinIO. Usually set together with endpoint. The capitalization must match the provider's requirements. |
| type | string | Recommended | The storage type. For standard S3-compatible storage, set this to s3. Usually used together with storage.vendor = 11. |
| region | string | No | The storage region identifier for use by AWS SDK-compatible clients. If set in both storage.region and extensionParams.region, extensionParams.region takes precedence. |
| tag | string | No | The base value for the object tag applied after upload. Takes effect only when storage.vendor is Tencent Cloud, Alibaba Cloud, or Amazon S3. Does not take effect when vendor is 11. |
| tagByRule | array | No | Automatically applies object tags based on filename rules, for example applying different tags to different file types. When both tag and tagByRule are set, tagByRule takes precedence. Takes effect only for Tencent Cloud, Alibaba Cloud, and Amazon S3. Does not take effect when vendor is 11. |
| sse | string | No | The server-side encryption method. aes256 uses AES-256 encryption; kms uses AWS KMS. Takes effect only for Amazon S3. Does not take effect when vendor is 11. |
| overwritekeys | array of objects | No | Maps uploaded files to target object names by file extension, controlling the filename used in cloud storage after upload. Not subject to the vendor restrictions that apply to tag fields. |
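The vendor-specific rules in the table can be checked on the client before a request is sent. The following sketch assumes a hypothetical helper, `check_caption_config`; it is not part of any Agora SDK and covers only the rules for vendor 11 stated above.

```python
def check_caption_config(caption_config: dict) -> list:
    """Return a list of problems found in a captionConfig dictionary.

    Hypothetical client-side check; the service performs its own validation.
    """
    problems = []
    storage = caption_config.get("storage", {})
    ext = caption_config.get("extensionParams", {})
    if storage.get("vendor") == 11:
        # S3-compatible storage needs an explicit endpoint URL.
        if not ext.get("endpoint"):
            problems.append(
                "extensionParams.endpoint is required when storage.vendor is 11"
            )
        # These fields are documented as having no effect for vendor 11.
        for field in ("tag", "tagByRule", "sse"):
            if field in ext:
                problems.append(
                    f"extensionParams.{field} does not take effect when storage.vendor is 11"
                )
    return problems


config = {
    "storage": {"vendor": 11, "bucket": "<YourOssBucketName>"},
    "extensionParams": {"type": "s3", "provider": "Minio", "sse": "aes256"},
}
for problem in check_caption_config(config):
    print(problem)
```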

Sync the files

The m3u8+vtt files generated by Real-Time STT and the m3u8+ts files generated by Cloud Recording are independent of each other and use different timestamps. The Cloud Recording timestamp starts at 0, while Real-Time STT uses the system timestamp. If either process starts abnormally, the media files generated by the two services may be out of sync during playback.
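To illustrate the kind of adjustment that aligning two timelines involves, the following sketch shifts the timings on a WebVTT cue line by a fixed offset. It is a conceptual example only, not the algorithm used by Agora's post-processing script. The function names are hypothetical, and the sketch assumes cue timings in the `hh:mm:ss.mmm` form.

```python
import re

# Matches timestamps such as 00:01:05.250 (hours are assumed to be present).
_TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2})\.(\d{3})")


def to_ms(timestamp: str) -> int:
    """Convert an hh:mm:ss.mmm timestamp to integer milliseconds."""
    match = _TIMESTAMP.fullmatch(timestamp)
    if match is None:
        raise ValueError(f"unsupported timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def from_ms(total_ms: int) -> str:
    """Convert milliseconds back to hh:mm:ss.mmm, clamping negatives to zero."""
    total_ms = max(total_ms, 0)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def shift_cue_line(line: str, offset_ms: int) -> str:
    """Shift both timestamps on a cue timing line; leave other lines untouched."""
    if "-->" not in line:
        return line
    return _TIMESTAMP.sub(lambda m: from_ms(to_ms(m.group(0)) + offset_ms), line)


# Move a cue 60 seconds earlier so it lines up with a timeline that starts at 0.
print(shift_cue_line("00:01:05.250 --> 00:01:07.000", -60_000))
```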

Agora provides a post-processing script that lets you sync the m3u8+ts and m3u8+vtt files. To run this script, take the following steps:

  1. Unzip the post-processing script to a local folder.

  2. Run the script on your transcription files:


    python3 insert_subtitle.py --av audio_dir/audio_ts.m3u8 --subtitle subtitle_dir/subtitle.m3u8 --output output_dir/ --overwrite

    If ffmpeg/ffprobe are not in your PATH, use -ffmpeg_path to specify the path.

  3. Play the synchronized files:

    1. Run the following command to start the HTTP server:


      python3 -m http.server --bind 127.0.0.1 -d output_dir

    2. In your browser, enter the following URL:


      http://127.0.0.1:8000/player_demo.html
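If you run these steps repeatedly, the two commands can be assembled in a small Python wrapper. This is a sketch only: `sync_command` and `serve_command` are hypothetical helper names, and the flags are limited to those shown in the steps above.

```python
import shlex


def sync_command(av_playlist, subtitle_playlist, output_dir, overwrite=True):
    """Build the argument list for the post-processing script."""
    command = [
        "python3", "insert_subtitle.py",
        "--av", av_playlist,
        "--subtitle", subtitle_playlist,
        "--output", output_dir,
    ]
    if overwrite:
        command.append("--overwrite")
    return command


def serve_command(output_dir, host="127.0.0.1"):
    """Build the argument list for the local HTTP server used for playback."""
    return ["python3", "-m", "http.server", "--bind", host, "-d", output_dir]


# Print the commands instead of running them, so the sketch has no side effects.
print(shlex.join(sync_command("audio_dir/audio_ts.m3u8", "subtitle_dir/subtitle.m3u8", "output_dir/")))
print(shlex.join(serve_command("output_dir")))
```

Passing either list to `subprocess.run` would execute the corresponding step.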

Supported OSS vendors

vendor: Number. The third-party cloud storage vendor. The following are supported:

  • 1: Amazon S3
  • 5: Microsoft Azure
  • 6: Google Cloud
  • 11: Other S3-compatible object storage, such as MinIO and some self-hosted storage services. When using this vendor, set the region in extensionParams.region instead of storage.region.

region: Number. The region information specified for the third-party cloud storage. The service only supports regions in the following lists:

  • Amazon S3 (vendor = 1):

    • 0: US_EAST_1
    • 1: US_EAST_2
    • 2: US_WEST_1
    • 3: US_WEST_2
    • 4: EU_WEST_1
    • 5: EU_WEST_2
    • 6: EU_WEST_3
    • 7: EU_CENTRAL_1
    • 8: AP_SOUTHEAST_1
    • 9: AP_SOUTHEAST_2
    • 10: AP_NORTHEAST_1
    • 11: AP_NORTHEAST_2
    • 12: SA_EAST_1
    • 13: CA_CENTRAL_1
    • 14: AP_SOUTH_1
    • 15: CN_NORTH_1
    • 16: CN_NORTHWEST_1
    • 17: US_GOV_WEST_1
    • 20: AP_NORTHEAST_3
    • 21: EU_NORTH_1
    • 22: ME_SOUTH_1
    • 23: US_GOV_EAST_1
    • 24: AP_SOUTHEAST_3
    • 25: EU_SOUTH_1
    • 28: IL_CENTRAL_1
  • Microsoft Azure (vendor = 5): The region parameter has no effect, whether set or not.

  • Google Cloud (vendor = 6): The region parameter has no effect, whether set or not.
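The numeric codes above can be kept in a lookup table so that request bodies are built from readable names. The following sketch covers Amazon S3 only. The helper `s3_storage_fields` is hypothetical, and accepting AWS-style names such as `ap-southeast-1` is an assumption made for convenience.

```python
# Region codes for Amazon S3 (vendor = 1), copied from the list above.
S3_REGION_CODES = {
    "US_EAST_1": 0, "US_EAST_2": 1, "US_WEST_1": 2, "US_WEST_2": 3,
    "EU_WEST_1": 4, "EU_WEST_2": 5, "EU_WEST_3": 6, "EU_CENTRAL_1": 7,
    "AP_SOUTHEAST_1": 8, "AP_SOUTHEAST_2": 9, "AP_NORTHEAST_1": 10,
    "AP_NORTHEAST_2": 11, "SA_EAST_1": 12, "CA_CENTRAL_1": 13,
    "AP_SOUTH_1": 14, "CN_NORTH_1": 15, "CN_NORTHWEST_1": 16,
    "US_GOV_WEST_1": 17, "AP_NORTHEAST_3": 20, "EU_NORTH_1": 21,
    "ME_SOUTH_1": 22, "US_GOV_EAST_1": 23, "AP_SOUTHEAST_3": 24,
    "EU_SOUTH_1": 25, "IL_CENTRAL_1": 28,
}

AMAZON_S3_VENDOR = 1


def s3_storage_fields(region_name: str) -> dict:
    """Return the vendor and region fields for an Amazon S3 region name.

    Accepts either the listed form (AP_SOUTHEAST_1) or the AWS-style
    form (ap-southeast-1); the latter is normalized before lookup.
    """
    key = region_name.upper().replace("-", "_")
    if key not in S3_REGION_CODES:
        raise ValueError(f"unsupported Amazon S3 region: {region_name!r}")
    return {"vendor": AMAZON_S3_VENDOR, "region": S3_REGION_CODES[key]}


print(s3_storage_fields("ap-southeast-1"))
```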