Start a Real-time STT task

POST
https://api.agora.io/v1/projects/{appId}/rtsc/speech-to-text/tasks

After you acquire a builderToken, call this method within 5 minutes to start speech-to-text conversion.
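Because the builderToken is only usable for a short window, a client may want to check its age before calling this method. The following is a minimal sketch, assuming you record the acquisition time yourself; `token_is_usable` and `BUILDER_TOKEN_TTL_SECONDS` are hypothetical names, and the boundary at exactly 5 minutes is treated conservatively as expired.

```python
import time
from typing import Optional

# "within 5 minutes", per the description above.
BUILDER_TOKEN_TTL_SECONDS = 5 * 60


def token_is_usable(acquired_at: float, now: Optional[float] = None) -> bool:
    """Return True while the builderToken is still inside its 5-minute window.

    acquired_at and now are Unix timestamps in seconds. Assumption: the
    window starts when the acquire call returned on the client side.
    """
    if now is None:
        now = time.time()
    age = now - acquired_at
    return 0 <= age < BUILDER_TOKEN_TTL_SECONDS
```

If the check fails, acquire a new builderToken rather than retrying with the old one.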

Request

Path parameters

appId string required

The App ID of the project.

Query parameters

builderToken string required

The tokenName value you obtained in the response body of the acquire method.
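The path and query parameters combine into the request URL. A minimal sketch of that step, using only the Python standard library; `build_start_url` is a hypothetical helper name, and percent-encoding both values is a defensive assumption rather than something this page requires.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.agora.io/v1/projects"


def build_start_url(app_id: str, builder_token: str) -> str:
    """Build the URL for starting a Real-time STT task."""
    # appId is a path segment; builderToken is a query parameter.
    path = f"{BASE_URL}/{quote(app_id, safe='')}/rtsc/speech-to-text/tasks"
    query = urlencode({"builderToken": builder_token})
    return f"{path}?{query}"
```

Here `builder_token` is the tokenName value from the acquire response.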

Request body

application/json
Body required
  • languages array[string] required

    The transcription languages to recognize. You can specify a maximum of 4 languages. Refer to Supported Languages for details.

  • uidLanguagesConfig object nullable

    Configures the transcription languages for specific user IDs. Supports up to 5 configuration items.

      • uid string required

        The ID of the user to be transcribed.

      • languages string required

        You can specify up to 4 languages. Refer to Supported Languages for details.

  • maxIdleTime integer nullable

    Default: 30

    Possible values: 5 to 2592000

    Maximum channel idle time, in seconds. When the specified time is exceeded, the transcription task ends automatically. Idle time means that there is no host in a live broadcast channel, or there is no user in a communication channel.

  • rtcConfig object required
      • channelName string required

        The name of the channel to transcribe.

      • subBotUid string required

        The ID of the bot that subscribes to the audio stream.

      • subBotToken string nullable

        The token used by the subscribing bot for channel authentication. Required only when your project has App Certificate enabled. Generate this token on your token server. For details, see Token authentication.

      • pubBotUid string required

        The ID of the bot that pushes subtitle information to the channel.

      • pubBotToken string nullable

        The token used by the subtitle-pushing bot for channel authentication. Required only when your project has App Certificate enabled. Generate this token on your token server. For details, see Token authentication.

      • subscribeAudioUids array[string] nullable

        The user IDs of the audio streams you want to subscribe to. Specify this parameter only if you need to subscribe to specific users. Maximum array length: 3.

      • cryptionMode integer nullable

        Possible values: 0 to 8

        The encryption and decryption mode. When enabled, this mode is used both to decrypt incoming streams and to encrypt outgoing subtitles. The decryption method must match the encryption method set for the channel.

        • 0: No encryption
        • 1: AES_128_XTS (128-bit AES encryption, XTS mode)
        • 2: AES_128_ECB (128-bit AES encryption, ECB mode)
        • 3: AES_256_XTS (256-bit AES encryption, XTS mode)
        • 4: SM4_128_ECB (128-bit SM4 encryption, ECB mode)
        • 5: AES_128_GCM (128-bit AES encryption, GCM mode)
        • 6: AES_256_GCM (256-bit AES encryption, GCM mode)
        • 7: AES_128_GCM2 (128-bit AES encryption, GCM mode). More secure than AES_128_GCM; requires setting both a key and a salt.
        • 8: AES_256_GCM2 (256-bit AES encryption, GCM mode). More secure than AES_256_GCM; requires setting both a key and a salt.

      • salt string nullable

        A Base64-encoded, 32-byte encryption/decryption salt. Required only when cryptionMode is 7 or 8.

      • secret string nullable

        The encryption/decryption key. Required when cryptionMode is not 0.

      • enableJsonProtocol boolean nullable

        Default: false

        Set the encoding format of the subtitle data pushed to the channel.

        • true: Use JSON to push subtitles and compress data with gzip. Uses less bandwidth, but requires decoding.
        • false: Use Protobuf to push subtitles (default). The data volume is smaller. Suitable for scenarios with high transmission efficiency requirements.

      • subscribeAudioUids array[string] nullable

        The user IDs of the audio streams you want to subscribe to. To subscribe to the audio streams of all users, set this parameter to ["all"]. Maximum array length: 32. You can set either subscribeAudioUids or unSubscribeAudioUids, but not both.

      • unSubscribeAudioUids array[string] nullable

        The user IDs of the audio streams you do not want to subscribe to. Maximum array length: 5. You can set either subscribeAudioUids or unSubscribeAudioUids, but not both.

  • translateConfig object nullable

    Subtitle translation configuration.

      • languages array required

        The translation language array. You can specify a maximum of 4 different source languages. The source language and the target language must be different; otherwise, an error is reported.
        Each array item is an object with:

          • source string required

            The source language for translation. Refer to Supported Languages for details.

          • target array[string] required

            The target languages for translation. You can specify a maximum of 5 target languages for each source language. Refer to Supported Languages for details.

  • captionConfig object nullable

    Subtitle recording configuration.

      • sliceDuration integer nullable

        Default: 60

        Possible values: 5 to 28800

        The slice size of the recorded subtitle file, in seconds.

      • storage object required
          • accesskey string required

            The access key of the third-party cloud storage.

          • secretkey string required

            The secret key of the third-party cloud storage.

          • bucket string required

            The bucket name of the third-party cloud storage.

          • vendor integer required

            Possible values: 1, 5, 6

            The third-party cloud storage platform:

            • 1: Amazon S3
            • 5: Microsoft Azure
            • 6: Google Cloud

          • region integer required

            The region information for the third-party cloud storage. To ensure successful and real-time uploading of recorded files, the cloud storage region must match the region of the application server where you initiate the request. For example, if your App server is in East US, set the cloud storage region to East US as well. See third-party storage regions for details.

          • fileNamePrefix array[string] nullable

            The storage location of the recorded file in the third-party cloud storage. The prefix length (including slashes) must not exceed 128 characters. The following characters are supported:

            • Lowercase English letters (a-z)
            • Uppercase English letters (A-Z)
            • Numbers (0-9)

            Symbols like slashes, underscores, and brackets must not appear in the string.
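Several of the limits above can be checked on the client before sending the request. The following is a rough sketch, not the service's actual validation: `validate_start_body` is a hypothetical helper, the minimum of one entry in languages is an assumption (the page states only the maximum), and array-length limits for the UID lists are left out for brevity.

```python
from typing import Any, Dict, List


def validate_start_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the request body (empty if none)."""
    errors: List[str] = []

    languages = body.get("languages") or []
    if not 1 <= len(languages) <= 4:  # lower bound of 1 is an assumption
        errors.append("languages: specify between 1 and 4 languages")

    idle = body.get("maxIdleTime")
    if idle is not None and not 5 <= idle <= 2592000:
        errors.append("maxIdleTime: must be between 5 and 2592000 seconds")

    rtc = body.get("rtcConfig")
    if not isinstance(rtc, dict):
        errors.append("rtcConfig: required object is missing")
        rtc = {}
    else:
        for key in ("channelName", "subBotUid", "pubBotUid"):
            if not rtc.get(key):
                errors.append(f"rtcConfig.{key}: required")

    if rtc.get("subscribeAudioUids") and rtc.get("unSubscribeAudioUids"):
        errors.append("rtcConfig: set subscribeAudioUids or unSubscribeAudioUids, not both")

    mode = rtc.get("cryptionMode") or 0
    if not 0 <= mode <= 8:
        errors.append("rtcConfig.cryptionMode: must be between 0 and 8")
    if mode != 0 and not rtc.get("secret"):
        errors.append("rtcConfig.secret: required when cryptionMode is not 0")
    if mode in (7, 8) and not rtc.get("salt"):
        errors.append("rtcConfig.salt: required when cryptionMode is 7 or 8")

    pairs = (body.get("translateConfig") or {}).get("languages") or []
    if len(pairs) > 4:
        errors.append("translateConfig.languages: at most 4 source languages")
    for pair in pairs:
        source = pair.get("source")
        targets = pair.get("target") or []
        if len(targets) > 5:
            errors.append(f"translateConfig: at most 5 target languages for {source}")
        if source in targets:
            errors.append(f"translateConfig: target must differ from source {source}")

    slice_seconds = (body.get("captionConfig") or {}).get("sliceDuration")
    if slice_seconds is not None and not 5 <= slice_seconds <= 28800:
        errors.append("captionConfig.sliceDuration: must be between 5 and 28800 seconds")

    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue at once. The server remains the authority on what it accepts.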

Response

  • If the returned status code is 200, the request was successful. The response body contains the result of the request.

    OK
    • taskId string required

      The unique identifier of this transcription task.

    • createTs integer required

      The Unix timestamp (in seconds) when the transcription task was created.

    • status string required

      The current status of the transcription task:

      • IDLE: Task not initialized
      • PREPARING: Task has received an initialization request
      • PREPARED: Task initialization completed
      • STARTING: Task is beginning to start
      • CREATED: Task startup partially completed
      • STARTED: Task startup fully completed
      • IN_PROGRESS: Task is currently running
      • STOPPING: Task is in the process of being stopped
      • STOPPED: Task has been terminated
      • FAILURE_STOP: Task termination failed

  • If the returned status code is not 200, the request failed. Refer to the message field to understand the possible reasons for failure.

    Non-200
    • message string

      The reason why the request failed.
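The two response cases can be handled with a small parser. This is an illustrative sketch only: `parse_start_response`, `StartTaskError`, and `KNOWN_STATUSES` are hypothetical names, and falling back to an empty payload when the body is not JSON is an assumption about error responses.

```python
import json
from typing import Any, Dict

# Status values listed in the response description above.
KNOWN_STATUSES = {
    "IDLE", "PREPARING", "PREPARED", "STARTING", "CREATED",
    "STARTED", "IN_PROGRESS", "STOPPING", "STOPPED", "FAILURE_STOP",
}


class StartTaskError(Exception):
    """Hypothetical exception raised when the start request fails."""


def parse_start_response(status_code: int, body_text: str) -> Dict[str, Any]:
    """Return taskId, createTs, and status on 200; raise StartTaskError otherwise."""
    try:
        payload = json.loads(body_text) if body_text.strip() else {}
    except json.JSONDecodeError:
        payload = {}  # assumption: tolerate non-JSON error bodies

    if status_code != 200:
        reason = payload.get("message", "no message field in response")
        raise StartTaskError(f"HTTP {status_code}: {reason}")

    if payload["status"] not in KNOWN_STATUSES:
        raise StartTaskError(f"unexpected status: {payload['status']}")
    return {key: payload[key] for key in ("taskId", "createTs", "status")}
```

Store the returned taskId, since it identifies this transcription task.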

Authorization

This endpoint requires Basic Auth.
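HTTP Basic authentication sends a Base64-encoded `username:password` pair in the Authorization header. A minimal sketch of building that header follows; `basic_auth_header` is a hypothetical helper, and this page does not state which credential pair to use, so the two arguments are placeholders.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header from a credential pair."""
    raw = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}
```

For example, `basic_auth_header("user", "pass")` yields `{"Authorization": "Basic dXNlcjpwYXNz"}`. Base64 is an encoding, not encryption, so send the header only over HTTPS.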

Request example


curl --request POST \
  --url 'https://api.agora.io/v1/projects/:appId/rtsc/speech-to-text/tasks?builderToken=' \
  --header 'Authorization: Basic <credentials>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "languages": [
    "en-US"
  ],
  "maxIdleTime": 50,
  "rtcConfig": {
    "channelName": "agora-test",
    "subBotUid": "47091",
    "pubBotUid": "88222",
    "subscribeAudioUids": [
      "45321",
      "23433"
    ]
  },
  "translateConfig": {
    "languages": [
      {
        "source": "en-US",
        "target": [
          "ar-SA",
          "id-ID",
          "fr-FR",
          "ja-JP"
        ]
      }
    ]
  },
  "captionConfig": {
    "sliceDuration": 60,
    "storage": {
      "accessKey": "test-oss",
      "secretKey": "test-oss",
      "bucket": "test-oss",
      "vendor": 2,
      "region": 3
    }
  }
}'
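
For clients that do not shell out to curl, the same request can be assembled with Python's standard library. This sketch only builds the request object and does not send it; `build_start_request` is a hypothetical helper, and the URL and Authorization value are supplied by the caller.

```python
import json
import urllib.request
from typing import Any, Dict


def build_start_request(url: str, authorization: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Assemble (but do not send) the POST request that starts an STT task."""
    data = json.dumps(body).encode("utf-8")  # JSON body, as the endpoint expects
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Authorization": authorization,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx status codes, so read the message field from that exception's body.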

Response example


{
  "taskId": "XXXX",
  "createTs": 1678505852,
  "status": "IN_PROGRESS"
}