Start a Real-time STT task

POST

https://api.agora.io/v1/projects/{appId}/rtsc/speech-to-text/tasks

After you acquire a builderToken, call this method within 5 minutes to start speech-to-text conversion.

Request

Path parameters

appId string required

The App ID of the project.

Query parameters

builderToken string required

The tokenName value you obtained in the response body of the acquire method. To stop a task, use the same builderToken you used to start the task.
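The path and query parameters above combine into a single request URL. The following is a minimal sketch: the URL template comes from this page, while the helper name `build_start_url` and the choice to percent-encode both values are assumptions made for illustration.

```python
from urllib.parse import quote, urlencode

# URL template taken from this page; {app_id} is the appId path parameter.
BASE_URL = "https://api.agora.io/v1/projects/{app_id}/rtsc/speech-to-text/tasks"


def build_start_url(app_id: str, builder_token: str) -> str:
    """Return the start-task URL (hypothetical helper, not part of any SDK)."""
    if not app_id or not builder_token:
        raise ValueError("app_id and builder_token are both required")
    # Percent-encode the path segment and the query value (an assumption;
    # typical App IDs and tokens may not need it).
    path = BASE_URL.format(app_id=quote(app_id, safe=""))
    return f"{path}?{urlencode({'builderToken': builder_token})}"
```

Because the same builderToken is needed to stop the task, it may be worth storing the token alongside the returned taskId.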

Request body

application/json

Body required

languages array[string] required
The transcription languages to recognize. You can specify a maximum of 4 languages. Refer to Supported Languages for details.
uidLanguagesConfig object nullable
Configure the transcription language for the specified user ID. Supports up to 5 configuration items.
- uid string required
  The ID of the user to be transcribed.
- languages string required
  You can specify up to 4 languages. Refer to Supported Languages for details.
maxIdleTime integer nullable
Default: 30
Possible values: 5 to 2592000
Maximum channel idle time, in seconds. When the specified time is exceeded, the transcription task ends automatically. Idle time means that there is no host in a live broadcast channel, or there is no user in a communication channel.
rtcConfig object required
- channelName string required
  The name of the channel to transcribe.
- subBotUid string required
  The ID of the bot that subscribes to the audio stream.
- subBotToken string nullable
  The token used by the subscribing bot for channel authentication. Required only when your project has App Certificate enabled. Generate this token on your token server. For details, see Token authentication.
- pubBotUid string required
  The ID of the bot that pushes subtitle information to the channel.
- pubBotToken string nullable
  The token used by the subtitle-pushing bot for channel authentication. Required only when your project has App Certificate enabled. Generate this token on your token server. For details, see Token authentication.
- subscribeAudioUids array[string] nullable
  The user IDs of the audio streams you want to subscribe to. Specify this parameter only if you need to subscribe to specific users. Maximum array length: 3.
- cryptionMode integer nullable
  Possible values: 0 to 8
  The encryption and decryption mode. When enabled, this mode is used for both decrypting incoming streams and encrypting outgoing subtitles.
  
  0: No encryption
  
  1: AES_128_XTS 128-bit AES encryption, XTS mode
  
  2: AES_128_ECB 128-bit AES encryption, ECB mode
  
  3: AES_256_XTS 256-bit AES encryption, XTS mode
  
  4: SM4_128_ECB 128-bit SM4 encryption, ECB mode
  
  5: AES_128_GCM 128-bit AES encryption, GCM mode
  
  6: AES_256_GCM 256-bit AES encryption, GCM mode
  
  7: AES_128_GCM2 128-bit AES encryption, GCM mode. Compared with AES_128_GCM, this mode is more secure and requires setting a key and salt.
  
  8: AES_256_GCM2 256-bit AES encryption, GCM mode. Compared with AES_256_GCM, this mode is more secure and requires setting a key and salt.
  
  The decryption method must match the encryption method set for the channel.
- salt string nullable
  A Base64-encoded, 32-byte encryption/decryption salt. Required only when cryptionMode is 7 or 8.
- secret string nullable
  The encryption/decryption key. Required when cryptionMode is not 0.
- enableJsonProtocol boolean nullable
  Default: false
  Set the encoding format of the subtitle data pushed to the channel.
  
  true: Use JSON to push subtitles and compress data with gzip. Uses less bandwidth, but requires decoding.
  
  false: Use Protobuf to push subtitles (default). The data volume is smaller. Suitable for scenarios with high transmission efficiency requirements.
- subscribeAudioUids array[string] nullable
  The user IDs of the audio streams you want to subscribe to. To subscribe to audio streams from all users, set this parameter to ["all"]. Maximum array length: 32. You can set either subscribeAudioUids or unSubscribeAudioUids, but not both.
- unSubscribeAudioUids array[string] nullable
  The user IDs of the audio streams you do not want to subscribe to. Maximum array length: 5. You can set either subscribeAudioUids or unSubscribeAudioUids, but not both.
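The rtcConfig rules above (subscribe and unsubscribe lists are mutually exclusive, a secret is needed for any non-zero cryptionMode, and a 32-byte Base64 salt is needed for modes 7 and 8) can be checked before sending the request. This is a sketch only: `build_rtc_config` is a hypothetical helper, the bot tokens are left out for brevity, and reading "either ... or" as "not both" is our interpretation of the text.

```python
import base64
from typing import Any, Dict, List, Optional


def build_rtc_config(
    channel_name: str,
    sub_bot_uid: str,
    pub_bot_uid: str,
    subscribe_audio_uids: Optional[List[str]] = None,
    unsubscribe_audio_uids: Optional[List[str]] = None,
    cryption_mode: int = 0,
    secret: Optional[str] = None,
    salt: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble an rtcConfig dict and enforce the constraints described above."""
    if subscribe_audio_uids and unsubscribe_audio_uids:
        raise ValueError("set subscribeAudioUids or unSubscribeAudioUids, not both")
    if not 0 <= cryption_mode <= 8:
        raise ValueError("cryptionMode must be between 0 and 8")

    config: Dict[str, Any] = {
        "channelName": channel_name,
        "subBotUid": sub_bot_uid,
        "pubBotUid": pub_bot_uid,
    }
    if subscribe_audio_uids:
        config["subscribeAudioUids"] = list(subscribe_audio_uids)
    if unsubscribe_audio_uids:
        config["unSubscribeAudioUids"] = list(unsubscribe_audio_uids)

    if cryption_mode != 0:
        if not secret:
            raise ValueError("secret is required when cryptionMode is not 0")
        config["cryptionMode"] = cryption_mode
        config["secret"] = secret
        if cryption_mode in (7, 8):
            # The salt must decode from Base64 to exactly 32 bytes.
            if not salt or len(base64.b64decode(salt, validate=True)) != 32:
                raise ValueError("modes 7 and 8 need a Base64-encoded 32-byte salt")
            config["salt"] = salt
    return config
```

Failing early on the client side keeps configuration mistakes from consuming the short window in which the builderToken is valid.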
translateConfig object nullable
Subtitle translation configuration.
- languages array required
  The translation language array. You can specify a maximum of 4 different source languages. The source language and target language must be different, otherwise an error is reported.
  Each array item is an object with:
  source string required
  The source language for translation. Refer to Supported Languages for details.
  target array[string] required
  The target languages for translation. You can specify a maximum of 5 target languages for each source language. Refer to Supported Languages for details.
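The limits on translateConfig (at most 4 source languages, at most 5 targets per source, and no source translated into itself) can be sketched as a small builder. The function name `build_translate_config` and its mapping-based input are assumptions for illustration, not part of the API.

```python
from typing import Any, Dict, List


def build_translate_config(pairs: Dict[str, List[str]]) -> Dict[str, Any]:
    """Turn {source: [targets]} into a translateConfig dict, enforcing the limits above."""
    if len(pairs) > 4:
        raise ValueError("at most 4 source languages are allowed")
    languages = []
    for source, targets in pairs.items():
        if not targets or len(targets) > 5:
            raise ValueError(f"{source}: specify between 1 and 5 target languages")
        if source in targets:
            raise ValueError(f"{source}: source and target languages must differ")
        languages.append({"source": source, "target": list(targets)})
    return {"languages": languages}
```

Using a mapping keyed by source language also guarantees that each source appears only once, which matches the "different source languages" wording.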
captionConfig object nullable
Subtitle recording configuration.
- sliceDuration integer nullable
  Default: 60
  Possible values: 5 to 28800
  The slice size of the recorded subtitle file, in seconds.
- storage object required
  accessKey string required
  The access key of the third-party cloud storage.
  secretKey string required
  The secret key of the third-party cloud storage.
  bucket string required
  The bucket name of the third-party cloud storage.
  vendor integer required
  Possible values: 1, 5, 6
  The third-party cloud storage platform:
  
  1: Amazon S3
  
  5: Microsoft Azure
  
  6: Google Cloud
  
  region integer required
  The region information for the third-party cloud storage. To ensure successful and real-time uploading of recorded files, the cloud storage region must match the region of the application server where you initiate the request. For example, if your App server is in East US, set the cloud storage region to East US as well. See third-party storage regions for details.
  fileNamePrefix array[string] nullable
  The storage location of the recorded file in the third-party cloud storage. The prefix length (including slashes) must not exceed 128 characters. The following characters are supported:
  
  Lowercase English letters (a-z)
  
  Uppercase English letters (A-Z)
  
  Numbers (0-9)
  Symbols like slashes, underscores, and brackets must not appear in the string.
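The fileNamePrefix rules can be checked locally before a task is started. The sketch below assumes that each array element is one directory level and that the elements are joined with slashes (plus a trailing slash) to form the stored prefix; the page does not spell out that joining rule, so treat it and the helper name `validate_file_name_prefix` as assumptions.

```python
import re
from typing import List

# Each element may contain only a-z, A-Z, and 0-9, per the list above.
_SEGMENT = re.compile(r"[A-Za-z0-9]+")
MAX_PREFIX_LENGTH = 128


def validate_file_name_prefix(segments: List[str]) -> str:
    """Validate the segments and return the joined prefix (assumed join rule)."""
    if not segments:
        return ""
    for segment in segments:
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid characters in segment: {segment!r}")
    # Assumption: segments are joined with "/" and end with a trailing "/".
    prefix = "/".join(segments) + "/"
    if len(prefix) > MAX_PREFIX_LENGTH:
        raise ValueError("prefix, including slashes, exceeds 128 characters")
    return prefix
```

For example, `validate_file_name_prefix(["stt", "captions"])` yields `"stt/captions/"` under the stated assumption.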

Response

If the returned status code is 200, the request was successful. The response body contains the result of the request.
OK
- taskId string required
  The unique identifier of this transcription task.
- createTs integer required
  The Unix timestamp (in seconds) when the transcription task was created.
- status string required
  The current status of the transcription task:
  
  IDLE: Task not initialized
  
  PREPARING: Task has received an initialization request
  
  PREPARED: Task initialization completed
  
  STARTING: Task is beginning to start
  
  CREATED: Task startup partially completed
  
  STARTED: Task startup fully completed
  
  IN_PROGRESS: Task is currently running
  
  STOPPING: Task is in the process of being paused
  
  STOPPED: Task has been terminated
  
  FAILURE_STOP: Task termination failed
If the returned status code is not 200, the request failed. Refer to the message field to understand the possible reasons for failure.
Non-200
- message string
  The reason why the request failed.
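The two response shapes above can be handled with one small parser. This is a sketch under stated assumptions: `parse_start_response`, `StartedTask`, and `StartTaskError` are hypothetical names, and the body is assumed to be JSON in both the success and the failure case.

```python
import json
from dataclasses import dataclass

# Status values listed in the response description above.
TASK_STATUSES = frozenset({
    "IDLE", "PREPARING", "PREPARED", "STARTING", "CREATED",
    "STARTED", "IN_PROGRESS", "STOPPING", "STOPPED", "FAILURE_STOP",
})


class StartTaskError(Exception):
    """Raised when the service answers with a non-200 status code."""


@dataclass(frozen=True)
class StartedTask:
    task_id: str
    create_ts: int
    status: str


def parse_start_response(status_code: int, body: str) -> StartedTask:
    """Convert the HTTP status code and raw JSON body into a StartedTask."""
    payload = json.loads(body)
    if status_code != 200:
        # The message field carries the failure reason.
        raise StartTaskError(payload.get("message", "no message field in response"))
    status = payload["status"]
    if status not in TASK_STATUSES:
        raise ValueError(f"unexpected task status: {status}")
    return StartedTask(payload["taskId"], int(payload["createTs"]), status)
```

Keeping the returned task_id is useful because later calls that query or stop the task need to identify it.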

Authorization

This endpoint requires Basic Auth.
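HTTP Basic authentication sends the Base64 encoding of `username:password` in the Authorization header. The sketch below shows only that standard encoding; which credential pair your project should use is not stated on this page, so `username` and `password` are placeholders, and `basic_auth_header` is a hypothetical helper.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build the Authorization header for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The resulting value replaces `<credentials>` in the request example's `Authorization: Basic <credentials>` header.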

Request example

curl

curl --request POST \
  --url 'https://api.agora.io/v1/projects/:appId/rtsc/speech-to-text/tasks?builderToken=your_builder_token' \
  --header 'Authorization: Basic <credentials>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "languages": [
    "en-US"
  ],
  "maxIdleTime": 50,
  "rtcConfig": {
    "channelName": "agora-test",
    "subBotUid": "47091",
    "pubBotUid": "88222",
    "subscribeAudioUids": [
      "45321",
      "23433"
    ]
  },
  "translateConfig": {
    "languages": [
      {
        "source": "en-US",
        "target": [
          "ar-SA",
          "id-ID",
          "fr-FR",
          "ja-JP"
        ]
      }
    ]
  },
  "captionConfig": {
    "sliceDuration": 60,
    "storage": {
      "accessKey": "test-oss",
      "secretKey": "test-oss",
      "bucket": "test-oss",
      "vendor": 1,
      "region": 3
    }
  }
}'

Response example

200

{
  "taskId": "XXXX",
  "createTs": 1678505852,
  "status": "IN_PROGRESS"
}
