Start a Real-time STT agent
https://api.agora.io/api/speech-to-text/v1/projects/{appid}/join
Use this method to start subtitle recording and subtitle translation.
Request
Path parameters
- appid
The App ID of the project.
Request body
BODY required
- languages array[string] required
The transcription languages you want to recognize. You can specify up to four languages. For a complete list, see Supported languages. Choosing multiple transcription languages can affect both quality and cost. For best practices, see Optimize transcription quality and cost.
- uidLanguagesConfig array nullable
Configure the transcription language for the specified user ID. Supports up to 5 configuration items.
- uid string required
The ID of the user to be transcribed. You may configure a maximum of 5 uids for language recognition at the uid level.
- languages array[string] required
The transcription languages to recognize. Each uid can support a maximum of 4 languages. Refer to Supported Languages for details.
- maxIdleTime integer nullable
Default: 30
Possible values: 5 to 2592000
Maximum channel idle time, in seconds. When the specified time is exceeded, the transcription task ends automatically. Idle time means that there is no host in a live broadcast channel, or there is no user in a communication channel.
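The limits on `languages`, `uidLanguagesConfig`, and `maxIdleTime` can be checked on the client before sending a request. The following is a minimal sketch, not part of the API: the function name `check_language_settings` is hypothetical, the language codes are placeholders, and the lower bound of one language per list is an assumption (the reference only states the upper bounds).

```python
MAX_LANGUAGES = 4             # "up to four languages"
MAX_UID_CONFIGS = 5           # "up to 5 configuration items"
IDLE_RANGE = (5, 2592000)     # documented range for maxIdleTime, in seconds


def check_language_settings(languages, uid_languages_config=None, max_idle_time=30):
    """Return a list of problems; an empty list means the documented limits hold."""
    problems = []
    # Assumption: at least one language is needed because the field is required.
    if not 1 <= len(languages) <= MAX_LANGUAGES:
        problems.append("languages must contain 1 to 4 entries")
    configs = uid_languages_config or []
    if len(configs) > MAX_UID_CONFIGS:
        problems.append("uidLanguagesConfig supports at most 5 items")
    for item in configs:
        if not 1 <= len(item.get("languages", [])) <= MAX_LANGUAGES:
            problems.append(f"uid {item.get('uid')!r} must list 1 to 4 languages")
    low, high = IDLE_RANGE
    if not low <= max_idle_time <= high:
        problems.append("maxIdleTime must be between 5 and 2592000 seconds")
    return problems


# Placeholder language codes; see Supported languages for real values.
print(check_language_settings(["en-US"], [{"uid": "123", "languages": ["en-US"]}]))
```

Returning a list of problems instead of raising on the first one lets a caller report every violated limit at once.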
- rtcConfig object required
Real-time subtitle configuration. After a user's speech is converted to text, the text is pushed to the channel as subtitles so that your UI can display it in real time.
- channelName string required
The name of the channel to transcribe.
- subBotUid string deprecated nullable
The ID of the bot that subscribes to the audio stream. This is always identical to the value of pubBotUid.
- subBotToken string deprecated nullable
The token used by the subscribing bot for channel authentication. Required only when your project has App Certificate enabled. Generate this token on your token server. For details, see Token authentication.
- pubBotUid string required
The ID of the bot that pushes subtitle information to the channel. All UIDs within a channel must be unique. Ensure no other user or service bot is using this UID in the same channel.
- pubBotToken string nullable
The token used by the subtitle-pushing bot for channel authentication. Required only when your project has App Certificate enabled. Generate this token on your token server. For details, see Token authentication.
- subscribeAudioUids array[string] nullable
The user IDs of the audio streams you want to subscribe to. Set this parameter if you need to subscribe to the audio streams of specific users only. Maximum array length: 32. You can set either subscribeAudioUids or unSubscribeAudioUids.
- unSubscribeAudioUids array[string] nullable
The user IDs of the audio streams you do not want to subscribe to. Set this parameter if you need to exclude the audio streams of specific users. Maximum array length: 5. You can set either subscribeAudioUids or unSubscribeAudioUids.
- cryptionMode integer deprecated nullable
Possible values: 0 to 8
The encryption and decryption mode. When enabled, this mode is used for both decrypting incoming streams and encrypting outgoing subtitles.
0: No encryption
1: AES_128_XTS, 128-bit AES encryption, XTS mode
2: AES_128_ECB, 128-bit AES encryption, ECB mode
3: AES_256_XTS, 256-bit AES encryption, XTS mode
4: SM4_128_ECB, 128-bit SM4 encryption, ECB mode
5: AES_128_GCM, 128-bit AES encryption, GCM mode
6: AES_256_GCM, 256-bit AES encryption, GCM mode
7: AES_128_GCM2, 128-bit AES encryption, GCM mode. Compared with the AES_128_GCM encryption mode, this mode is more secure and requires setting a key and salt.
8: AES_256_GCM2, 256-bit AES encryption, GCM mode. Compared with the AES_256_GCM encryption mode, this mode is more secure and requires setting a key and salt.
The decryption method must match the encryption method set for the channel.
- secret string nullable
The encryption/decryption key. Required when cryptionMode is not 0.
- salt string nullable
A Base64-encoded, 32-byte encryption/decryption salt. Required only when cryptionMode is 7 or 8.
- enableJsonProtocol boolean nullable
Default: false
Set the encoding format of the subtitle data pushed to the channel.
true: Use JSON to push subtitles and compress data with gzip. Uses less bandwidth, but requires decoding.
false: Use Protobuf to push subtitles (default). The data volume is smaller. Suitable for scenarios with high transmission efficiency requirements.
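Several rtcConfig fields depend on each other: the two subscription lists are alternatives, secret is tied to cryptionMode, and salt is needed only for modes 7 and 8. The sketch below checks those rules locally. The helper name `check_rtc_config` is hypothetical, and treating "either ... or" as mutually exclusive is an interpretation of the text above rather than confirmed server behavior.

```python
import base64
import binascii


def check_rtc_config(rtc):
    """Return a list of problems found in an rtcConfig dictionary."""
    problems = []
    for key in ("channelName", "pubBotUid"):
        if not rtc.get(key):
            problems.append(f"{key} is required")

    sub = rtc.get("subscribeAudioUids") or []
    unsub = rtc.get("unSubscribeAudioUids") or []
    # Assumption: the two lists are mutually exclusive.
    if sub and unsub:
        problems.append("set subscribeAudioUids or unSubscribeAudioUids, not both")
    if len(sub) > 32:
        problems.append("subscribeAudioUids allows at most 32 entries")
    if len(unsub) > 5:
        problems.append("unSubscribeAudioUids allows at most 5 entries")

    mode = rtc.get("cryptionMode") or 0
    if mode not in range(0, 9):
        problems.append("cryptionMode must be between 0 and 8")
    if mode != 0 and not rtc.get("secret"):
        problems.append("secret is required when cryptionMode is not 0")
    if mode in (7, 8):
        try:
            raw = base64.b64decode(rtc.get("salt") or "", validate=True)
        except (binascii.Error, ValueError):
            raw = b""
        if len(raw) != 32:
            problems.append("salt must be Base64 text that decodes to 32 bytes")
    return problems


# Example values are placeholders.
print(check_rtc_config({"channelName": "demo", "pubBotUid": "88222"}))
```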
- translateConfig object nullable
Subtitle translation configuration.
- languages array nullable
The translation language array. You can specify a maximum of 4 different source languages.
Info:
- Single-language input: If you set the source language to a single language, the target language must be different, otherwise an error is returned. For example, if you set the source language to English, you cannot set the target language to English.
- Mixed-language input: If you set the source language to mixed-language input, you can set the target language to one of the source languages. For example, if you set the source languages to Chinese and English, setting the target language to English translates both into English.
Each array item is an object with:
- source string required
The source language for translation. Refer to Supported Languages for details.
- target array[string] required
The target languages for translation. You can configure up to 10 target languages for each source language. Refer to Supported Languages for details.
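The translation rules can be sketched as a local check. This example assumes that "single-language input" means the top-level `languages` array holds exactly one entry, which is an interpretation of the text above. The function name `check_translate_config` and the language codes are placeholders.

```python
def check_translate_config(transcription_languages, translate_languages):
    """Return a list of problems for translateConfig.languages."""
    problems = []
    if len(translate_languages) > 4:
        problems.append("at most 4 source languages are allowed")
    # Assumption: one transcription language means single-language input.
    single_language = len(transcription_languages) == 1
    for item in translate_languages:
        source, targets = item["source"], item["target"]
        if not 1 <= len(targets) <= 10:
            problems.append(f"{source}: specify 1 to 10 target languages")
        if single_language and source in targets:
            problems.append(f"{source}: target must differ from the source")
    return problems


# Mixed-language input: translating both sources into English is accepted.
mixed = [
    {"source": "zh-CN", "target": ["en-US"]},
    {"source": "en-US", "target": ["en-US"]},
]
print(check_translate_config(["zh-CN", "en-US"], mixed))
```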
- captionConfig object nullable
Subtitle recording configuration.
- sliceDuration integer nullable
Default: 60
Possible values: 5 to 28800
The slice size of the recorded subtitle file, in seconds.
- storage object nullable
- accesskey string required
The access key of the third-party cloud storage.
- secretkey string required
The secret key of the third-party cloud storage.
- bucket string required
The bucket name of the third-party cloud storage.
- vendor integer required
Possible values: 1 to 8
The third-party cloud storage platform:
1: Amazon S3
2: Alibaba Cloud
3: Tencent Cloud
5: Microsoft Azure
6: Google Cloud
7: Huawei Cloud
8: Baidu Smart Cloud
11: Other S3-compatible object storage systems, such as MinIO and self-hosted cloud storage systems
- region integer required
The region information for the third-party cloud storage. To ensure successful and real-time uploading of recorded files, the cloud storage region must match the region of the application server where you initiate the request. For example, if your app server is in East US, set the cloud storage region to East US as well. See third-party storage regions for details.
- fileNamePrefix array[string] nullable
The storage location of the recorded file in the third-party cloud storage. The prefix length (including slashes) must not exceed 128 characters. The following characters are supported:
- Lowercase English letters (a-z)
- Uppercase English letters (A-Z)
- Numbers (0-9)
Symbols like slashes, underscores, and brackets must not appear in the string.
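The fileNamePrefix constraints can be checked before sending a request. The sketch below assumes the array elements are joined with slashes to form the storage path, with one trailing slash. The reference implies this when it counts slashes toward the 128-character limit but does not spell it out. The name `check_file_name_prefix` is hypothetical.

```python
import re

# Only letters and digits are allowed inside each array element.
SEGMENT = re.compile(r"[A-Za-z0-9]+")


def check_file_name_prefix(segments):
    """Return a list of problems for a fileNamePrefix array."""
    problems = []
    for segment in segments:
        if not SEGMENT.fullmatch(segment):
            problems.append(f"invalid segment: {segment!r}")
    # Assumption: segments are joined as "a/b/" when the path is built.
    joined = ("/".join(segments) + "/") if segments else ""
    if len(joined) > 128:
        problems.append("prefix exceeds 128 characters including slashes")
    return problems


print(check_file_name_prefix(["recordings", "stt2024"]))
```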
- extensionParams object nullable
Optional third-party cloud storage extension configuration. When storage.vendor is set to 11, use this field to specify access information for standard S3-compatible object storage.
- endpoint string nullable
The access URL for the S3-compatible service, including the scheme. For example, http://host:9002. Required when storage.vendor is 11.
- type string nullable
The rclone backend type. For standard S3, set this to s3.
- provider string nullable
The storage provider name. For example, Minio for MinIO.
- region string nullable
The rclone S3 backend region. If provided, this overrides the default region inferred from storage.region.
- tag string nullable
A base string for the object tag. Only effective when storage.vendor is Tencent Cloud, Alibaba Cloud, or Amazon S3.
- tagByRule array[object] nullable
Appends key=value to the tag according to the filename rules. When both the tag and the filename exist, the rule takes precedence. Only applicable to Tencent Cloud, Alibaba Cloud, and Amazon S3.
- sse string nullable
Server-side encryption method. aes256 for AES-256 encryption, kms for AWS KMS. Only available for Amazon S3.
- overwritekeys object nullable
Maps the target object name using the uploaded file extension, used to override DstFileName.
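As an illustration of how the storage fields fit together for an S3-compatible service, the sketch below builds a storage object for a self-hosted MinIO instance. The helper name `minio_storage`, the credentials, the bucket, and the prefix are placeholders. The region value 0 is also a placeholder, because valid region codes are listed in a separate reference.

```python
def minio_storage(access_key, secret_key, bucket, endpoint):
    """Build a captionConfig.storage object for an S3-compatible service."""
    return {
        "accesskey": access_key,
        "secretkey": secret_key,
        "bucket": bucket,
        "vendor": 11,               # other S3-compatible object storage
        "region": 0,                # placeholder; check the storage regions list
        "fileNamePrefix": ["stt"],  # letters and digits only
        "extensionParams": {
            "endpoint": endpoint,   # required when vendor is 11
            "type": "s3",           # rclone backend type for standard S3
            "provider": "Minio",
        },
    }


storage = minio_storage("my-access-key", "my-secret-key", "captions", "http://host:9002")
print(storage["extensionParams"]["endpoint"])
```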
- keywords array[string] nullable
Keyword list. Use it to improve the recognition accuracy of specific words during transcription. Supports up to 500 words.
- name string required
Unique ID of the agent. Maximum length is 64 characters. You cannot use the same ID repeatedly.
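Putting the required fields together, a request could be assembled as in the sketch below. Two things here are assumptions that this reference does not state: that the endpoint is called with POST, and that it uses HTTP Basic authentication with a customer key and secret. The helper name `build_join_request` and all values are placeholders. The request is only constructed, not sent.

```python
import base64
import json
import urllib.request

JOIN_URL = "https://api.agora.io/api/speech-to-text/v1/projects/{appid}/join"


def build_join_request(appid, body, customer_key, customer_secret):
    """Build (but do not send) a request that starts an STT agent."""
    # Assumption: Basic authentication with customer credentials.
    token = base64.b64encode(f"{customer_key}:{customer_secret}".encode()).decode()
    return urllib.request.Request(
        JOIN_URL.format(appid=appid),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",  # assumption: the method is not named in this section
    )


body = {
    "name": "stt-demo-001",   # unique agent ID, at most 64 characters
    "languages": ["en-US"],   # placeholder language code
    "maxIdleTime": 30,
    "rtcConfig": {"channelName": "demo-channel", "pubBotUid": "88222"},
}
request = build_join_request("your-app-id", body, "customer-key", "customer-secret")
print(request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)` and decode the JSON response body.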
Response
- If the returned status code is 200, the request was successful. The response body contains the result of the request.
- agent_id string
The ID of the agent.
- create_ts integer
The Unix timestamp (in seconds) when the agent was created.
- status string
The current status of the agent:
IDLE: The agent is not initialized
STARTING: The agent is starting
RUNNING: The agent is running
STOPPING: The agent is exiting
STOPPED: The agent exited successfully
RECOVERING: The agent is recovering
FAILED: Agent exit failed
- If the returned status code is not 200, the request failed. Refer to the detail and reason fields to understand the possible reasons for failure.
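A caller might branch on the status code and read the fields described above, as in this minimal sketch. The function name `summarize_join_response` and the sample values are hypothetical. The sketch assumes the response body has already been parsed from JSON into a dictionary.

```python
def summarize_join_response(status_code, body):
    """Turn a join response into a one-line summary."""
    if status_code != 200:
        # On failure, the detail and reason fields explain what went wrong.
        reason = body.get("reason", "unknown")
        detail = body.get("detail", "no detail")
        return f"failed: {reason} ({detail})"
    return f"agent {body['agent_id']} is {body['status']}"


# Sample values for illustration only.
ok = {"agent_id": "abc123", "create_ts": 1700000000, "status": "RUNNING"}
print(summarize_join_response(200, ok))
```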