Migrate from Real-Time STT 5.x to 7.x

This document guides you through migrating from version 5.x of Real-Time STT to the latest version 7.x. The updated architecture delivers improved stability, a simplified API workflow, and enhanced functionality to support a broader range of scenarios.

Note: Test your implementation thoroughly during the migration and prepare a fallback plan. Complete verification in a test environment before you deploy the migrated code to production.

Version differences and upgrade advantages

Version history

  • Version 5.x had a more complex request structure and deeply nested parameter configurations.

  • Version 7.x is the latest recommended release. It offers several key improvements over version 5.x:

    • Simplified API workflow: Removes the need to obtain a builderToken through the acquire interface.
    • Consistent API naming: Follows RESTful naming conventions.
    • Extended functionality: Adds support for configuring language settings by UID.
    • Standardized URL paths: Retains the same domain but introduces a more structured path format.

Major changes

API endpoint and method name changes

| 5.x | 7.x | Summary |
|---|---|---|
| acquire | This method has been removed. | The process of obtaining a builderToken through the acquire interface has been removed in version 7.x. |
| stop | leave | Method for stopping a task has been renamed to leave. |
| start | join | The method for starting an STT task has been renamed to join. |
| update | update | The method name remains unchanged. |
| query | get | The method for querying task status has been renamed to get. |

URL path changes

| API | 5.x URL | 7.x URL |
|---|---|---|
| Start task | https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/tasks?builderToken={{tokenName}} | https://api.agora.io/api/speech-to-text/v1/projects/{appid}/join |
| Query status | https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/tasks/{{taskId}}?builderToken={{tokenName}} | https://api.agora.io/api/speech-to-text/v1/projects/{appid}/agents/{agent_id} |
| Stop task | https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/tasks/{{taskId}}?builderToken={{tokenName}} | https://api.agora.io/api/speech-to-text/v1/projects/{appid}/agents/{agent_id}/leave |
| Update configuration | https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/tasks/{{taskId}}?builderToken={{tokenName}} | https://api.agora.io/api/speech-to-text/v1/projects/{appid}/agents/{agent_id}/update |
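
The 7.x path patterns in the table can be collected into one small helper so that URLs are built in a single place. The following is a minimal sketch: the function name `sttUrl` and the action labels are illustrative and are not part of the API. Only the paths themselves come from the table.

```javascript
// Base path for 7.x, taken from the URL table.
const STT_V7_BASE = 'https://api.agora.io/api/speech-to-text/v1/projects';

// Hypothetical helper: builds the 7.x URL for a given action.
// `action` is one of 'join', 'get', 'update', 'leave' (labels chosen for this sketch).
function sttUrl(appId, action, agentId) {
  const project = `${STT_V7_BASE}/${encodeURIComponent(appId)}`;
  if (action === 'join') {
    return `${project}/join`;
  }
  // Every other action addresses an existing task by its agent_id.
  if (!agentId) {
    throw new Error(`"${action}" requires the agent_id returned by join`);
  }
  const agent = `${project}/agents/${encodeURIComponent(agentId)}`;
  switch (action) {
    case 'get':
      return agent;
    case 'update':
      return `${agent}/update`;
    case 'leave':
      return `${agent}/leave`;
    default:
      throw new Error(`Unknown action: ${action}`);
  }
}
```

Because no 7.x URL carries a builderToken query parameter, the helper takes only the app ID and, for task-level calls, the agent_id.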

Parameter changes

  • Task identifier changes

    • 5.x: Uses taskId to identify tasks
    • 7.x: Uses agent_id to identify tasks
  • Authentication method changes

    • 5.x: You first call acquire to obtain a builderToken, then pass that token as a URL query parameter in every subsequent request.
    • 7.x: Removes the process of obtaining a builderToken through the acquire interface, simplifying the API call process.
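
With the builderToken gone, each 7.x request is authenticated through its headers alone. This guide does not specify the header format, so the sketch below assumes HTTP Basic authentication with a customer ID and customer secret. Treat that scheme, the function name `buildHeaders`, and both parameter names as assumptions to confirm against the authentication documentation for your project. It uses `Buffer`, so it targets Node.js.

```javascript
// Hypothetical helper: builds the headers object passed to each 7.x request.
// Assumption: HTTP Basic authentication with a customer ID and customer secret.
function buildHeaders(customerId, customerSecret) {
  // Basic auth encodes "id:secret" as base64.
  const credentials = Buffer.from(`${customerId}:${customerSecret}`).toString('base64');
  return {
    'Content-Type': 'application/json',
    Authorization: `Basic ${credentials}`,
  };
}
```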

Request parameter changes

  • Structural differences between 5.x and 7.x:

    • In 5.x, the API uses deeply nested structures such as audio.agoraRtcConfig and config.recognizeConfig.
    • In 7.x, the API uses a flatter structure with top-level fields such as languages, rtcConfig, and captionConfig.
  • Parameters added to the join method (formerly start):

    • name: Required. A unique task name of up to 64 characters. This parameter is used for task deduplication to ensure that only one speech-to-text task runs in the same channel. To run multiple speech-to-text tasks in a channel, set different name values.
    • uidLanguagesConfig: Optional. Allows you to configure different languages for specific UIDs, providing greater flexibility.
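
Because name is required and limited to 64 characters, it can help to check it before sending a join request. The following is a minimal sketch: both function names are hypothetical, and the naming scheme in `taskNameFor` is only one possible way to keep names distinct when several tasks share a channel. Note that `length` counts UTF-16 code units, which may differ from how the service counts characters.

```javascript
// Hypothetical check for the `name` field of a join request.
// The 64-character limit comes from the parameter description; the rest is illustrative.
function validateTaskName(name) {
  if (typeof name !== 'string' || name.length === 0) {
    return 'name is required and must be a non-empty string';
  }
  if (name.length > 64) {
    return `name must be at most 64 characters (got ${name.length})`;
  }
  return null; // null means the name passed both checks
}

// One way to keep names distinct when several tasks run in one channel:
// derive the name from the channel and a caller-chosen purpose label.
function taskNameFor(channelName, purpose) {
  return `${channelName}-${purpose}`;
}
```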

Other changes

  • URL path standardization: Version 7.x uses the same domain name (api.agora.io) but introduces a more standardized path structure, such as /api/speech-to-text/v1/projects/{appid}/.
  • HTTP method standardization: Some methods, such as stop, now use POST instead of DELETE to better comply with RESTful conventions.
  • Return value format: Version 7.x return values include more detailed task status information.

Migrate to 7.x

The API structure differs substantially between 5.x and 7.x. This section guides you through the migration process.

Migration steps

The migration steps are as follows:

Update the API call process

In version 5.x, the basic process was as follows:

  1. Call acquire to get a builderToken.
  2. Use the builderToken to call start and begin the task.
  3. Call query to check the task status.
  4. Call stop to end the task.

For version 7.x, modify the API call sequence as follows:

  1. Call join to start the task and get the agent_id.
  2. Call get to check the task status.
  3. Call update to change the task configuration (optional).
  4. Call leave to stop the task.
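
The 7.x sequence can be wrapped in a thin client so that no call site handles a builderToken. The following is a sketch under stated assumptions, not a reference implementation: `createSttClient` and its options are hypothetical names, the POST method for update is an assumption (this guide confirms POST only for join and leave), and `fetchImpl` exists only so the flow can be exercised without network access.

```javascript
// Hypothetical thin client for the four 7.x calls. Paths come from this guide;
// the POST method used for update is an assumption.
function createSttClient({ appId, headers, fetchImpl = fetch }) {
  const project = `https://api.agora.io/api/speech-to-text/v1/projects/${appId}`;

  async function call(method, url, body) {
    const response = await fetchImpl(url, {
      method,
      headers,
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!response.ok) {
      throw new Error(`${method} ${url} failed with status ${response.status}`);
    }
    // Tolerate empty response bodies instead of failing on JSON parsing.
    const text = await response.text();
    return text ? JSON.parse(text) : null;
  }

  return {
    join: (body) => call('POST', `${project}/join`, body),
    get: (agentId) => call('GET', `${project}/agents/${agentId}`),
    update: (agentId, body) => call('POST', `${project}/agents/${agentId}/update`, body),
    leave: (agentId) => call('POST', `${project}/agents/${agentId}/leave`),
  };
}
```

A typical run calls `join`, reads `agent_id` from the result, then calls `get`, optionally `update`, and finally `leave` with that same identifier.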

Refactor the request body format

To refactor the request body format from version 5.x, use the following mapping relationships:

  1. Map audio.agoraRtcConfig.channelName to rtcConfig.channelName.
  2. Map audio.agoraRtcConfig.maxIdleTime to maxIdleTime.
  3. Map config.recognizeConfig.language to the languages array.
  4. Map audio.agoraRtcConfig.uid and audio.agoraRtcConfig.token to rtcConfig.subBotUid and rtcConfig.subBotToken, respectively.
  5. Set rtcConfig.pubBotUid and rtcConfig.pubBotToken, if needed.
  6. If storage is required, map the storage configuration to captionConfig.storage.
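
The mapping steps above can be expressed as one conversion function. The following is a minimal sketch: `toJoinBody` and its option names are hypothetical, it assumes the 5.x language field holds a single language code, and it passes the storage configuration through unchanged because this guide does not describe its fields.

```javascript
// Hypothetical converter from a 5.x start body to a 7.x join body,
// following the mapping steps listed above.
function toJoinBody(startBody, { name, pubBotUid, pubBotToken, storage } = {}) {
  const rtc = startBody.audio?.agoraRtcConfig ?? {};
  const recognize = startBody.config?.recognizeConfig ?? {};

  const joinBody = {
    name, // new and required in 7.x; there is no 5.x field to copy it from
    languages: recognize.language ? [recognize.language] : [], // string becomes array
    maxIdleTime: rtc.maxIdleTime, // moves to the top level
    rtcConfig: {
      channelName: rtc.channelName,
      subBotUid: rtc.uid, // original uid field
      subBotToken: rtc.token, // original token field
    },
  };

  // Optional 7.x fields are added only when the caller supplies them.
  if (pubBotUid !== undefined) joinBody.rtcConfig.pubBotUid = pubBotUid;
  if (pubBotToken !== undefined) joinBody.rtcConfig.pubBotToken = pubBotToken;
  if (storage !== undefined) joinBody.captionConfig = { storage };
  return joinBody;
}
```

Fields with no counterpart in the mapping, such as channelType or profanityFilter, are dropped by this sketch rather than guessed at.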

Update the URL and authentication method

  1. Remove all calls to acquire and the builderToken processing logic.
  2. Update the API path from /v1/projects/{{appId}}/rtsc/speech-to-text/ to /api/speech-to-text/v1/projects/{appid}/.
  3. Update the task identifier from taskId to agent_id.
  4. Implement changes in HTTP methods. Note that the request for stopping a task now uses POST instead of DELETE.

Code comparison

Comparison between 5.x and 7.x code:

5.x code


    // 1. Get builderToken
    const acquireResponse = await fetch(`https://api.agora.io/v1/projects/${appId}/rtsc/speech-to-text/builderTokens`, {
      method: 'POST',
      body: JSON.stringify({ "instanceId": "your-instance-id" })
    });
    const { tokenName } = await acquireResponse.json();

    // 2. Start the task
    const startResponse = await fetch(`https://api.agora.io/v1/projects/${appId}/rtsc/speech-to-text/tasks?builderToken=${tokenName}`, {
      method: 'POST',
      body: JSON.stringify({
        "audio": {
          "subscribeSource": "AGORARTC",
          "agoraRtcConfig": {
            "channelName": "your-channel",
            "uid": "123",
            "token": "your-token",
            "channelType": "LIVE_TYPE",
            "subscribeConfig": {
              "subscribeMode": "CHANNEL_MODE"
            },
            "maxIdleTime": 60
          }
        },
        "config": {
          "features": ["RECOGNIZE"],
          "recognizeConfig": {
            "language": "zh-CN",
            "profanityFilter": true,
            "output": {
              "destinations": ["AgoraRTCDataStream"]
            }
          }
        }
      })
    });
    const { taskId } = await startResponse.json();

    // 3. Query task status
    const queryResponse = await fetch(`https://api.agora.io/v1/projects/${appId}/rtsc/speech-to-text/tasks/${taskId}?builderToken=${tokenName}`, {
      method: 'GET'
    });
    const status = await queryResponse.json();

    // 4. Stop the task
    await fetch(`https://api.agora.io/v1/projects/${appId}/rtsc/speech-to-text/tasks/${taskId}?builderToken=${tokenName}`, {
      method: 'DELETE'
    });

7.x code


    // `appId` and `headers` are assumed to be defined earlier; `headers` carries
    // the Content-Type and authentication values for your project.

    // 1. Start the task
    const joinResponse = await fetch(`https://api.agora.io/api/speech-to-text/v1/projects/${appId}/join`, {
      method: 'POST',
      headers: headers,
      body: JSON.stringify({
        "name": "my-stt-task", // New parameter, required
        "languages": ["zh-CN"], // Convert language string to array
        "maxIdleTime": 60,
        "rtcConfig": {
          "channelName": "your-channel",
          "subBotUid": "123", // Original uid field
          "subBotToken": "your-sub-token", // Original token field
          "pubBotUid": "456", // New field
          "pubBotToken": "your-pub-token", // New field
          "subscribeAudioUids": ["789"] // New parameter, optional
        }
      })
    });

    // Error handling
    if (!joinResponse.ok) {
      const errorData = await joinResponse.json();
      console.error('Failed to start the task:', errorData);
      throw new Error(`Failed to start the task: ${errorData.message || joinResponse.status}`);
    }

    const { agent_id } = await joinResponse.json();

    // 2. Query the task status
    const getResponse = await fetch(`https://api.agora.io/api/speech-to-text/v1/projects/${appId}/agents/${agent_id}`, {
      method: 'GET',
      headers: headers
    });

    const status = await getResponse.json();

    // 3. Stop the task
    await fetch(`https://api.agora.io/api/speech-to-text/v1/projects/${appId}/agents/${agent_id}/leave`, {
      method: 'POST', // Note: In 7.x it is POST, not DELETE
      headers: headers
    });

Migration checklist

  • Remove all code related to the acquire method.
  • Update all API calls to use the latest URLs.
  • Replace start with join.
  • Replace query with get.
  • Replace stop with leave and change the HTTP method from DELETE to POST.
  • Refactor the request body structure from a nested to a flat structure.
  • Update the task identifier from taskId to agent_id.
  • Add any new parameters that are required.
  • Update the response-handling logic.
  • Add error-handling mechanisms.
  • Test all updated API calls.
  • Prepare a fallback plan.

Troubleshooting

If you encounter problems during the migration process, refer to the following table:

| Error description | Possible cause | Recommended action |
|---|---|---|
| Authentication failed | Missing or incorrect authentication header. | Check that the auth configuration is correct and that the app ID is enabled for the new service. |
| Task cannot be started | Incorrect request parameter format. | Verify the parameter format against the documentation; ensure languages is an array. |
| Task status cannot be queried | Incorrect agent_id or the task has already ended. | Use the correct agent_id returned by the join interface. |
| Audio subscription problem | Subscription config doesn't match UID in the channel. | Check that subscribeAudioUids matches actual UIDs in the channel. |
| Recognition result problem | Incorrect language configuration. | Ensure that the languages array includes a valid language code, such as ["en-US"]. |
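
Several of the causes in the table can be caught locally before a join request is sent. The following is a minimal sketch of such a pre-flight check: the function name `findJoinBodyProblems` is hypothetical, and it covers only the format issues named in the table, not whether a language code or UID is actually valid for the service.

```javascript
// Hypothetical pre-flight check for a join body, covering format issues
// named in the troubleshooting table. Returns a list of problem descriptions.
function findJoinBodyProblems(body) {
  const problems = [];

  // languages must be an array, not the plain string that 5.x accepted.
  if (!Array.isArray(body.languages) || body.languages.length === 0) {
    problems.push('languages must be a non-empty array, for example ["en-US"]');
  } else if (body.languages.some((code) => typeof code !== 'string' || code.trim() === '')) {
    problems.push('every entry in languages must be a non-empty string');
  }

  // subscribeAudioUids is optional, but when present it must be an array.
  const uids = body.rtcConfig && body.rtcConfig.subscribeAudioUids;
  if (uids !== undefined && !Array.isArray(uids)) {
    problems.push('rtcConfig.subscribeAudioUids must be an array of UIDs');
  }

  return problems;
}
```

An empty result means none of these format problems were found; it does not guarantee that the request will succeed.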

FAQs

Why should I migrate from version 5.x to 7.x?

Version 7.x delivers a more reliable and robust service, simplifies the API process, and adds new features such as per-UID language configuration. It significantly improves both stability and scalability.

Do I need to keep both version 5.x and 7.x in my codebase?

No. Migrate fully to version 7.x. Older versions are deprecated and may not be maintained in the future.

What should I be aware of during the migration process?

Pay close attention to the complete restructuring of the request body. Also make sure to update the API endpoints, remove the acquire-token flow, and apply the HTTP method changes. For example, the request that stops a task now uses POST instead of DELETE.

Does version 7.x support string type UIDs?

Currently, UIDs are still treated as integers internally, although they are passed as strings in the API. This prepares for future support of true string UIDs. Version 7.x automatically converts string UIDs to integers for compatibility.
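
Since string UIDs are converted to integers, a UID string that is not numeric is unlikely to behave as intended. The following is a minimal sketch of a guard you might run on UID values before sending them. The function name `isNumericUid` is hypothetical, and the check covers only the digit format, not any range limit the service may apply.

```javascript
// Hypothetical guard: accepts only UID strings made entirely of decimal digits,
// because 7.x converts string UIDs to integers.
function isNumericUid(uid) {
  return typeof uid === 'string' && /^[0-9]+$/.test(uid);
}
```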

Have error codes changed in version 7.x?

Yes. Version 7.x includes standardized error codes for more consistent handling.

How can I verify that the new API works correctly after migration?

Follow these steps to validate the functionality after migration:

  • Start a task and obtain the agent_id.
  • Query the task status to confirm it is running.
  • Perform speech recognition tests and review the output.
  • Test stopping the task and updating its configuration.

How can I ensure smooth migration?

Use the following approach to ensure smooth migration:

  • Complete migration and testing in a staging environment.
  • Prepare a fallback plan and keep the old version temporarily.
  • Use a phased rollout, starting with non-critical services.
  • Monitor the new API’s performance and error rate before full deployment.
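
One way to implement a phased rollout is to route a stable share of channels to the 7.x code path and raise that share gradually. The following is a minimal sketch of this idea, not a prescribed method: the function names, the use of the channel name as the routing key, and the hash itself are all illustrative choices.

```javascript
// Hypothetical rollout gate: maps a channel name to a stable bucket from 0 to 99.
function bucketFor(channelName) {
  // Simple deterministic string hash (not cryptographic), kept within 32 bits.
  let hash = 0;
  for (const ch of channelName) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0;
  }
  return hash % 100;
}

// Returns true when this channel should use the 7.x code path.
// rolloutPercent is the share of channels (0 to 100) routed to 7.x.
function useV7(channelName, rolloutPercent) {
  return bucketFor(channelName) < rolloutPercent;
}
```

Because the bucket depends only on the channel name, a given channel stays on the same version between requests, and setting the percentage to 0 acts as a fallback to the old code path for as long as you keep it.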