Pricing
This page introduces the billing policy for the Real-Time STT add-on provided by Agora.
Your billing details may differ if you have signed a contract with Agora.
Overview
Agora calculates the billing of all projects under your Agora account on a monthly basis. Billing begins once you enable Real-Time STT.
Transcription fee
When Real-Time STT is enabled for a channel, it transcribes the audio of all active hosts in that channel. When Real-Time STT is enabled for specific hosts, it transcribes only the audio of those hosts and ignores the others. The Real-Time STT service applies algorithms that remove periods of silence and improve the WER (word error rate) of the transcription. The Real-Time STT engine then transcribes the processed audio, and the length of that processed audio is referred to as the transcription duration. Agora charges for the transcription duration of all hosts, or of the specified hosts, in the channel.
The unit price is as follows:
Billing item | Usage, minutes per month | Pricing, US$/1,000 minutes |
---|---|---|
Transcription duration | Above 0 | 16.99 |
Examples:
- Suppose a channel exists for 10 minutes and has 3 active hosts (A, B, and C), all unmuted.
- #1: If Real-Time STT is enabled for this channel from the start, the algorithm removes 8 minutes of silent audio for host A, 7 minutes for host B, and 7 minutes for host C. The transcription duration is therefore (10 - 8) + (10 - 7) + (10 - 7) = 2 + 3 + 3 = 8 minutes.
- #2: If Real-Time STT is enabled for host A only, the algorithm removes 8 minutes of silent audio for host A. The transcription duration is 10 - 8 = 2 minutes.
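The arithmetic in these examples can be sketched in a few lines of Python. This is an illustration only: the function and variable names are hypothetical, and the fee calculation assumes the listed unit price is prorated linearly with no rounding, which this page does not specify.

```python
# Unit price from the table above (US$ per 1,000 minutes).
TRANSCRIPTION_PRICE_PER_1000_MIN = 16.99


def transcription_minutes(channel_minutes, silence_removed_by_host):
    """Sum (channel duration - removed silence) over the transcribed hosts.

    silence_removed_by_host maps a host name to the minutes of silent
    audio removed for that host (hypothetical input format).
    """
    return sum(
        channel_minutes - silence
        for silence in silence_removed_by_host.values()
    )


def transcription_fee(minutes):
    """Fee in US$, assuming linear proration of the per-1,000-minute price."""
    return minutes / 1000 * TRANSCRIPTION_PRICE_PER_1000_MIN


# Example #1: Real-Time STT enabled for the whole channel.
example_1 = transcription_minutes(10, {"A": 8, "B": 7, "C": 7})  # 8 minutes

# Example #2: Real-Time STT enabled for host A only.
example_2 = transcription_minutes(10, {"A": 8})  # 2 minutes
```

Under the same proration assumption, `transcription_fee(example_1)` gives the charge for example #1 before any contract terms or free minutes are applied.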
Notes:
- WER measures the transcription error rate of an STT engine; the lower the WER, the more accurate the engine.
- Enabling Real-Time STT for a channel or host that is quiet for a long time is not recommended. In this case, the silent audio is processed and removed, and the STT engine runs in standby mode. Agora charges for this standby duration at US$0.99 per 1,000 minutes. The standby duration is the enable duration minus the transcription duration: in example #1, 10 - 8 = 2 minutes; in example #2, 10 - 2 = 8 minutes.
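As a rough sketch, the standby calculation in the note above can be written as follows. The names are hypothetical, the formula mirrors the note literally (enable duration minus transcription duration), and the fee assumes linear proration without rounding, which this page does not confirm.

```python
# Standby price from the note above (US$ per 1,000 minutes).
STANDBY_PRICE_PER_1000_MIN = 0.99


def standby_minutes(enable_minutes, transcription_minutes):
    """Standby duration as stated on this page: enable - transcription."""
    return enable_minutes - transcription_minutes


def standby_fee(minutes):
    """Fee in US$, assuming linear proration of the per-1,000-minute price."""
    return minutes / 1000 * STANDBY_PRICE_PER_1000_MIN


standby_example_1 = standby_minutes(10, 8)  # 2 minutes
standby_example_2 = standby_minutes(10, 2)  # 8 minutes
```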
Language identification fee
Real-Time STT supports dynamic language detection when two or more languages are enabled for a channel or specific hosts. The LID (language identification) duration is the same as the transcription duration.
Billing item | Usage, minutes per month | Pricing, US$/1,000 minutes |
---|---|---|
Language identification duration | Above 0 | 5.00 |
Examples:
- Suppose a channel exists for 10 minutes and has 3 active hosts (A, B, and C), all unmuted.
- #3: If Spanish and Chinese LID is enabled for this channel from the start, the algorithm removes 8 minutes of silent audio for host A, 7 minutes for host B, and 7 minutes for host C. The transcription duration is therefore 2 + 3 + 3 = 8 minutes. The LID duration is also 8 minutes, the sum of 2 minutes for host A, 3 minutes for host B, and 3 minutes for host C.
- #4: If Spanish and Chinese LID is enabled for host A only, the transcription duration and the LID duration are both 2 minutes.
Notes:
- The Real-Time STT transcription duration does not change if you enable more than 1 language.
- If only 1 language is set for a channel or a specified host, language detection does not start.
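The LID rules above can be sketched as follows. The names are hypothetical. The sketch assumes that the LID duration is zero when language detection does not start (fewer than two languages) and that the price is prorated linearly without rounding; neither assumption is stated on this page.

```python
# Unit price from the LID table above (US$ per 1,000 minutes).
LID_PRICE_PER_1000_MIN = 5.00


def lid_minutes(transcription_minutes, language_count):
    """LID duration equals the transcription duration when two or more
    languages are enabled; otherwise detection does not start (assumed 0)."""
    return transcription_minutes if language_count >= 2 else 0


def lid_fee(minutes):
    """Fee in US$, assuming linear proration of the per-1,000-minute price."""
    return minutes / 1000 * LID_PRICE_PER_1000_MIN


lid_channel = lid_minutes(8, 2)   # Spanish + Chinese for the whole channel: 8 minutes
lid_host_a = lid_minutes(2, 2)    # Spanish + Chinese for host A only: 2 minutes
lid_single = lid_minutes(8, 1)    # one language only: 0 minutes
```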
Free-of-charge duration
Real-Time STT provides 300 minutes of free-of-charge duration for integration and testing purposes.
To ask about a discount, contact sales@agora.io or your account executive (AE).