Overview

Automatic Speech Recognition (ASR) engines convert spoken language into text that your AI agent can understand and process. They enable real-time voice interactions by transcribing user speech so that the language model can respond to it. Agora supports multiple ASR providers, so you can choose the one whose accuracy and latency best fit your requirements.

Integration steps

To integrate the ASR provider of your choice, follow these steps:

  1. Choose your ASR provider from the Supported ASR providers table
  2. Obtain an API key from the provider's console (if required)
  3. Copy the sample configuration for your chosen provider
  4. Replace the API key placeholder with your actual API key
  5. Configure the language settings for your target audience
  6. Specify the configuration in the request body as properties > asr when Starting a conversational AI agent
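As a rough illustration of steps 3 to 6, the following Python sketch assembles a request body with the ASR settings nested under properties > asr. Only that nesting comes from this page. The field names vendor, language, params, and key, the vendor string "deepgram", and the placeholder value are assumptions made for illustration; copy the actual field names from the sample configuration in your provider's documentation.

```python
import json

# Hypothetical placeholder; replace it with the key from your provider's console.
API_KEY_PLACEHOLDER = "<your_asr_api_key>"


def build_asr_config(vendor, language, api_key=None):
    """Return an ASR settings dict. All field names here are assumptions."""
    config = {"vendor": vendor, "language": language}
    if api_key is not None:
        # Assumed layout: provider-specific settings nested under "params".
        config["params"] = {"key": api_key}
    return config


def build_start_request(asr_config):
    """Place the ASR settings under properties > asr in the request body."""
    return {"properties": {"asr": asr_config}}


body = build_start_request(
    build_asr_config("deepgram", "en-US", API_KEY_PLACEHOLDER)
)
print(json.dumps(body, indent=2))
```

The API key is optional in this sketch because some providers may not require one. The resulting dictionary would be merged with the other properties you send when starting the agent.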

Supported ASR providers

Conversational AI Engine currently supports the following ASR providers:

Provider          Documentation
Ares              Adaptive Recognition Engine for Speech
Microsoft Azure   Microsoft Azure STT
Deepgram          Deepgram STT

Choose a provider based on your target languages, accuracy requirements, and latency needs. Refer to each provider's documentation for complete language catalogs and performance specifications.