Microsoft Azure

Microsoft Azure offers neural voices in multiple languages with options for different speaking styles and emotions, providing enterprise-grade text-to-speech capabilities with high-quality audio output.

Sample configuration

The following example shows a starting tts configuration that you can use when you start a conversational AI agent.


"tts": {
  "vendor": "microsoft",
  "params": {
    "key": "<your_microsoft_key>",
    "region": "eastus",
    "voice_name": "en-US-AndrewMultilingualNeural",
    "speed": 1.0,
    "volume": 70,
    "sample_rate": 24000
  }
}

caution

The parameters listed on this page are validated for use with Conversational AI Engine. Required parameters must be provided as documented. Any additional parameters are passed through directly to the underlying vendor without validation. For advanced configuration options, voice galleries, and detailed parameter descriptions, see the Microsoft Azure TTS documentation.

Key parameters

params required
  • key string required

    The API key used for authentication. Get your API key from the Azure Portal.

  • region string required

    The Azure region where the speech service is hosted (for example, eastus or westus2).

  • voice_name string required

    The identifier for the selected voice for speech synthesis. See available voices for options.

  • speed number nullable

    Default: 1.0

    Speaking rate, expressed as a multiple of the original audio speed. Accepts values from 0.5 to 2.0.

  • volume number nullable

    Default: 100

    Audio volume as a number between 0.0 and 100.0, where 0.0 is quietest and 100.0 is loudest.

  • sample_rate integer nullable

    Default: 24000

    Audio sampling rate in Hz. Common values: 16000, 24000, 48000.
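
The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of any SDK: the helper name build_tts_config is hypothetical, and skipping checks for null or omitted optional values is an assumption based on the parameters being marked nullable. Any extra keys are left untouched, matching the pass-through behavior described in the caution.

```python
import json

# Parameters that the page marks as required strings.
REQUIRED = ("key", "region", "voice_name")


def build_tts_config(params: dict) -> dict:
    """Hypothetical helper: check documented constraints and wrap params in a tts object."""
    for name in REQUIRED:
        value = params.get(name)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing required string parameter: {name}")

    # Optional values are validated only when present and not null (assumption).
    speed = params.get("speed")
    if speed is not None and not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")

    volume = params.get("volume")
    if volume is not None and not 0.0 <= volume <= 100.0:
        raise ValueError("volume must be between 0.0 and 100.0")

    rate = params.get("sample_rate")
    if rate is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(rate, bool) or not isinstance(rate, int) or rate <= 0:
            raise ValueError("sample_rate must be a positive integer in Hz")

    # Unlisted keys are kept as-is; the page says they are passed to the vendor unvalidated.
    return {"vendor": "microsoft", "params": dict(params)}


if __name__ == "__main__":
    config = build_tts_config({
        "key": "<your_microsoft_key>",
        "region": "eastus",
        "voice_name": "en-US-AndrewMultilingualNeural",
        "speed": 1.0,
        "volume": 70,
        "sample_rate": 24000,
    })
    print(json.dumps({"tts": config}, indent=2))
```

Running the script prints a tts object equivalent to the sample configuration. A value outside a documented range, such as a speed of 3.0, raises ValueError locally instead of surfacing as a failure after the agent starts.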