ElevenLabs
ElevenLabs provides highly realistic AI voices with advanced prosody and natural speech patterns, delivering lifelike audio synthesis with emotional nuance and conversational flow.
You need a paid ElevenLabs plan for reliable TTS integration.
ElevenLabs may restrict or disable free-tier accounts due to abuse-detection mechanisms, even if free credits are available. To avoid missing audio responses during testing and production, ensure you use a paid plan.
Sample configuration
The following example shows a starting tts parameter configuration you can use when you Start a conversational AI agent.
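As a rough sketch, a starting tts block could be assembled in Python as shown here. The four required parameters come from this page. The "vendor" field, the base_url value, and the helper name build_tts_config are assumptions for illustration, and the key and voice ID are placeholders you would replace with your own values.

```python
import json


def build_tts_config(key, voice_id, model_id="eleven_flash_v2_5",
                     base_url="wss://api.elevenlabs.io/v1", **optional):
    """Assemble a tts block from the required params plus any optional ones.

    The "vendor" field and the default base_url are assumptions, not values
    confirmed by this page. Check them against your own account and region.
    """
    params = {
        "base_url": base_url,
        "key": key,
        "model_id": model_id,
        "voice_id": voice_id,
    }
    # Nullable parameters are only sent when they are explicitly set.
    params.update({name: value for name, value in optional.items()
                   if value is not None})
    return {"vendor": "elevenlabs", "params": params}


tts = build_tts_config(
    key="<your_elevenlabs_key>",   # placeholder
    voice_id="<your_voice_id>",    # placeholder
    sample_rate=24000,
    speed=1.0,
)
print(json.dumps({"tts": tts}, indent=2))
```

Leaving optional values out of the payload, rather than sending null, keeps the request limited to the settings you actually intend to override.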
The parameters listed on this page are validated for use with Conversational AI Engine. Required parameters must be provided as documented. Any additional parameters are passed through directly to the underlying vendor without validation. For advanced configuration options, voice cloning, and detailed parameter descriptions, see the ElevenLabs TTS documentation.
Key parameters
params required
- base_url string required
The endpoint URL for the ElevenLabs TTS service. See Data residency.
- key string required
The API key used for authentication. Get your API key from the ElevenLabs Console.
- model_id string required
Identifier of the model to be used. Popular options include eleven_flash_v2_5 for speed or eleven_multilingual_v2 for quality.
- voice_id string required
The identifier for the selected voice for speech synthesis. Browse available voices in the Voice Library.
- sample_rate number nullable
Default: 24000
Audio sampling rate in Hz. Common values: 16000, 22050, 24000, 44100.
- speed number nullable
Default: 1.0
Speeds up or slows down the generated speech. Range: 0.7 to 1.2 inclusive.
- stability number nullable
Controls voice stability. Higher values (0.8-1.0) produce more consistent speech, lower values (0.0-0.5) add more variation.
- similarity_boost number nullable
Enhances similarity to the original voice. Range: 0.0-1.0. Higher values stick closer to the training voice.
- style number nullable
Controls speaking style and expressiveness. Higher values increase emotional range and variation.
- use_speaker_boost boolean nullable
Improves voice quality and similarity when enabled. Recommended for most use cases.
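Because only the documented parameters are validated and anything extra is passed through to the vendor unchecked, it can help to check a params object locally before sending it. The sketch below covers the four required parameters and the two ranges stated explicitly on this page (speed and similarity_boost). The function name check_tts_params and the wording of the messages are hypothetical.

```python
REQUIRED = ("base_url", "key", "model_id", "voice_id")

# Only the ranges stated explicitly on this page are checked.
RANGES = {
    "speed": (0.7, 1.2),
    "similarity_boost": (0.0, 1.0),
}


def check_tts_params(params):
    """Return a list of problems found in an ElevenLabs params dict."""
    problems = [f"missing required parameter: {name}"
                for name in REQUIRED if not params.get(name)]
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        # Nullable parameters may be absent or None, in which case defaults apply.
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside {low} to {high}")
    return problems


example = {
    "base_url": "<your_base_url>",   # placeholder
    "key": "<your_elevenlabs_key>",  # placeholder
    "model_id": "eleven_multilingual_v2",
    "voice_id": "<your_voice_id>",   # placeholder
    "speed": 1.5,
}
for problem in check_tts_params(example):
    print(problem)
```

An empty list means the params passed these checks. It does not guarantee that the vendor will accept the request.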