Cartesia (Beta)

Cartesia provides low-latency text-to-speech with real-time streaming, optimized for interactive conversational AI applications.

Sample configuration

The following example shows a starting tts parameter configuration you can use when you Start a conversational AI agent.

"tts": {
  "vendor": "cartesia",
  "params": {
    "api_key": "<your_cartesia_key>",
    "model_id": "sonic-2",
    "voice": {
      "mode": "id",
      "id": "<voice_id>"
    },
    "output_format": {
      "container": "raw",
      "sample_rate": 16000
    },
    "language": "en"
  }
}
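
As a minimal sketch, the same block can be assembled in Python before it is placed in the request body. The helper name build_cartesia_tts and the CARTESIA_API_KEY environment variable are hypothetical, chosen only for illustration; the field names and sample values are taken from the configuration above.

```python
import json
import os


def build_cartesia_tts(api_key, voice_id, model_id="sonic-2",
                       sample_rate=16000, language="en"):
    """Return a dict shaped like the sample "tts" block above.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    return {
        "vendor": "cartesia",
        "params": {
            "api_key": api_key,
            "model_id": model_id,
            "voice": {"mode": "id", "id": voice_id},
            "output_format": {"container": "raw", "sample_rate": sample_rate},
            "language": language,
        },
    }


if __name__ == "__main__":
    # CARTESIA_API_KEY is an assumed variable name, not one defined by the docs.
    key = os.environ.get("CARTESIA_API_KEY", "<your_cartesia_key>")
    print(json.dumps({"tts": build_cartesia_tts(key, "<voice_id>")}, indent=2))
```

Reading the key from the environment keeps it out of source control; how the resulting object is sent depends on the request you use to start the agent.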

Key parameters

params (required)

api_key (string, required)
The API key used for authentication. Get your API key from the Cartesia Console.
model_id (string, required)
Identifier of the Cartesia model to use, for example sonic-2.
voice (object, required)
Voice configuration object.
- mode (string, required)
  Voice selection mode. Use id to select by voice identifier.
- id (string, required)
  The identifier of the selected voice for speech synthesis.
output_format (object, nullable)
Audio output format configuration.
- container (string, nullable)
  Audio container format for the output stream.
- sample_rate (number, nullable)
  Audio sampling rate in Hz.
  Default: 16000
  Possible values: 8000, 16000, 22050, 24000, 44100, 48000
language (string, nullable)
Target language for speech synthesis.
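
The constraints listed above can be checked locally before a request is sent. The following is a minimal sketch; check_cartesia_params and ALLOWED_SAMPLE_RATES are hypothetical names, and the service may apply further validation that is not described here.

```python
# Sample rates listed under "Possible values" for sample_rate.
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def check_cartesia_params(params):
    """Return a list of problems found in a Cartesia params dict (empty if none)."""
    problems = []

    # Fields marked as required in the parameter list.
    for key in ("api_key", "model_id", "voice"):
        if not params.get(key):
            problems.append(f"missing required field: {key}")

    # Nested voice fields are only checked when a voice object is present.
    voice = params.get("voice")
    if isinstance(voice, dict) and voice:
        for key in ("mode", "id"):
            if not voice.get(key):
                problems.append(f"missing required field: voice.{key}")

    # output_format and its fields are nullable, so only a set value is checked.
    output_format = params.get("output_format") or {}
    rate = output_format.get("sample_rate")
    if rate is not None and rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"unsupported sample_rate: {rate}")

    return problems
```

Returning a list rather than raising on the first error lets a caller report every problem in one pass.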

For advanced configuration options, voice customization, and detailed parameter descriptions, see the Cartesia TTS documentation.
