
Cartesia (Beta)

Cartesia provides ultra-fast, low-latency text-to-speech with real-time streaming capabilities, optimized for interactive conversational AI applications.

Sample configuration

The following example shows a starting tts configuration that you can pass when you start a conversational AI agent.


    "tts": {
      "vendor": "cartesia",
      "params": {
        "api_key": "<your_cartesia_key>",
        "model_id": "sonic-2",
        "voice": {
          "mode": "id",
          "id": "<voice_id>"
        },
        "output_format": {
          "container": "raw",
          "sample_rate": 16000
        },
        "language": "en"
      }
    }
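
As a minimal sketch, the same tts block can be assembled in code so that the API key and voice ID are not hard-coded. The helper name build_cartesia_tts and the environment variable names CARTESIA_API_KEY and CARTESIA_VOICE_ID are assumptions made for illustration; they are not part of the documented API. How the resulting object is sent in the request is not shown here.

```python
import json
import os


def build_cartesia_tts(api_key: str, voice_id: str, sample_rate: int = 16000) -> dict:
    """Return a tts block shaped like the sample configuration above."""
    return {
        "vendor": "cartesia",
        "params": {
            "api_key": api_key,
            "model_id": "sonic-2",
            "voice": {"mode": "id", "id": voice_id},
            "output_format": {"container": "raw", "sample_rate": sample_rate},
            "language": "en",
        },
    }


# Hypothetical environment variable names; fall back to the doc's placeholders.
tts = build_cartesia_tts(
    api_key=os.environ.get("CARTESIA_API_KEY", "<your_cartesia_key>"),
    voice_id=os.environ.get("CARTESIA_VOICE_ID", "<voice_id>"),
)

# Serialize the fragment so it can be merged into a larger request body.
print(json.dumps({"tts": tts}, indent=2))
```

Keeping the key in the environment avoids committing credentials alongside the rest of the configuration.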

Key parameters

params (required)
  • api_key (string, required)

    The API key used for authentication. Get your API key from the Cartesia Console.

  • model_id (string, required)

    Identifier of the Cartesia model to use, for example sonic-2.

  • voice (object, required)

    Voice configuration object.

    • mode (string, required)

      Voice selection mode. Use id to select by voice identifier.

    • id (string, required)

      The identifier of the selected voice for speech synthesis.

  • output_format (object, nullable)

    Audio output format configuration.

    • container (string, nullable)

      Audio container format for the output stream.

    • sample_rate (number, nullable)

      Default: 16000

      Possible values: 8000, 16000, 22050, 24000, 44100, 48000

      Audio sampling rate in Hz.

  • language (string, nullable)

    Target language for speech synthesis, for example en.
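
The constraints listed above can be checked before a request is sent. The following is a rough sketch only: the function name validate_cartesia_params is hypothetical, and it covers just the required fields and the sample_rate values documented on this page, not everything the service itself may enforce.

```python
# Sample rates listed under output_format.sample_rate.
ALLOWED_SAMPLE_RATES = (8000, 16000, 22050, 24000, 44100, 48000)


def validate_cartesia_params(params: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    # api_key and model_id are required strings.
    for key in ("api_key", "model_id"):
        value = params.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key} is required and must be a non-empty string")

    # voice is a required object with required mode and id strings.
    voice = params.get("voice")
    if not isinstance(voice, dict):
        problems.append("voice is required and must be an object")
    else:
        for key in ("mode", "id"):
            value = voice.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"voice.{key} is required and must be a non-empty string")

    # output_format is nullable; sample_rate falls back to the documented default.
    output_format = params.get("output_format")
    if output_format is None:
        output_format = {}
    if not isinstance(output_format, dict):
        problems.append("output_format must be an object or null")
    else:
        rate = output_format.get("sample_rate")
        if rate is None:
            rate = 16000
        if rate not in ALLOWED_SAMPLE_RATES:
            problems.append(f"output_format.sample_rate {rate!r} is not a listed value")

    return problems


example = {
    "api_key": "<your_cartesia_key>",
    "model_id": "sonic-2",
    "voice": {"mode": "id", "id": "<voice_id>"},
    "output_format": {"container": "raw", "sample_rate": 16000},
    "language": "en",
}
for problem in validate_cartesia_params(example):
    print(problem)
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in one pass.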

For advanced configuration options, voice customization, and detailed parameter descriptions, see the Cartesia TTS documentation.