
Cartesia (Beta)

Cartesia provides ultra-fast, low-latency text-to-speech with real-time streaming capabilities, optimized for interactive conversational AI applications.

Sample configuration

The following example shows a starting tts parameter configuration that you can use when you start a conversational AI agent.


"tts": {
  "vendor": "cartesia",
  "params": {
    "api_key": "<your_cartesia_key>",
    "model_id": "sonic-2",
    "voice": {
      "mode": "id",
      "id": "<voice_id>"
    },
    "output_format": {
      "container": "raw",
      "sample_rate": 16000
    },
    "language": "en"
  }
}
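
As a minimal sketch, the same tts block could be assembled in code before it is placed in the request body. The helper name build_cartesia_tts and the environment variable names CARTESIA_API_KEY and CARTESIA_VOICE_ID are assumptions made for illustration; only the keys and sample values come from the configuration above.

```python
import json
import os


def build_cartesia_tts(api_key, voice_id, model_id="sonic-2",
                       sample_rate=16000, language="en"):
    """Return a tts block shaped like the sample configuration.

    Hypothetical helper: the function name and defaults are illustrative,
    the keys mirror the documented sample.
    """
    return {
        "vendor": "cartesia",
        "params": {
            "api_key": api_key,
            "model_id": model_id,
            "voice": {"mode": "id", "id": voice_id},
            "output_format": {"container": "raw", "sample_rate": sample_rate},
            "language": language,
        },
    }


if __name__ == "__main__":
    # The environment variable names are assumptions; fall back to the
    # placeholders used in the sample configuration.
    tts = build_cartesia_tts(
        os.environ.get("CARTESIA_API_KEY", "<your_cartesia_key>"),
        os.environ.get("CARTESIA_VOICE_ID", "<voice_id>"),
    )
    print(json.dumps({"tts": tts}, indent=2))
```

Reading the key from the environment rather than hard-coding it keeps the secret out of source control; how the resulting object is sent depends on the agent start request you use.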

Caution

The parameters listed on this page are validated for use with Conversational AI Engine. Required parameters must be provided as documented. Any additional parameters are passed through directly to the underlying vendor without validation. For a full list of supported options, refer to the Cartesia TTS documentation.

Key parameters

params (required)
  • api_key (string, required)

    The API key used for authentication. Get your API key from the Cartesia Console.

  • model_id (string, required)

    Identifier of the Cartesia model to use, for example sonic-2.

  • voice (object, required)

    Voice configuration object.

    • mode (string, required)

      Voice selection mode. Use id to select by voice identifier.

    • id (string, required)

      The identifier of the selected voice for speech synthesis.

  • output_format (object, nullable)

    Audio output format configuration.

    • container (string, nullable)

      Audio container format for the output stream.

    • sample_rate (number, nullable)

      Default: 16000

      Possible values: 8000, 16000, 22050, 24000, 44100, 48000

      Audio sampling rate in Hz.

  • language (string, nullable)

    Target language for speech synthesis.
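
The parameter rules listed above can be checked locally before a request is sent. The following is a rough sketch, not part of any SDK: the function name check_cartesia_params and the error strings are hypothetical, while the required keys and the allowed sample rates are taken from the list above. Keys outside the documented set are reported separately, since they would be passed through to the vendor without validation.

```python
# Values copied from the parameter list on this page.
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
REQUIRED_PARAMS = ("api_key", "model_id", "voice")
DOCUMENTED_PARAMS = set(REQUIRED_PARAMS) | {"output_format", "language"}


def check_cartesia_params(params):
    """Return (errors, passthrough) for a Cartesia params dict.

    Hypothetical helper for illustration only.
    """
    errors = []

    # Top-level required parameters must be present and non-empty.
    for key in REQUIRED_PARAMS:
        if not params.get(key):
            errors.append(f"missing required parameter: {key}")

    # Both properties of the voice object are required.
    voice = params.get("voice")
    if isinstance(voice, dict) and voice:
        for key in ("mode", "id"):
            if not voice.get(key):
                errors.append(f"missing required parameter: voice.{key}")
    elif voice is not None and not isinstance(voice, dict):
        errors.append("voice must be an object")

    # output_format is nullable; check sample_rate only when it is set.
    output_format = params.get("output_format") or {}
    rate = output_format.get("sample_rate") if isinstance(output_format, dict) else None
    if rate is not None and rate not in ALLOWED_SAMPLE_RATES:
        errors.append(f"unsupported sample_rate: {rate}")

    # Undocumented keys are not validated here; list them for review.
    passthrough = sorted(set(params) - DOCUMENTED_PARAMS)
    return errors, passthrough
```

A caller might log the passthrough list rather than reject it, since undocumented keys are forwarded to the vendor as-is.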