
Google Gemini Live (Vertex AI)

Google Gemini Live provides multimodal large language model capabilities with real-time audio processing, enabling natural voice conversations without separate ASR/TTS components. This page covers integration using Vertex AI, authenticated with Google Cloud Application Default Credentials (ADC).

info

Enabling MLLM automatically disables ASR, LLM, and TTS since the MLLM handles end-to-end voice processing directly. See turn_detection for turn detection options available with MLLMs.

Enable MLLM

To enable MLLM functionality, set enable_mllm to true under advanced_features.


"advanced_features": {
  "enable_mllm": true
}
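Because enabling MLLM supersedes the separate ASR, LLM, and TTS blocks, it can help to normalize an existing agent configuration when switching over. The helper below is an illustrative sketch, not part of the API; it relies only on the `advanced_features.enable_mllm` flag shown above, and the other config keys in the example are placeholders.

```python
def enable_mllm(config: dict) -> dict:
    """Return a copy of an agent config with MLLM enabled.

    MLLM handles end-to-end voice processing, so this sketch also drops
    any separate asr/llm/tts blocks (illustrative helper, not an API call).
    """
    out = {k: v for k, v in config.items() if k not in ("asr", "llm", "tts")}
    features = dict(out.get("advanced_features", {}))
    features["enable_mllm"] = True
    out["advanced_features"] = features
    return out


# Example: a config that previously used a separate LLM and TTS pipeline
cfg = {"llm": {"url": "<LLM_URL>"}, "tts": {}, "advanced_features": {}}
mllm_cfg = enable_mllm(cfg)
```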

Sample configuration

The following example shows a starting mllm parameter configuration you can use when you Start a conversational AI agent.


"mllm": {
  "params": {
    "model": "gemini-3.1-flash-live-preview",
    "adc_credentials_string": "<GOOGLE_APPLICATION_CREDENTIALS_STRING>",
    "project_id": "<GOOGLE_CLOUD_PROJECT_ID>",
    "location": "<GOOGLE_CLOUD_REGION>",
    "messages": [
      {
        "role": "user",
        "content": "<HISTORY_CONTENT>"
      }
    ],
    "instructions": "<YOUR_SYSTEM_PROMPT>",
    "voice": "Aoede",
    "transcribe_agent": true,
    "transcribe_user": true
  },
  "greeting_message": "Hi, how can I assist you today?",
  "input_modalities": [
    "audio"
  ],
  "output_modalities": [
    "audio"
  ],
  "vendor": "vertexai"
},
"turn_detection": {
  "type": "server_vad"
}
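Taken together, the enable_mllm flag, the mllm block, and turn_detection sit side by side in the agent properties. The sketch below assembles them in Python; the placeholder strings are the same as in the sample above, and surrounding request details (endpoint, authentication, channel settings) are omitted.

```python
# Sketch of combined agent properties built from the fragments above.
# Replace the <...> placeholders with real values before use.
properties = {
    "advanced_features": {
        "enable_mllm": True,
    },
    "mllm": {
        "vendor": "vertexai",
        "params": {
            "model": "gemini-3.1-flash-live-preview",
            "adc_credentials_string": "<GOOGLE_APPLICATION_CREDENTIALS_STRING>",
            "project_id": "<GOOGLE_CLOUD_PROJECT_ID>",
            "location": "<GOOGLE_CLOUD_REGION>",
            "instructions": "<YOUR_SYSTEM_PROMPT>",
            "voice": "Aoede",
            "transcribe_agent": True,
            "transcribe_user": True,
        },
        "greeting_message": "Hi, how can I assist you today?",
        "input_modalities": ["audio"],
        "output_modalities": ["audio"],
    },
    "turn_detection": {
        "type": "server_vad",
    },
}
```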

Key parameters

mllm object required

  • params object required

    Main configuration object for the Gemini Live model.

    • model string required

      The Gemini Live model identifier.

    • adc_credentials_string string required

      Base64-encoded Google Cloud Application Default Credentials (ADC).

    • project_id string required

      Your Google Cloud project ID for Vertex AI access.

    • location string required

      The Google Cloud region hosting the Gemini Live model. Check the Google Cloud documentation for the full list of available regions.

    • instructions string nullable

      System instructions that define the agent’s behavior or tone.

    • messages array[object] nullable

      Optional array of conversation history items used for short-term memory.

    • voice string nullable

      The voice identifier for audio output. For example, Aoede, Puck, Charon, Kore, Fenrir, Leda, Orus, or Zephyr.

    • transcribe_agent boolean nullable

      Whether to transcribe the agent’s speech in real time.

    • transcribe_user boolean nullable

      Whether to transcribe the user’s speech in real time.

  • input_modalities array[string] nullable

    Default: ["audio"]

    Input modalities for the MLLM.

    • ["audio"]: Audio-only input
    • ["audio", "text"]: Accept both audio and text input

  • output_modalities array[string] nullable

    Default: ["audio"]

    Output modalities for the MLLM.

    • ["audio"]: Audio-only response
    • ["text", "audio"]: Combined text and audio output

  • greeting_message string nullable

    The message the agent speaks when a user joins the channel.

  • vendor string required

    The MLLM provider identifier. Set to "vertexai" to use Google Gemini Live with Vertex AI.
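The adc_credentials_string value is the Base64-encoded contents of your ADC JSON key file. A minimal sketch of producing it in Python follows; the key fields shown are illustrative, and in practice you would read the bytes of your downloaded key file instead.

```python
import base64
import json

# Illustrative service-account payload; a real ADC key file has more fields.
sa_key = {"type": "service_account", "project_id": "my-project"}
raw = json.dumps(sa_key).encode("utf-8")

# Base64-encode the file contents for use as adc_credentials_string.
adc_credentials_string = base64.b64encode(raw).decode("ascii")

# Decoding recovers the original JSON.
assert json.loads(base64.b64decode(adc_credentials_string)) == sa_key
```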

For comprehensive API reference, real-time capabilities, and detailed parameter descriptions, see the Google Gemini Live API.