Google Gemini Live

Google Gemini Live provides multimodal large language model capabilities with real-time audio processing, enabling natural voice conversations without separate ASR/TTS components.

Enable MLLM

To enable MLLM functionality, set enable_mllm to true under advanced_features.


"advanced_features": {
  "enable_mllm": true
}

Sample configuration

The following example shows a starting mllm parameter configuration that you can use when you Start a conversational AI agent.


"mllm": {
  "params": {
    "model": "gemini-live-2.5-flash-preview-native-audio-09-2025",
    "adc_credentials_string": "<GOOGLE_APPLICATION_CREDENTIALS_STRING>",
    "project_id": "<GOOGLE_ASR_PROJECT_ID>",
    "location": "<GOOGLE_CLOUD_REGION>",
    "messages": [{
      "role": "user",
      "content": "<HISTORY_CONTENT>"
    }],
    "instructions": "<YOUR_SYSTEM_PROMPT>",
    "voice": "Aoede",
    "transcribe_agent": true,
    "transcribe_user": true
  },
  "greeting_message": "Hi, how can I assist you today?",
  "input_modalities": [
    "audio"
  ],
  "output_modalities": [
    "audio"
  ],
  "vendor": "vertexai",
  "style": "openai"
},

"turn_detection": {
  "type": "server_vad"
}

Key parameters

mllm (required)
  • params (object, required)

    Main configuration object for the Gemini Live model.

    • model (string, required)

      The Gemini Live model identifier.

    • adc_credentials_string (string, required)

      Base64-encoded Google Cloud Application Default Credentials (ADC). See the encoding sketch after this list.

    • project_id (string, required)

      Your Google Cloud project ID for Vertex AI access.

    • location (string, required)

      The Google Cloud region hosting the Gemini Live model. Check the Google Cloud documentation for the full list of available regions.

    • instructions (string, nullable)

      System instructions that define the agent’s behavior or tone.

    • messages (array[object], nullable)

      Optional array of conversation history items used for short-term memory.

    • voice (string, nullable)

      The voice identifier for audio output. For example, "Aoede", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", or "Zephyr".

    • transcribe_agent (boolean, nullable)

      Whether to transcribe the agent’s speech in real time.

    • transcribe_user (boolean, nullable)

      Whether to transcribe the user’s speech in real time.

  • input_modalities (array[string], nullable)

    Default: ["audio"]

    Input modalities for the MLLM.

    • ["audio"]: Audio-only input
    • ["audio", "text"]: Accept both audio and text input
  • output_modalities (array[string], nullable)

    Default: ["audio"]

    Output modalities for the MLLM.

    • ["audio"]: Audio-only response
    • ["text", "audio"]: Combined text and audio output
  • greeting_message (string, nullable)

    Initial message the agent speaks when a user joins the channel. Example: "Hi, how can I assist you today?".

  • vendor (string, required)

    MLLM provider identifier. Set to "vertexai" for Google Gemini Live.

  • style (string, required)

    API request style. Set to "openai" for OpenAI-compatible request formatting.
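The adc_credentials_string value is the Base64-encoded contents of your Google Cloud credentials JSON. A minimal Python sketch follows, assuming your Application Default Credentials live in a local JSON file; the filename is a placeholder.

import base64

# Read the Google Cloud credentials JSON and Base64-encode it so it can be
# passed as adc_credentials_string. "service-account.json" is a placeholder
# for your own credentials file.
with open("service-account.json", "rb") as f:
    adc_credentials_string = base64.b64encode(f.read()).decode("utf-8")

print(adc_credentials_string)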

For a comprehensive API reference, real-time capabilities, and detailed parameter descriptions, see the Google Gemini Live API documentation.