Google Gemini Live

Google Gemini Live provides multimodal large language model capabilities with real-time audio processing, enabling natural voice conversations without separate ASR/TTS components.

Enable MLLM

To enable MLLM functionality, set enable_mllm to true under advanced_features.


"advanced_features": {
  "enable_mllm": true
}

Sample configuration

The following example shows a starting mllm parameter configuration that you can use when you Start a conversational AI agent.


"mllm": {
  "params": {
    "model": "gemini-live-2.5-flash-preview-native-audio-09-2025",
    "adc_credentials_string": "<GOOGLE_APPLICATION_CREDENTIALS_STRING>",
    "project_id": "<GOOGLE_ASR_PROJECT_ID>",
    "location": "<GOOGLE_CLOUD_REGION>",
    "messages": [{
      "role": "user",
      "content": "<HISTORY_CONTENT>"
    }],
    "instructions": "<YOUR_SYSTEM_PROMPT>",
    "voice": "Aoede",
    "transcribe_agent": true,
    "transcribe_user": true
  },
  "greeting_message": "Hi, how can I assist you today?",
  "input_modalities": [
    "audio"
  ],
  "output_modalities": [
    "audio"
  ],
  "vendor": "vertexai",
  "style": "openai"
},

"turn_detection": {
  "type": "server_vad"
}

Key parameters

mllm (required)
  • params (object, required)

    Main configuration object for the Gemini Live model.

    • model (string, required)

      The Gemini Live model identifier.

    • adc_credentials_string (string, required)

      Base64-encoded Google Cloud Application Default Credentials (ADC). See the encoding sketch after this list.

    • project_id (string, required)

      Your Google Cloud project ID for Vertex AI access.

    • location (string, required)

      The Google Cloud region hosting the Gemini Live model. Check the Google Cloud documentation for the full list of available regions.

    • instructions (string, nullable)

      System instructions that define the agent’s behavior or tone.

    • messages (array[object], nullable)

      Optional array of conversation history items used for short-term memory.

    • voice (string, nullable)

      The voice identifier for audio output. For example, "Aoede", "Puck", "Charon", "Kore", "Fenrir", "Leda", "Orus", or "Zephyr".

    • transcribe_agent (boolean, nullable)

      Whether to transcribe the agent’s speech in real time.

    • transcribe_user (boolean, nullable)

      Whether to transcribe the user’s speech in real time.

  • input_modalities (array[string], nullable)

    Default: ["audio"]

    Input modalities for the MLLM.

    • ["audio"]: Audio-only input
    • ["audio", "text"]: Accept both audio and text input
  • output_modalities (array[string], nullable)

    Default: ["audio"]

    Output modalities for the MLLM.

    • ["audio"]: Audio-only response
    • ["text", "audio"]: Combined text and audio output
  • greeting_message (string, nullable)

    Initial message the agent speaks when a user joins the channel. Example: "Hi, how can I assist you today?".

  • vendor (string, required)

    MLLM provider identifier. Set to "vertexai" for Google Gemini Live.

  • style (string, required)

    API request style. Set to "openai" for OpenAI-compatible request formatting.
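The adc_credentials_string value is the Base64-encoded contents of your Google Cloud credentials JSON. A minimal Python sketch follows, assuming your Application Default Credentials live in a local JSON file; the filename is a placeholder.

import base64

# Read the Google Cloud credentials JSON and Base64-encode it so it can be
# passed as adc_credentials_string. "service-account.json" is a placeholder
# for your own credentials file.
with open("service-account.json", "rb") as f:
    adc_credentials_string = base64.b64encode(f.read()).decode("utf-8")

print(adc_credentials_string)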

For a comprehensive API reference, real-time capabilities, and detailed parameter descriptions, see the Google Gemini Live API documentation.