OpenAI Realtime API
The OpenAI Realtime API provides multimodal large language model (MLLM) capabilities with real-time audio processing, enabling natural voice conversations without separate ASR/TTS components.
Enable MLLM
To enable MLLM functionality, set enable_mllm to true under advanced_features.
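For illustration, the flag might be set like this (exact placement of the surrounding fields is assumed from the description above):

```json
{
  "advanced_features": {
    "enable_mllm": true
  }
}
```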
Sample configuration
The following example shows a starting mllm parameter configuration you can use when you Start a conversational AI agent.
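A minimal sketch of such a configuration, built from the parameters documented below. The URL, API key, model, and voice values are illustrative placeholders; substitute your own:

```json
{
  "mllm": {
    "url": "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
    "api_key": "<your-openai-api-key>",
    "params": {
      "model": "gpt-4o-realtime-preview",
      "voice": "alloy",
      "instructions": "You are a helpful voice assistant."
    },
    "max_history": 32,
    "input_modalities": ["audio"],
    "output_modalities": ["text", "audio"],
    "greeting_message": "Hello! How can I help you today?",
    "vendor": "openai",
    "style": "openai"
  }
}
```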
Key parameters
mllm (object, required)
- url (string, required)
  The WebSocket URL for the OpenAI Realtime API.
- api_key (string, required)
  The API key used for authentication. Get your API key from the OpenAI Console.
- messages (array[object], nullable)
  Array of conversation items used for short-term memory management. Uses the same structure as `item.content` from the OpenAI Realtime API.
- params (object, nullable)
  Additional MLLM configuration parameters.
  - Modalities override: The `modalities` setting in `params` is overridden by `input_modalities` and `output_modalities`.
  - Turn detection override: The `turn_detection` setting in `params` is overridden by the `turn_detection` section outside of `mllm`.
  See MLLM Overview for details.
  - model (string, nullable)
    The model identifier.
  - voice (string, nullable)
    The voice identifier for audio output.
  - instructions (string, nullable)
    System instructions that define the assistant's behavior and personality.
  - input_audio_transcription (object, nullable)
    Configuration for audio input transcription.
    - language (string, nullable)
      The language of the input audio. Supplying the input language in ISO-639-1 format (for example, `en`) improves accuracy and latency.
    - model (string, nullable)
      The model to use for transcription. Current options are `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, and `whisper-1`.
    - prompt (string, nullable)
      Optional text to guide the model's style or continue a previous audio segment. For `whisper-1`, the prompt is a list of keywords. For `gpt-4o-transcribe` models, the prompt is a free-text string, for example "expect words related to technology".
- max_history (integer, nullable)
  Default: `32`. The number of conversation history messages to maintain. Cannot exceed the model's context window.
- input_modalities (array[string], nullable)
  Default: `["audio"]`. MLLM input modalities:
  - `["audio"]`: audio only
  - `["audio", "text"]`: audio plus text
- output_modalities (array[string], nullable)
  Default: `["text", "audio"]`. Output format options: `["text", "audio"]` for both text and voice responses.
- greeting_message (string, nullable)
  Initial message the agent speaks when a user joins the channel.
- vendor (string, nullable)
  MLLM provider identifier. Set to `openai` for the OpenAI Realtime API.
- style (string, nullable)
  API request style. Set to `openai` for the OpenAI Realtime API format.
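To illustrate the `messages` short-term memory field described above, here is a sketch of two conversation items following the OpenAI Realtime conversation-item structure. The item shape and content text are illustrative assumptions, not values from this document:

```json
{
  "messages": [
    {
      "role": "user",
      "content": [{ "type": "input_text", "text": "What can you help me with?" }]
    },
    {
      "role": "assistant",
      "content": [{ "type": "text", "text": "I can answer questions and chat by voice." }]
    }
  ]
}
```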
For comprehensive API reference, real-time capabilities, and detailed parameter descriptions, see the OpenAI Realtime API documentation.