OpenAI Realtime API
OpenAI Realtime provides multimodal large language model (MLLM) capabilities with real-time audio processing, enabling natural voice conversations without separate ASR/TTS components.
Enable MLLM
To enable MLLM functionality, set `enable_mllm` to `true` under `advanced_features`.
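As a sketch, the flag sits in the agent configuration like this (surrounding fields omitted):

```json
{
  "advanced_features": {
    "enable_mllm": true
  }
}
```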
Sample configuration
The following example shows a starting `mllm` parameter configuration you can use when you Start a conversational AI agent.
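A minimal sketch with placeholder values; the model name, voice, and greeting are illustrative, and you should substitute your own OpenAI API key and verify the WebSocket URL against the OpenAI Realtime documentation:

```json
{
  "mllm": {
    "url": "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
    "api_key": "<your_openai_api_key>",
    "params": {
      "model": "gpt-4o-realtime-preview",
      "voice": "alloy",
      "instructions": "You are a helpful assistant. Keep replies brief."
    },
    "max_history": 32,
    "greeting_message": "Hello! How can I help you today?",
    "input_modalities": ["audio"],
    "output_modalities": ["text", "audio"],
    "vendor": "openai",
    "style": "openai"
  }
}
```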
Key parameters
mllm (object, required)
- url (string, required)
  The WebSocket URL for the OpenAI Realtime API.
- api_key (string, required)
  The API key used for authentication. Get your API key from the OpenAI Console.
- messages (array[object], nullable)
  Array of conversation items used for short-term memory management. Uses the same structure as `item.content` from the OpenAI Realtime API.
- params (object, nullable)
  Additional MLLM configuration parameters.
  - Modalities override: The `modalities` setting in `params` is overridden by `input_modalities` and `output_modalities`.
  - Turn detection override: The `turn_detection` setting in `params` is overridden by the `turn_detection` section outside of `mllm`.
  See MLLM Overview for details.
  - model (string, nullable)
    The model identifier.
  - voice (string, nullable)
    The voice identifier for audio output.
  - instructions (string, nullable)
    System instructions that define the assistant's behavior and personality.
  - input_audio_transcription (object, nullable)
    Configuration for audio input transcription.
    - language (string, nullable)
      The language of the input audio. Supplying the input language in ISO-639-1 format (for example, `en`) improves accuracy and latency.
    - model (string, nullable)
      The model to use for transcription. Current options are `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, and `whisper-1`.
    - prompt (string, nullable)
      Optional text to guide the model's style or continue a previous audio segment. For `whisper-1`, the prompt is a list of keywords. For `gpt-4o-transcribe` models, the prompt is a free-text string, for example "expect words related to technology".
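Putting the transcription options together, a hedged fragment (values are illustrative) might look like:

```json
{
  "params": {
    "input_audio_transcription": {
      "language": "en",
      "model": "gpt-4o-transcribe",
      "prompt": "expect words related to technology"
    }
  }
}
```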
- max_history (integer, nullable, default: `32`)
  The number of conversation history messages to maintain. Cannot exceed the model's context window.
- input_modalities (array[string], nullable, default: `["audio"]`)
  MLLM input modalities:
  - `["audio"]`: Audio only
  - `["audio", "text"]`: Audio plus text
- output_modalities (array[string], nullable, default: `["text", "audio"]`)
  Output format options: `["text", "audio"]` for both text and voice responses.
- greeting_message (string, nullable)
  Initial message the agent speaks when a user joins the channel.
- vendor (string, nullable)
  MLLM provider identifier. Set to `openai` for the OpenAI Realtime API.
- style (string, nullable)
  API request style. Set to `openai` for the OpenAI Realtime API format.
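To illustrate the override rules above: in a sketch like the following, the `modalities` value inside `params` would be ignored in favor of the top-level `input_modalities` and `output_modalities` fields (field values here are illustrative):

```json
{
  "mllm": {
    "params": {
      "modalities": ["text"]
    },
    "input_modalities": ["audio"],
    "output_modalities": ["text", "audio"]
  }
}
```

Similarly, any `turn_detection` value placed inside `params` would be superseded by the `turn_detection` section configured outside of `mllm`.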
For comprehensive API reference, real-time capabilities, and detailed parameter descriptions, see the OpenAI Realtime API documentation.