Use filler words
In conversational AI, delays in LLM responses may cause users to wonder if the agent is still processing, repeat their question, or disengage entirely. Filler words address this by playing short phrases while the agent waits for the LLM to generate a response. This keeps the conversation flowing, reduces user anxiety, and creates a more human-like interaction.
Common use cases for filler words include:
- MCP tool calls: When the agent invokes tools through MCP servers, response times can increase significantly. Filler words bridge this gap while the agent waits for tool results.
- Complex queries: Queries that require more LLM processing time benefit from a brief acknowledgment to signal that the agent is working on a response.
- Customer service scenarios: In support interactions, filler words such as "Let me look into that for you" reassure users that their request is being handled.
Prerequisites
Before you begin, make sure you have the following:
- An Agora account and project with Conversational AI enabled.
- A working Conversational AI Engine project. If you don't have one yet, follow the quickstart to set one up.
Implementation
Enable filler words
To enable filler words, add the filler_words object to properties when calling the Start a conversational AI agent API. Set enable to true:
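A minimal sketch of this setting follows; the field names come from the `filler_words` schema described in this guide, and the other fields required by the Start request (channel, token, and so on) are omitted for brevity:

```json
{
  "properties": {
    "filler_words": {
      "enable": true
    }
  }
}
```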
When enabled, the agent plays filler phrases during periods of silence while waiting for LLM output. Filler word playback follows these rules:
- Playback order: When multiple filler words or LLM responses are waiting to be played, they are played in the order they arrive.
- Interruption control: Filler words inherit the interruption mode setting from the global configuration in `turn_detection.config`.
Configure
The `filler_words` object contains three main sections: `enable`, `trigger`, and `content`.
Trigger
The trigger object defines when the agent plays filler words. Currently, the fixed_time mode is supported. In this mode, filler words play when the LLM response wait time exceeds a specified threshold.
| Parameter | Type | Range | Description |
|---|---|---|---|
| `trigger.mode` | String | `fixed_time` | Trigger mode. Currently only `fixed_time` is supported. |
| `trigger.fixed_time_config.response_wait_ms` | Integer | 100–10000 | LLM response wait threshold in milliseconds. The agent plays a filler phrase when the LLM takes longer than this duration to respond. |
Choose a response_wait_ms value based on your use case:
- Lower values (500–1000 ms): Better for fast-paced interactions where silence is more noticeable.
- Higher values (1500–3000 ms): Suitable for scenarios where users expect some processing time, such as complex queries or data lookups.
Content
The content object defines the source and selection behavior of filler phrases. Currently, the static mode is supported, which uses a predefined list of phrases.
| Parameter | Type | Description |
|---|---|---|
| `content.mode` | String | Content mode. Currently only `static` is supported. |
| `content.static_config.phrases` | Array[String] | List of filler phrases. Maximum 100 phrases, each up to 50 English words. |
| `content.static_config.selection_rule` | String | Selection rule for choosing phrases. Accepts `shuffle` or `round_robin`. |
Selection rules:
- `shuffle`: Randomly selects phrases without repeating until all phrases have been used. After a full cycle, the list is reshuffled and a new round begins.
- `round_robin`: Selects phrases sequentially from the list. After all phrases are played, a new cycle begins.
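The two selection rules can be modeled with a short sketch. This is an illustrative reimplementation of the documented behavior, not the engine's actual code; the `PhraseSelector` class is hypothetical:

```python
import random

class PhraseSelector:
    """Illustrative model of the shuffle and round_robin selection rules."""

    def __init__(self, phrases, rule="shuffle"):
        self.phrases = list(phrases)
        self.rule = rule
        self._queue = []  # phrases remaining in the current cycle

    def next_phrase(self):
        if not self._queue:
            # Start a new cycle: every phrase plays once before any repeats.
            self._queue = list(self.phrases)
            if self.rule == "shuffle":
                random.shuffle(self._queue)  # reshuffle at each new cycle
        return self._queue.pop(0)

# round_robin walks the list in order, then wraps around to a new cycle
rr = PhraseSelector(["One sec.", "Hmm...", "Let me check."], rule="round_robin")
print([rr.next_phrase() for _ in range(4)])
# → ['One sec.', 'Hmm...', 'Let me check.', 'One sec.']
```

Either rule guarantees that no phrase repeats until the whole list has been played, which is why a larger phrase list sounds less repetitive.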
Sample configuration
Add the following filler_words object to properties in your Start a conversational AI agent request body. This example plays a random filler phrase if the LLM takes longer than 1.5 seconds to respond:
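The sketch below shows one way to assemble these fields, using the parameter names from the tables in this guide; the phrase list is illustrative and should be replaced with phrases that match your agent's persona:

```json
{
  "filler_words": {
    "enable": true,
    "trigger": {
      "mode": "fixed_time",
      "fixed_time_config": {
        "response_wait_ms": 1500
      }
    },
    "content": {
      "mode": "static",
      "static_config": {
        "phrases": [
          "Let me look into that for you.",
          "One moment, please.",
          "Just a second while I check.",
          "Hmm, let me think about that."
        ],
        "selection_rule": "shuffle"
      }
    }
  }
}
```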
Best practices
Keep the following tips in mind when configuring filler words:
- Keep phrases short and natural: Use brief, conversational phrases that sound like something a person would say.
- Match phrasing to your use case: For customer support agents, use reassuring phrases like "Let me look into that for you." For casual assistants, use informal phrases like "Hmm, one sec."
- Tune the trigger threshold: Start with a `response_wait_ms` of 1500 ms and adjust based on your LLM's typical response time. If your agent frequently invokes tools, consider a lower threshold to cover longer processing times.
- Provide enough variety: Include at least 4–6 phrases to avoid sounding repetitive. Use `shuffle` selection to maximize variety across conversation turns.