Interrupt agent

When interacting with an agent, you may need to interrupt the agent to begin a new round of conversation. The Agora Conversational AI Engine supports agent interruption in the following ways:

Voice interruption: The engine detects user voice input and automatically stops the agent’s response.
Manual interruption: Your app can explicitly stop the agent by calling a REST API or client SDK method—typically triggered by a button tap or custom command.

This page describes how to implement agent interruption in your app.

Voice interruption

Conversational AI Engine supports an intelligent interruption feature that allows a user’s voice input to automatically interrupt the speaking agent. This enables quicker response times and more natural, fluid interactions.

By default, this feature is disabled. To enable it, set advanced_features.enable_aivad to true when calling Start a conversational AI agent.

To customize the agent's behavior when human voice interrupts the agent, configure the following parameters:

turn_detection.interrupt_mode: Defines how the agent responds when interrupted by human voice:
- interrupt: Immediately stop the current interaction and process the human voice input.
- append: Complete the current interaction, then process the human voice input.
- ignore: Discard the human voice input without processing or storing it in the conversation context.
turn_detection.interrupt_duration_ms: (Default 160) The minimum duration (in milliseconds) that the user's voice must exceed the Voice Activity Detection (VAD) threshold before triggering an interruption.

The following example enables intelligent interruption and sets both turn_detection parameters when starting an agent:

curl --request POST \
--url https://api.agora.io/api/conversational-ai-agent/v2/projects/:appid/join \
--header 'Authorization: Basic <your_base64_encoded_credentials>' \
--header 'Content-Type: application/json' \
--data '
{
    "name": "unique_name",
    "properties": {
        "channel": "channel_name",
        "token": "token",
        "agent_rtc_uid": "1001",
        "remote_rtc_uids": [
            "1002"
        ],
        "idle_timeout": 120,
        "advanced_features": {
            "enable_aivad": true
        },
        "turn_detection": {
            "interrupt_mode": "interrupt",
            "interrupt_duration_ms": 500
        },
        "llm": {
            "url": "https://api.openai.com/v1/chat/completions",
            "api_key": "<your_llm_key>",
            "system_messages": [
                {
                    "role": "system",
                    "content": "You are a helpful chatbot."
                }
            ],
            "max_history": 32,
            "greeting_message": "Hello, how can I assist you today?",
            "failure_message": "Please hold on a second.",
            "params": {
                "model": "gpt-4o-mini"
            }
        },
        "tts": {
            "vendor": "microsoft",
            "params": {
                "key": "<your_tts_api_key>",
                "region": "eastus",
                "voice_name": "en-US-AndrewMultilingualNeural"
            }
        },
        "asr": {
            "language": "en-US"
        }
    }
}'

If the request succeeds, the API returns a 200 status code and a response body like the following:

{
  "agent_id": "1NT29X10YHxxxxxWJOXLYHNYB",
  "create_ts": 1737111452,
  "status": "RUNNING"
}
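
If you assemble the request body in application code, typing the interruption settings can catch an invalid interrupt_mode before the request is sent. The following TypeScript sketch is illustrative only: InterruptionOptions and buildInterruptionProperties are hypothetical names, not part of any Agora SDK. Only the field names, the three mode values, and the default of 160 ms come from the parameter descriptions above. Rejecting non-positive durations is an assumption, since the valid range is not stated here.

```typescript
// Hypothetical helper, not part of any Agora SDK. Field names mirror the
// request parameters described above.
type InterruptMode = "interrupt" | "append" | "ignore";

interface InterruptionOptions {
  mode: InterruptMode;
  // Minimum voiced duration in milliseconds; 160 is the documented default.
  durationMs?: number;
}

function buildInterruptionProperties(options: InterruptionOptions) {
  const durationMs = options.durationMs ?? 160;
  // Assumption: non-positive or non-finite values are treated as invalid.
  if (!Number.isFinite(durationMs) || durationMs <= 0) {
    throw new RangeError("interrupt_duration_ms must be a positive number");
  }
  return {
    advanced_features: { enable_aivad: true },
    turn_detection: {
      interrupt_mode: options.mode,
      interrupt_duration_ms: durationMs,
    },
  };
}

// Same settings as the curl example above.
const interruption = buildInterruptionProperties({ mode: "interrupt", durationMs: 500 });
console.log(JSON.stringify(interruption, null, 2));
```

The returned object is meant to be merged into the properties object of the start request alongside channel, llm, tts, and the other fields.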

Manual interruption

You can also trigger an interruption explicitly by calling the RESTful API or the client toolkit API. This lets users interrupt the agent with a button click or a specific command.

Call the RESTful API

Use the Interrupt agent API to manually initiate an interruption request.

curl --request POST \
--url https://api.agora.io/api/conversational-ai-agent/v2/projects/:appid/agents/:agentId/interrupt \
--header 'Authorization: Basic <your_base64_encoded_credentials>' \
--header 'Content-Type: application/json' \
--data '{}'

If the request is successful, the API returns a 200 status code and the following response body:

{
  "agent_id": "1NT29XxxxxxxxxELWEHC8OS",
  "channel": "test_channel",
  "start_ts": 1744877089
}
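
To issue the same request from application code, it can be convenient to describe it as data first and hand it to whichever HTTP client you use. The sketch below is a minimal illustration, not an Agora API: buildInterruptRequest and HttpRequestSpec are hypothetical names. The default base URL is copied from the start-agent example earlier on this page and can be overridden. The sketch also assumes that the agent_id returned when the agent was started is the value for the :agentId path parameter.

```typescript
// Hypothetical helper: describes the interrupt request without sending it.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Base URL taken from the start-agent example earlier on this page.
const API_BASE = "https://api.agora.io/api/conversational-ai-agent/v2";

function buildInterruptRequest(
  appId: string,
  agentId: string,
  base64Credentials: string,
  baseUrl: string = API_BASE,
): HttpRequestSpec {
  if (!appId || !agentId) {
    throw new Error("appId and agentId are required");
  }
  // Encode path segments so unexpected characters cannot alter the path.
  const path = `/projects/${encodeURIComponent(appId)}/agents/${encodeURIComponent(agentId)}/interrupt`;
  return {
    url: `${baseUrl}${path}`,
    method: "POST",
    headers: {
      Authorization: `Basic ${base64Credentials}`,
      "Content-Type": "application/json",
    },
    // The interrupt call shown above sends an empty JSON object.
    body: JSON.stringify({}),
  };
}

const interruptRequest = buildInterruptRequest(
  "your_app_id",
  "1NT29X10YHxxxxxWJOXLYHNYB",
  "your_base64_encoded_credentials",
);
console.log(interruptRequest.method, interruptRequest.url);
```

The resulting object maps directly onto the standard fetch call: fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body }). Keep the credentials on your server rather than in client code.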

Call the client toolkit API

Agora provides a set of flexible, scalable, and standardized client components for Conversational AI Engine. These components support iOS, Android, and Web, and encapsulate scenario-based APIs. You can use them to integrate Agora Real-Time Communication (RTC) and Signaling (RTM) capabilities, enabling the following features:

Interrupt the agent
Display real-time subtitles
Receive event notifications
Optimize audio (Android and iOS only)

Before you begin, make sure you:

Integrate Video SDK v4.5.1 or later and follow the Quickstart guide to implement basic real-time audio and video features.
Enable the Signaling service for your project in the Agora Console and follow the Signaling Quickstart to implement real-time messaging.
Implement the basic logic to communicate with a Conversational AI agent.
Ensure that the RTC engine instance is initialized and Signaling is logged in. The toolkit does not handle initialization, lifecycle management, authentication, or login for Video SDK or Signaling.

Integrate the toolkit

Android: Copy the convoaiApi folder to your project and import it before calling API methods. Refer to the component structure to understand the role of each file.

iOS: Copy the ConversationalAIAPI folder to your project and import it before calling API methods. Refer to the component structure to understand the role of each file.

Web: Copy the conversational-ai-api file to your project and import it before calling API methods. Refer to the component structure to understand the role of each file.

Initialize the component

Create a configuration object for the RTC engine and Signaling client instances, then use it to initialize the component instance.

Android:

// Create a configuration object for the RTC and RTM instances
val config = ConversationalAIAPIConfig(
    rtcEngine = rtcEngineInstance,
    rtmClient = rtmClientInstance,
    enableLog = true
)

// Create the component instance
val api = ConversationalAIAPIImpl(config)

iOS:

// Create a configuration object for the RTC and RTM instances
let config = ConversationalAIAPIConfig(
    rtcEngine: rtcEngine,
    rtmEngine: rtmEngine,
    enableLog: true
)

// Create the component instance
convoAIAPI = ConversationalAIAPIImpl(config: config)

Web:

// Initialize the component with the RTC and RTM instances
ConversationalAIAPI.init({
    rtcEngine,
    rtmEngine,
})

// Get the API instance (singleton)
const conversationalAIAPI = ConversationalAIAPI.getInstance()

Configure the conversational AI agent

Call Start a conversational AI agent using the following parameter settings:

advanced_features.enable_rtm: true (required): Starts the Signaling service.
parameters.data_channel: "rtm" (required): Enables the RTM data transmission channel.
parameters.enable_metrics: true (optional): Receives agent performance data.
parameters.enable_error_message: true (optional): Receives agent error events.

After the call is successful, the agent joins the specified RTC channel and the user can start interacting with the agent.
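
The settings above can be merged into an existing request body in code. The following TypeScript sketch is one hedged way to do that: withToolkitSettings and ToolkitOptions are hypothetical names, not part of the toolkit. It assumes that parameters sits next to advanced_features inside the properties object of the start request. The optional flags are added only when requested, so the sketch makes no claim about their server-side defaults.

```typescript
// Hypothetical helper, not an Agora API: adds the toolkit-related settings to
// an existing "properties" object for the start request.
interface ToolkitOptions {
  enableMetrics?: boolean; // receive agent performance data
  enableErrorMessage?: boolean; // receive agent error events
}

function withToolkitSettings(
  properties: Record<string, unknown>,
  options: ToolkitOptions = {},
): Record<string, unknown> {
  const advanced = (properties["advanced_features"] ?? {}) as Record<string, unknown>;
  const existingParams = (properties["parameters"] ?? {}) as Record<string, unknown>;

  // Required: deliver toolkit data over the RTM channel.
  const parameters: Record<string, unknown> = { ...existingParams, data_channel: "rtm" };
  if (options.enableMetrics) parameters["enable_metrics"] = true;
  if (options.enableErrorMessage) parameters["enable_error_message"] = true;

  return {
    ...properties,
    // Required: start the Signaling service; existing flags are preserved.
    advanced_features: { ...advanced, enable_rtm: true },
    parameters,
  };
}

const toolkitProperties = withToolkitSettings(
  { channel: "channel_name", advanced_features: { enable_aivad: true } },
  { enableErrorMessage: true },
);
console.log(JSON.stringify(toolkitProperties, null, 2));
```

Because the helper copies the input rather than mutating it, the same base properties object can be reused for agents that do not use the toolkit.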

Interrupt the agent

Call the interrupt method to interrupt the agent.

Android:

api.interrupt("agentId") { error -> /* ... */ }

iOS:

convoAIAPI.interrupt(agentUserId: "\(agentUid)") { error in
    if let error = error {
        print("Interruption failed: \(error.message)")
    } else {
        print("Interruption succeeded")
    }
}

Web:

await conversationalAIAPI.interrupt(`${agent_rtc_uid}`)
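
On the Web, a button handler usually needs to know whether the interruption went through. The sketch below wraps the call and reports the outcome as a boolean. It assumes that interrupt returns a promise that rejects on failure. The await in the Web call above suggests a promise, but the rejection behavior is an assumption. Interruptible, tryInterrupt, and fakeApi are hypothetical names used only so the example runs without the toolkit.

```typescript
// Minimal shape of the toolkit instance used here. This interface is an
// assumption based on the Web call shown above, not the toolkit's own type.
interface Interruptible {
  interrupt(agentUserId: string): Promise<unknown>;
}

// Resolves to true if the interrupt call resolved, false if it rejected.
async function tryInterrupt(
  api: Interruptible,
  agentRtcUid: string | number,
): Promise<boolean> {
  try {
    // The Web call above passes the agent UID as a string.
    await api.interrupt(`${agentRtcUid}`);
    return true;
  } catch (error) {
    console.warn("Interruption failed:", error);
    return false;
  }
}

// Stand-in object so the sketch runs without the toolkit.
const fakeApi: Interruptible = {
  interrupt: async (agentUserId) => {
    if (agentUserId === "") throw new Error("missing agent UID");
  },
};

tryInterrupt(fakeApi, 1001).then((ok) => console.log("interrupted:", ok));
```

In an app, you would pass the instance returned by ConversationalAIAPI.getInstance() instead of fakeApi and call tryInterrupt from the button's click handler.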

Destroy the component

When the agent interaction ends, destroy the component instance to release all resources.

Android:

api.destroy()

iOS:

convoAIAPI.destroy()

Web:

conversationalAIAPI.destroy()

Component structure

Copy only the following files and folders to integrate the client component. You do not need to copy other files.

Android:

IConversationalAIAPI.kt: API interface and related data structures and enumerations
ConversationalAIAPIImpl.kt: Main implementation logic of the ConversationalAI API
ConversationalAIUtils.kt: Utility functions and event callback management
subRender/
- v3/: Subtitle module
  - TranscriptionController.kt: Subtitle controller
  - MessageParser.kt: Message parser

iOS:

ConversationalAIAPI.swift: API interface and related data structures and enumerations
ConversationalAIAPIImpl.swift: Main implementation logic of the ConversationalAI API
Transcription/
- TranscriptionController.swift: Subtitle controller

Web:

index.ts: API class
type.ts: API interface and related data structures and enumerations
utils/index.ts: API utility functions
utils/events.ts: Event management class, which can be extended to implement event monitoring and broadcasting
utils/sub-render.ts: Subtitle module
