Optimize audio

In real-time audio interactions, the rhythm, continuity, and intonation of conversations between humans and AI often differ from those between humans. To improve the AI–human conversation experience, it's important to optimize audio settings.

When using the Android or iOS Video/Voice SDK with the Conversational AI Engine, follow the best practices in this guide to improve conversation fluency and reliability, especially in complex network environments.

Server configuration

When calling the server API to create a Conversational AI agent, use the default values for audio-related parameters to ensure the best audio experience.

Client configuration

To configure the client app, implement the following:

Integrate the required dynamic libraries

For the best Conversational AI Engine audio experience, integrate and load the following dynamic libraries in your project:

Android
iOS

AI Denoising Plugin: libagora_ai_noise_suppression_extension.so
AI echo cancellation plug-in: libagora_ai_echo_cancellation_extension.so

For integration details, refer to App size optimization.

AI Denoising Plugin: AgoraAiNoiseSuppressionExtension.xcframework
AI echo cancellation plug-in: AgoraAiEchoCancellationExtension.xcframework

For integration details, refer to App size optimization.

Optimize audio for optimal performance

You can optimize audio settings in the following ways:

(Recommended) Use the Toolkit APIs

Supported in Video/Voice SDK version 4.5.1 and above.
Use the Video/Voice SDK APIs directly

Supported in SDK version 4.3.1 and above.

Use the toolkit APIs

In this solution, you use the toolkit APIs to optimize audio settings.

Android
iOS

Integrate the toolkit

Copy the convoaiApi folder to your project and import the toolkit before calling the toolkit API. Refer to Folder structure to understand the role of each file.

Create a toolkit instance

Create a configuration object with the Video SDK and Signaling engine instances. Use the configuration to create a toolkit instance.

// Create a configuration object for the Video SDK and Chat SDK instances val config = ConversationalAIAPIConfig(     rtcEngine = rtcEngineInstance,     rtmClient = rtmClientInstance,     enableLog = true ) // Create the toolkit instance val api = ConversationalAIAPIImpl(config)

Set optimal audio settings

Before joining the Video SDK channel, call the loadAudioSettings() method to apply the optimal audio parameters.

The component monitors audio route changes internally. If the audio route changes, it automatically calls this method again to reset the optimal parameters.
```
api.loadAudioSettings()rtcEngine.joinChannel(token, channelName, null, userId)
```

Add a Conversational AI agent to the channel

To start a Conversational AI agent, configure the following parameters in your POST request:

Parameter	Description	Required
`advanced_features.enable_rtm: true`	Starts the Signaling service	Yes
`parameters.data_channel: "rtm"`	Enables Signaling as the data transmission channel	Yes
`parameters.enable_metrics: true`	Enables agent performance data collection	Optional
`parameters.enable_error_message: true`	Enables reporting of agent error events	Optional

After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.

Release resources

At the end of each call, use the destroy method to clean up the cache.
```
api.destroy()
```

Integrate the toolkit

Copy the ConversationalAIAPI folder to your project and import the toolkit before calling the toolkit APIs. Refer to Folder structure to understand the role of each file.

Create a toolkit instance

Create a configuration object with the Video SDK and Signaling engine instances, then use the configuration to create a toolkit instance.

// Create configuration objects for the Video SDK and Chat SDK instances let config = ConversationalAIAPIConfig(     rtcEngine: rtcEngine,      rtmEngine: rtmEngine,     enableLog: true ) /// Create component instance convoAIAPI = ConversationalAIAPIImpl(config: config)

Set optimal audio settings

Before joining the Video SDK channel, call the loadAudioSettings() method to apply the optimal audio parameters.

The component monitors audio route changes internally. If the audio route changes, it automatically calls this method again to reset the optimal parameters.
```
convoAIAPI.loadAudioSettings()rtcEngine.joinChannel(rtcToken: token, channelName: channelName, uid: uid, isIndependent: independent)
```

Agent joins the channel

To start a Conversational AI agent, configure the following parameters in your POST request:

Parameter	Required	Description
`advanced_features.enable_rtm`	Yes	Enables the Chat service.
`parameters.data_channel`	Yes	Sets the data transmission channel to `"rtm"`.
`parameters.enable_metrics`	Optional	Enables collection of agent performance data.
`parameters.enable_error_message`	Optional	Enables reporting of agent error events.

After a successful request, the agent joins the specified Video SDK channel and the user can begin interacting with it.

Release resources

At the end of each call, use the destroy method to clean up the cache.
```
convoAIAPI.destroy()
```

Use the SDK APIs

In this solution, you use the Voice/Video SDK to optimize audio settings.

Set audio parameters

The settings in this section apply to Video/Voice SDK versions 4.3.1 and above. If you are using an earlier version, upgrade to version 4.5.1 or above or Contact Technical Support.

Android
iOS

For the best conversational AI audio experience, apply the following settings:

Set the audio scenario: When initializing the engine, set the audio scenario to the AI client scenario. You can also set the scenario before joining a channel by calling the setAudioScenario method.
Configure audio parameters: Call setParameters before joining a channel and whenever the onAudioRouteChanged callback is triggered. This configuration sets audio 3A plug-ins (acoustic echo cancellation, noise suppression, and automatic gain control), the audio sampling rate, the audio processing mode, and other settings. For recommended parameter values, refer to the sample code.

info

Since Video/Voice SDK versions 4.3.1 to 4.5.0 do not support the AI client audio scenario, set the scenario to AUDIO_SCENARIO_CHORUS to improve the audio experience. However, the audio experience cannot be aligned with versions 4.5.1 and above. To get the best audio experience, upgrade the SDK to version 4.5.1 or higher.

The following sample code defines a setAudioConfigParameters function to configure audio parameters. Call this function before joining a channel and whenever the audio route changes.

private var rtcEngine: RtcEngineEx? = null
private var mAudioRouting = Constants.AUDIO_ROUTE_DEFAULT

// Set audio configuration parameters
private fun setAudioConfigParameters(routing: Int) {
    mAudioRouting = routing
    rtcEngine?.apply {
        setParameters("{"che.audio.aec.split_srate_for_48k":16000}")
        setParameters("{"che.audio.sf.enabled":true}")
        setParameters("{"che.audio.sf.stftType":6}")
        setParameters("{"che.audio.sf.ainlpLowLatencyFlag":1}")
        setParameters("{"che.audio.sf.ainsLowLatencyFlag":1}")
        setParameters("{"che.audio.sf.procChainMode":1}")
        setParameters("{"che.audio.sf.nlpDynamicMode":1}")

        if (routing == Constants.AUDIO_ROUTE_HEADSET // 0
            || routing == Constants.AUDIO_ROUTE_EARPIECE // 1
            || routing == Constants.AUDIO_ROUTE_HEADSETNOMIC // 2
            || routing == Constants.AUDIO_ROUTE_BLUETOOTH_DEVICE_HFP // 5
            || routing == Constants.AUDIO_ROUTE_BLUETOOTH_DEVICE_A2DP) { // 10
            setParameters("{"che.audio.sf.nlpAlgRoute":0}")
        } else {
            setParameters("{"che.audio.sf.nlpAlgRoute":1}")
        }
        
        setParameters("{"che.audio.sf.ainlpModelPref":10}")
        setParameters("{"che.audio.sf.nsngAlgRoute":12}")
        setParameters("{"che.audio.sf.ainsModelPref":10}")
        setParameters("{"che.audio.sf.nsngPredefAgg":11}")
        setParameters("{"che.audio.agc.enable":false}")
    }
}

// Create and initialize the RTC engine
fun createRtcEngine(rtcCallback: IRtcEngineEventHandler): RtcEngineEx {
    val config = RtcEngineConfig()
    config.mContext = AgentApp.instance()
    config.mAppId = ServerConfig.rtcAppId
    config.mChannelProfile = Constants.CHANNEL_PROFILE_LIVE_BROADCASTING
    // Set the audio scene to AI dialogue scene (supported by 4.5.1 and above)
    // Version 4.3.1 ~ 4.5.0 is set to chorus scene AUDIO_SCENARIO_CHORUS
    config.mAudioScenario = Constants.AUDIO_SCENARIO_AI_CLIENT
    // Register audio route change callback
    config.mEventHandler = object : IRtcEngineEventHandler() {
        override fun onAudioRouteChanged(routing: Int) {
            super.onAudioRouteChanged(routing)
            // Set audio related parameters
            setAudioConfigParameters(routing)
        }
    }
    try {
        rtcEngine = (RtcEngine.create(config) as RtcEngineEx).apply {
            // Load the audio plugin
            loadExtensionProvider("ai_echo_cancellation_extension")
            loadExtensionProvider("ai_noise_suppression_extension")
        }
    } catch (e: Exception) {
        Log.e("CovAgoraManager", "createRtcEngine error: $e")
    }
    return rtcEngine!!
}

// Join the channel
fun joinChannel(rtcToken: String, channelName: String, uid: Int, isIndependent: Boolean = false) {

    // Initialize audio configuration parameters
    setAudioConfigParameters(mAudioRouting)

    // Configure channel options and join the channel
    val options = ChannelMediaOptions()
    options.clientRoleType = CLIENT_ROLE_BROADCASTER
    options.publishMicrophoneTrack = true
    options.publishCameraTrack = false
    options.autoSubscribeAudio = true
    options.autoSubscribeVideo = false       
    val ret = rtcEngine?.joinChannel(rtcToken, channelName, uid, options)
}

For the best conversational AI audio experience, apply the following settings:

Set the audio scenario: When initializing the engine, set the audio scenario to the AI client scenario. You can also set the scenario before joining a channel by calling the setAudioScenario method.
Configure audio parameters: Call setParameters before joining a channel and whenever the rtcEngine:didAudioRouteChanged: callback is triggered. This configuration sets audio 3A plug-ins (acoustic echo cancellation, noise suppression, and automatic gain control), the audio sampling rate, the audio processing mode, and other settings. For recommended parameter values, refer to the sample code.

info

Since Video/Voice SDK versions 4.3.1 to 4.5.0 do not support the AI client audio scenario, set the scenario to AgoraAudioScenarioChorus to improve the audio experience. However, the audio experience cannot be aligned with versions 4.5.1 and above. To get the best audio experience, upgrade the SDK to version 4.5.1 or higher.

The following sample code defines a setAudioConfigParameters function to configure audio parameters. Call this function before joining a channel and whenever the audio route changes.

class RTCManager: NSObject {
    private var rtcEngine: AgoraRtcEngineKit!
    private var audioDumpEnabled: Bool = false
    private var audioRouting = AgoraAudioOutputRouting.default
    
    // Set audio related parameters
    private func setAudioConfigParameters(routing: AgoraAudioOutputRouting) {
        audioRouting = routing
        rtcEngine.setParameters("{"che.audio.aec.split_srate_for_48k":16000}")
        rtcEngine.setParameters("{"che.audio.sf.enabled":true}")
        rtcEngine.setParameters("{"che.audio.sf.stftType":6}")
        rtcEngine.setParameters("{"che.audio.sf.ainlpLowLatencyFlag":1}")
        rtcEngine.setParameters("{"che.audio.sf.ainsLowLatencyFlag":1}")
        rtcEngine.setParameters("{"che.audio.sf.procChainMode":1}")
        rtcEngine.setParameters("{"che.audio.sf.nlpDynamicMode":1}")
        if routing == .headset ||
            routing == .earpiece ||
            routing == .headsetNoMic ||
            routing == .bluetoothDeviceHfp ||
            routing == .bluetoothDeviceA2dp {
            rtcEngine.setParameters("{"che.audio.sf.nlpAlgRoute":0}")
        } else {
            rtcEngine.setParameters("{"che.audio.sf.nlpAlgRoute":1}")
        }
        rtcEngine.setParameters("{"che.audio.sf.ainlpModelPref":10}")
        rtcEngine.setParameters("{"che.audio.sf.nsngAlgRoute":12}")
        rtcEngine.setParameters("{"che.audio.sf.ainsModelPref":10}")
        rtcEngine.setParameters("{"che.audio.sf.nsngPredefAgg":11}")
        rtcEngine.setParameters("{"che.audio.agc.enable":false}")
    }
}

extension RTCManager: RTCManagerProtocol {
    
    func createRtcEngine(delegate: AgoraRtcEngineDelegate) -> AgoraRtcEngineKit {
        let config = AgoraRtcEngineConfig()
        config.appId = AppContext.shared.appId
        config.channelProfile = .liveBroadcasting
        // Set the audio scene to AI dialogue scene (supported by 4.5.1 and above)
        // Versions 4.3.1 ~ 4.5.0 support chorus scenes .chorus
        config.audioScenario = .aiClient
        rtcEngine = AgoraRtcEngineKit.sharedEngine(with: config, delegate: delegate)
        // Register audio route change callback
        rtcEngine.addDelegate(self)
        return rtcEngine
    }
    
    func joinChannel(rtcToken: String, channelName: String, uid: String) {
        
        // Initialize audio configuration parameters
        setAudioConfigParameters(routing: audioRouting)

        // Configure channel options and join the channel
        let options = AgoraRtcChannelMediaOptions()
        options.clientRoleType = .broadcaster
        options.publishMicrophoneTrack = true
        options.publishCameraTrack = false
        options.autoSubscribeAudio = true
        options.autoSubscribeVideo = false
        let ret = rtcEngine.joinChannel(byToken: rtcToken, channelId: channelName, uid: UInt(uid) ?? 0, mediaOptions: options)           
    }
}

// Implement the AgoraRtcEngineDelegate interface to handle audio route change callbacks
extension RTCManager: AgoraRtcEngineDelegate {
    public func rtcEngine(_ engine: AgoraRtcEngineKit, didAudioRouteChanged routing: AgoraAudioOutputRouting) {
        setAudioConfigParameters(routing: routing)
    }
}

Reference

This section contains content that completes the information on this page, or points you to documentation that explains other aspects to this product.

Sample project

Refer to the following open-source sample code to set audio-related parameters.

Android
iOS

CovRtcManager.kt

RTCManager.swift

Folder structure

Android
iOS
Web

IConversationalAIAPI.kt: API interface and related data structures and enumerations
ConversationalAIAPIImpl.kt: ConversationalAI API main implementation logic
ConversationalAIUtils.kt: Tool functions and event callback management
subRender/
- v3/: Subtitle module
  - TranscriptionController.kt: Subtitle Controller
  - MessageParser.kt: Message Parser

ConversationalAIAPI.swift: API interface and related data structures and enumerations
ConversationalAIAPIImpl.swift: ConversationalAI API main implementation logic
Transcription/
- TranscriptionController.swift: Subtitle Controller

index.ts: API Class
type.ts: API interface and related data structures and enumerations
utils/
- index.ts: API utility functions
- events.ts: Event management class, which can be extended to easily implement event monitoring and broadcasting
- sub-render.ts: Subtitle module

API reference

Android
iOS

SDK
Toolkit
- loadAudioSettings
- destroy

SDK
Toolkit

Was this helpful?