Send picture messages
When interacting with an agent, users may need to upload images or send image messages from the client to help the agent understand their intent. This page describes how to use the Conversational AI Engine toolkit to send image messages from your app to the large language model (LLM). The LLM can then reference the image content in subsequent conversation turns and generate responses that better meet user needs.
Understand the tech
Agora provides a flexible, scalable, and standardized conversational AI engine toolkit. The toolkit supports iOS, Android, and Web platforms, and encapsulates scenario-based APIs. You can use these APIs to integrate the capabilities of the Agora Signaling SDK and Video SDK to enable the following features:
- Interrupt agents
- Display live subtitles
- Receive event notifications
- Set optimal audio parameters (iOS and Android only)
- Send picture messages
Call the toolkit's chat API to send a picture message, and listen for the onMessageReceiptUpdated callback to receive the picture message receipt.
Prerequisites
Before you begin, ensure the following:
- You have implemented the Conversational AI Engine REST quickstart.
- Your app integrates Video SDK v4.5.1 or later and includes the video quickstart implementation.
- You have enabled Signaling in the Agora Console and completed the Signaling quickstart for basic messaging.
- You maintain active and authenticated RTC and Signaling instances that persist beyond the component's lifecycle. The toolkit does not manage the initialization, lifecycle, or authentication of RTC or Signaling.
- The picture messaging feature is currently in Beta and free for a limited time.
- Image processing depends on the capabilities of the integrated LLM. Ensure the LLM you connect to the Conversational AI Engine supports image input.
Implementation
This section explains how to send a picture message.
- Android
- iOS
- Web
Android
-
Integrate the toolkit
Copy the convoaiApi folder to your project and import the toolkit before calling the toolkit API. Refer to Folder structure to understand the role of each file.
-
Create a toolkit instance
Create a configuration object with the Video SDK and Signaling engine instances. Use the configuration to create a toolkit instance.
// Create a configuration object with the RTC and RTM instances
val config = ConversationalAIAPIConfig(
    rtcEngine = rtcEngineInstance,
    rtmClient = rtmClientInstance,
    enableLog = true
)
// Create the toolkit instance
val api = ConversationalAIAPIImpl(config)
-
Register callback
Call the addHandler method to register your implementation of the callback.
api.addHandler(covEventHandler)
-
Subscribe to the channel
Agent-related events are delivered through Signaling channel messages. To receive these events, call subscribeMessage before starting the agent session.
api.subscribeMessage("channelName") { error ->
    if (error != null) {
        // Handle the error
    }
}
-
Add a Conversational AI agent to the channel
To start a Conversational AI agent, configure the following parameters in your POST request:
| Parameter | Description | Required |
| --- | --- | --- |
| advanced_features.enable_rtm: true | Starts the Signaling service | Yes |
| parameters.data_channel: "rtm" | Enables Signaling as the data transmission channel | Yes |
| parameters.enable_metrics: true | Enables agent performance data collection | Optional |
| parameters.enable_error_message: true | Enables reporting of agent error events | Optional |
After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
-
Send an image
Call the chat method to send a picture message. The following example sends a picture using a URL:
val uuid = "unique-image-id-123" // Use a unique identifier for each image
val imageUrl = "https://example.com/image.jpg" // HTTP/HTTPS URL of the image
api.chat("agentUserId", ImageMessage(uuid = uuid, imageUrl = imageUrl)) { error ->
    if (error != null) {
        Log.e("Chat", "Failed to send image: ${error.errorMessage}")
    } else {
        Log.d("Chat", "Image send request successful")
    }
}
Info: The completion callback of chat only indicates whether the send request succeeded. It does not reflect the actual processing status of the message.
-
Handle image sending status
A picture message is confirmed as sent when the picture message receipt callback onMessageReceiptUpdated fires. If sending fails, the message error callback onMessageError fires instead, as handled in the sample that follows. The uuid value in either callback identifies the uploaded picture.
-
Image sent successfully
When you receive the onMessageReceiptUpdated callback, parse the JSON message in the callback to obtain the image's uuid and confirm that the image was sent successfully:
override fun onMessageReceiptUpdated(agentUserId: String, receipt: MessageReceipt) {
    if (receipt.chatMessageType == ChatMessageType.Image) {
        try {
            val json = JSONObject(receipt.message)
            // Check whether the uuid field is included
            if (json.has("uuid")) {
                val receivedUuid = json.getString("uuid")
                // If the uuid matches the one you sent, the image was sent successfully
                if (receivedUuid == "your-sent-uuid") {
                    Log.d("ImageSend", "Image sent successfully: $receivedUuid")
                    // Update the UI to show the successful send status
                }
            }
        } catch (e: Exception) {
            Log.e("ImageSend", "Failed to parse message receipt: ${e.message}")
        }
    }
}
-
Image sending failed
If you receive the onMessageError callback, parse the JSON message in the callback to obtain the image's uuid and confirm that sending the image failed:
override fun onMessageError(agentUserId: String, error: MessageError) {
    if (error.chatMessageType == ChatMessageType.Image) {
        try {
            val json = JSONObject(error.message)
            // Check whether the uuid field is included
            if (json.has("uuid")) {
                val failedUuid = json.getString("uuid")
                // If the uuid matches the one you sent, this image failed to send
                if (failedUuid == "your-sent-uuid") {
                    Log.e("ImageSend", "Image send failed: $failedUuid")
                    // Update the UI to show the failed send status
                }
            }
        } catch (e: Exception) {
            Log.e("ImageSend", "Failed to parse error message: ${e.message}")
        }
    }
}
-
-
Unsubscribe from the channel
After an agent session ends, unsubscribe from channel messages to release resources:
api.unsubscribeMessage("channelName") { error ->
if (error != null) {
// Handle the error
}
}
-
Release resources
At the end of each call, use the destroy method to clean up the cache.
api.destroy()
iOS
-
Integrate the toolkit
Copy the ConversationalAIAPI folder to your project and import the toolkit before calling the toolkit APIs. Refer to Folder structure to understand the role of each file.
-
Create a toolkit instance
Create a configuration object with the Video SDK and Signaling engine instances, then use the configuration to create a toolkit instance.
// Create a configuration object with the RTC and RTM instances
let config = ConversationalAIAPIConfig(
    rtcEngine: rtcEngine,
    rtmEngine: rtmEngine,
    enableLog: true
)
// Create the toolkit instance
convoAIAPI = ConversationalAIAPIImpl(config: config)
-
Subscribe to the channel
Agent-related events are delivered through Signaling channel messages. To receive these events, call subscribeMessage before starting the agent session.
convoAIAPI.subscribeMessage(channelName: channelName) { error in
    if let error = error {
        print("Subscription failed: \(error.message)")
    } else {
        print("Subscription successful")
    }
}
-
Register callback
Call the addHandler method to register your implementation of the callback.
convoAIAPI.addHandler(handler: self)
-
Add a Conversational AI agent to the channel
To start a Conversational AI agent, configure the following parameters in your POST request:
| Parameter | Description | Required |
| --- | --- | --- |
| advanced_features.enable_rtm: true | Starts the Signaling service | Yes |
| parameters.data_channel: "rtm" | Enables Signaling as the data transmission channel | Yes |
| parameters.enable_metrics: true | Enables agent performance data collection | Optional |
| parameters.enable_error_message: true | Enables reporting of agent error events | Optional |
After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
-
Send an image
Call the chat method to send a picture message. The following example sends a picture using a URL:
let uuid = UUID().uuidString
let imageUrl = "https://example.com/image.jpg"
let message = ImageMessage(uuid: uuid, url: imageUrl)
self.convoAIAPI.chat(agentUserId: "\(agentUid)", message: message) { [weak self] error in
    if let error = error {
        print("send image failed, error: \(error.message)")
    } else {
        print("send image success")
    }
}
Info: The completion callback of chat only indicates whether the send request succeeded. It does not reflect the actual processing status of the message.
-
Receive image sending response
A picture message is confirmed as sent when the picture message receipt callback onMessageReceiptUpdated fires. If sending fails, the message error callback onMessageError fires instead, as handled in the sample that follows. The uuid value in either callback identifies the uploaded picture.
-
Image sent successfully
When you receive the onMessageReceiptUpdated callback, parse the JSON message in the callback to obtain the image's uuid and confirm that the image was sent successfully:
struct PictureInfo: Codable {
    let uuid: String
}

public func onMessageReceiptUpdated(agentUserId: String, messageReceipt: MessageReceipt) {
    // Only handle receipts whose type is context
    guard messageReceipt.type == .context else { return }
    // Convert the receipt message string to data
    guard let messageData = messageReceipt.message.data(using: .utf8) else {
        print("Failed to parse message string from image info message")
        return
    }
    do {
        // Decode the JSON; decoding fails if the uuid field is missing
        let imageInfo = try JSONDecoder().decode(PictureInfo.self, from: messageData)
        // Update the UI to show that the image was sent successfully
        self.messageView.viewModel.updateImageMessage(uuid: imageInfo.uuid, state: .success)
    } catch {
        print("Failed to decode PictureInfo: \(error)")
    }
}
-
Image sending failed
If you receive the onMessageError callback, parse the JSON message in the callback to obtain the image's uuid and confirm that sending the image failed:
struct ImageUploadError: Codable {
    let code: Int
    let message: String
}

struct ImageUploadErrorResponse: Codable {
    let uuid: String
    let success: Bool
    let error: ImageUploadError?
}

public func onMessageError(agentUserId: String, error: MessageError) {
    if let messageData = error.message.data(using: .utf8) {
        do {
            let errorResponse = try JSONDecoder().decode(ImageUploadErrorResponse.self, from: messageData)
            if !errorResponse.success {
                let errorMessage = errorResponse.error?.message ?? "Unknown error"
                let errorCode = errorResponse.error?.code ?? 0
                addLog("<<< [ImageUploadError] Image upload failed: \(errorMessage) (code: \(errorCode))")
                // Update the UI to show the failed send status
                DispatchQueue.main.async { [weak self] in
                    self?.messageView.viewModel.updateImageMessage(uuid: errorResponse.uuid, state: .failed)
                }
            }
        } catch {
            addLog("<<< [onMessageError] Failed to parse error message JSON: \(error)")
        }
    }
}
-
-
Unsubscribe from the channel
After each agent session ends, unsubscribe from channel messages to release resources.
// Unsubscribe from channel messages
convoAIAPI.unsubscribeMessage(channelName: channelName) { error in
    if let error = error {
        print("Unsubscription failed: \(error.message)")
    } else {
        print("Unsubscribed successfully")
    }
}
-
Release resources
At the end of each call, use the destroy method to clean up the cache.
convoAIAPI.destroy()
Web
-
Integrate the toolkit
Copy the conversational-ai-api folder to your project and import the toolkit before calling its API. Refer to Folder structure to understand the role of each file.
-
Create a toolkit instance
Before joining a Video SDK channel, create the Video SDK and Signaling engine instances and pass them to the toolkit.
// Initialize the toolkit
ConversationalAIAPI.init({
  rtcEngine,
  rtmEngine,
})
// Get the API instance (singleton)
const conversationalAIAPI = ConversationalAIAPI.getInstance()
-
Subscribe to the channel
Agent-related events are delivered through Signaling messages. Before starting an agent session, call subscribeMessage to receive these events:
conversationalAIAPI.subscribeMessage(channel_name)
-
Register callbacks
// Listen for message receipt updates
conversationalAIAPI.on(EConversationalAIAPIEvents.MESSAGE_RECEIPT_UPDATED, handleMessageReceiptUpdated)
// Listen for agent error events
conversationalAIAPI.on(EConversationalAIAPIEvents.AGENT_ERROR, onAgentError)
-
Add a Conversational AI agent to the channel
To start a Conversational AI agent, configure the following parameters in your POST request:
| Parameter | Description | Required |
| --- | --- | --- |
| advanced_features.enable_rtm: true | Starts the Signaling service | Yes |
| parameters.data_channel: "rtm" | Enables Signaling as the data transmission channel | Yes |
| parameters.enable_metrics: true | Enables agent performance data collection | Optional |
| parameters.enable_error_message: true | Enables reporting of agent error events | Optional |
After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
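The dotted parameter names in the table can be read as nested JSON fields. The following TypeScript sketch builds only that fragment of the request body. Where the fragment sits inside the full POST payload is not stated on this page, and the buildAgentMessagingOptions helper and its types are illustrative assumptions, not part of the toolkit.

```typescript
// Hypothetical type covering only the fields listed in the table above.
interface AgentMessagingOptions {
  advanced_features: { enable_rtm: boolean };
  parameters: {
    data_channel: "rtm";
    enable_metrics?: boolean;
    enable_error_message?: boolean;
  };
}

// Always sets the two required fields; adds the optional ones only on request.
function buildAgentMessagingOptions(
  opts: { metrics?: boolean; errorMessages?: boolean } = {}
): AgentMessagingOptions {
  const fragment: AgentMessagingOptions = {
    advanced_features: { enable_rtm: true },
    parameters: { data_channel: "rtm" },
  };
  if (opts.metrics) fragment.parameters.enable_metrics = true;
  if (opts.errorMessages) fragment.parameters.enable_error_message = true;
  return fragment;
}

// Example: required fields plus agent error reporting.
const fragment = buildAgentMessagingOptions({ errorMessages: true });
```

Merge the resulting object into the request body you already use to start the agent, at the location your REST quickstart implementation expects.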
-
Send an image
Call the chat method to send a picture message. The following example sends a picture using a URL:
import { EChatMessageType } from '@/conversational-ai-api/type'

// Send a picture message
await conversationalAIAPI.chat(`${agent_rtc_uid}`, {
  messageType: EChatMessageType.IMAGE,
  url: "https://example.com/image.jpg",
  uuid: genUUID()
})
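Because the image is passed by URL and later matched by its uuid, it can help to validate both before calling chat. The sketch below shows one way to do that. Both buildImageMessage and this genUUID implementation are hypothetical helpers written for illustration; the page does not say how genUUID is defined in the sample app.

```typescript
// Hypothetical shape: the two fields the chat example passes alongside messageType.
interface ImagePayload {
  uuid: string;
  url: string;
}

// Random version-4 style identifier. An assumption standing in for genUUID();
// Math.random is not cryptographically secure, which is acceptable for a message id.
function genUUID(): string {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Rejects anything that is not an absolute HTTP or HTTPS URL.
function buildImageMessage(url: string, uuid: string = genUUID()): ImagePayload {
  if (!/^https?:\/\/\S+$/i.test(url)) {
    throw new Error(`Image URL must be HTTP or HTTPS: ${url}`);
  }
  return { uuid, url };
}

const payload = buildImageMessage("https://example.com/image.jpg");
```

Keep the returned uuid so that you can match it against the receipt or error you receive later.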
Info: The promise returned by chat only indicates whether the send request succeeded. It does not reflect the actual processing status of the message.
-
Receive image sending response
A picture message is confirmed as sent when the picture message receipt event fires (MESSAGE_RECEIPT_UPDATED in the sample that follows). If sending fails, the message error event fires instead (MESSAGE_ERROR in the sample that follows). The uuid value in either event identifies the uploaded picture.
-
Image sent successfully
When the MESSAGE_RECEIPT_UPDATED event fires, parse the JSON message in the callback to obtain the image's uuid and confirm that the image was sent successfully:
import { TMessageReceipt, EModuleType, EConversationalAIAPIEvents } from '@/conversational-ai-api/type'

// Handle the send status of an image message
conversationalAIAPI.on(
  EConversationalAIAPIEvents.MESSAGE_RECEIPT_UPDATED,
  (agentUserId: string, messageReceipt: TMessageReceipt) => {
    // Only handle receipts whose module type is Context
    if (messageReceipt.moduleType !== EModuleType.CONTEXT) {
      return
    }
    try {
      // Parse messageReceipt.message as a JSON object
      const receiptMessage = JSON.parse(messageReceipt.message)
      // Check whether the uuid field exists
      const uuid = receiptMessage.uuid
      if (!uuid) {
        return
      }
      // The message was sent successfully; replace this with your own success handling
      console.log(`Message sent successfully, UUID: ${uuid}`)
    } catch (error) {
      console.error('Failed to parse message:', error)
    }
  }
)
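The receipt handler and the error handler both parse a JSON string and read its uuid field, so that step can be shared. The following is a minimal sketch of such a helper. The extractUuid name is hypothetical, and it assumes the message is a JSON object with a top-level uuid string, as in the samples on this page.

```typescript
// Returns the uuid carried in a receipt or error message, or null when the
// string is not a JSON object with a non-empty string uuid field.
function extractUuid(message: string): string | null {
  try {
    const parsed: unknown = JSON.parse(message);
    if (typeof parsed === "object" && parsed !== null) {
      const uuid = (parsed as { uuid?: unknown }).uuid;
      return typeof uuid === "string" && uuid.length > 0 ? uuid : null;
    }
    return null;
  } catch (e) {
    // The message was not valid JSON.
    return null;
  }
}

const found = extractUuid('{"uuid":"img-1","success":true}');
const missing = extractUuid("not json");
```

Returning null instead of throwing lets a handler ignore unrelated or malformed messages with a single check.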
-
Image sending failed
When the MESSAGE_ERROR event fires, parse the JSON message in the callback to obtain the image's uuid and confirm that sending the image failed:
import { EConversationalAIAPIEvents, EChatMessageType } from '@/conversational-ai-api/type';

conversationalAIAPI.on(EConversationalAIAPIEvents.MESSAGE_ERROR, (agentUserId, error) => {
  console.error(`Message error for agent ${agentUserId}:`, error);
  // Only handle errors for image messages
  if (error.type === EChatMessageType.IMAGE) {
    try {
      const errorData = JSON.parse(error.message);
      if (errorData?.uuid) {
        console.warn(`Image error for agent ${agentUserId} with UUID: ${errorData.uuid}`);
      }
    } catch (e) {
      console.error(`Failed to handle image error for agent ${agentUserId}:`, e);
    }
  }
})
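Since the result of a send arrives later through a receipt or an error, the app needs to remember which uuid values are still waiting for an outcome. The sketch below shows one possible way to track that state. ImageSendTracker is a hypothetical class written for illustration and is not part of the toolkit.

```typescript
type ImageStatus = "pending" | "sent" | "failed";

// Tracks each image by uuid from the chat call until a receipt or error arrives.
class ImageSendTracker {
  private statuses = new Map<string, ImageStatus>();

  // Call once the chat request for this uuid has been accepted.
  markPending(uuid: string): void {
    this.statuses.set(uuid, "pending");
  }

  // Call from the receipt handler (ok = true) or the error handler (ok = false).
  // Returns false for uuids this client never sent, so they can be ignored.
  resolve(uuid: string, ok: boolean): boolean {
    if (!this.statuses.has(uuid)) return false;
    this.statuses.set(uuid, ok ? "sent" : "failed");
    return true;
  }

  statusOf(uuid: string): ImageStatus | undefined {
    return this.statuses.get(uuid);
  }
}

const tracker = new ImageSendTracker();
tracker.markPending("img-1");
tracker.markPending("img-2");
tracker.resolve("img-1", true); // a receipt arrived for img-1
tracker.resolve("img-2", false); // an error arrived for img-2
```

A UI layer could read statusOf to decide whether to show a spinner, a confirmation, or a retry option for each image.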
-
-
Unsubscribe from the channel
After each agent session ends, unsubscribe from channel messages to release resources associated with callback events:
conversationalAIAPI.unsubscribeMessage(channel_name)
-
Release resources
At the end of each call, use the destroy method to clean up the cache.
conversationalAIAPI.destroy()
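The subscribe, unsubscribe, and destroy steps bracket every agent session, so one option is to wrap them in a helper that always runs the cleanup. The following sketch assumes a minimal ToolkitLike interface using the method names from this page. The exact signatures and return types of the real toolkit may differ, and withAgentSession is a hypothetical helper.

```typescript
// Minimal surface assumed for this sketch; real toolkit signatures may differ.
interface ToolkitLike {
  subscribeMessage(channel: string): void | Promise<void>;
  unsubscribeMessage(channel: string): void | Promise<void>;
  destroy(): void;
}

// Runs `body` between subscribe and cleanup, so that unsubscribe and destroy
// still happen when `body` throws.
async function withAgentSession<T>(
  api: ToolkitLike,
  channel: string,
  body: () => Promise<T>
): Promise<T> {
  await api.subscribeMessage(channel);
  try {
    return await body();
  } finally {
    await api.unsubscribeMessage(channel);
    api.destroy();
  }
}
```

Inside body you would start the agent, send picture messages, and wait for the session to end. If your app reuses the toolkit instance across sessions, move the destroy call to wherever the call itself ends.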
Reference
This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.
Folder structure
Android
- IConversationalAIAPI.kt: API interface and related data structures and enumerations
- ConversationalAIAPIImpl.kt: ConversationalAI API main implementation logic
- ConversationalAIUtils.kt: Utility functions and event callback management
- subRender/v3/: Subtitle module
  - TranscriptionController.kt: Subtitle controller
  - MessageParser.kt: Message parser
iOS
- ConversationalAIAPI.swift: API interface and related data structures and enumerations
- ConversationalAIAPIImpl.swift: ConversationalAI API main implementation logic
- Transcription/
  - TranscriptionController.swift: Subtitle controller
Web
- index.ts: API class
- type.ts: API interface and related data structures and enumerations
- utils/
  - index.ts: API utility functions
  - events.ts: Event management class, which can be extended to implement event monitoring and broadcasting
  - sub-render.ts: Subtitle module
API Reference
This section provides API reference documentation for the toolkit.
- Android
- iOS
- Web