Send picture messages

When interacting with an agent, users may need to upload or send images from the client to help the agent understand their intent. This page describes how to use the Conversational AI Engine toolkit to send image messages from your app to the large language model (LLM). The LLM can then reference the image content in subsequent conversation turns and generate responses that better meet user needs.

Understand the tech

Agora provides a flexible, scalable, and standardized conversational AI engine toolkit. The toolkit supports iOS, Android, and Web platforms, and encapsulates scenario-based APIs. You can use these APIs to integrate the capabilities of the Agora Signaling SDK and Video SDK to enable the following features:

Interrupt agents
Display live transcripts
Receive event notifications
Set optimal audio parameters (iOS and Android only)
Send picture messages

Call the component's chat API to send a picture message, and listen to the onMessageReceiptUpdated callback to receive the picture message receipt.

Prerequisites

Before you begin, ensure the following:

You have implemented the Conversational AI Engine REST quickstart.
Your app integrates Video SDK v4.5.1 or later and includes the video quickstart implementation.
You have enabled Signaling in the Agora Console and completed Signaling quickstart for basic messaging.
You maintain active and authenticated RTC and Signaling instances that persist beyond the component's lifecycle. The toolkit does not manage the initialization, lifecycle, or authentication of RTC or Signaling.

info

The picture messaging feature is currently in Beta and free for a limited time.
Image processing depends on the capabilities of the integrated LLM. Ensure the LLM you connect to the Conversational AI Engine supports image input.

Implementation

This section explains how to send a picture message.

Android

Integrate the toolkit

Copy the convoaiApi folder to your project and import the toolkit before calling the toolkit API. Refer to Folder structure to understand the role of each file.

Create a toolkit instance

Create a configuration object with the Video SDK and Signaling engine instances. Use the configuration to create a toolkit instance.

```
// Create a configuration object with the RTC and RTM instances
val config = ConversationalAIAPIConfig(
    rtcEngine = rtcEngineInstance,
    rtmClient = rtmClientInstance,
    enableLog = true
)
// Create the component instance
val api = ConversationalAIAPIImpl(config)
```

Register callback

Call the addHandler method to register your implementation of the callback.
```
api.addHandler(covEventHandler)
```
Subscribe to the channel

Agent-related events are delivered through Signaling channel messages. To receive these events, call subscribeMessage before starting the agent session.
```
api.subscribeMessage("channelName") { error ->
    if (error != null) {
        // Handle error
    }
}
```

Add a Conversational AI agent to the channel

To start a Conversational AI agent, configure the following parameters in your POST request:

| Parameter | Description | Required |
|---|---|---|
| `advanced_features.enable_rtm: true` | Starts the Signaling service | Yes |
| `parameters.data_channel: "rtm"` | Enables Signaling as the data transmission channel | Yes |
| `parameters.enable_metrics: true` | Enables agent performance data collection | Optional |
| `parameters.enable_error_message: true` | Enables reporting of agent error events | Optional |

After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.

Send an image

Call the chat method to send a picture message. The following example sends a picture using a URL:

```
val uuid = "unique-image-id-123" // Generate a unique image identifier
val imageUrl = "https://example.com/image.jpg" // HTTP/HTTPS URL of the image

api.chat("agentUserId", ImageMessage(uuid = uuid, imageUrl = imageUrl)) { error ->
    if (error != null) {
        Log.e("Chat", "Failed to send image: ${error.errorMessage}")
    } else {
        Log.d("Chat", "Image send request successful")
    }
}
```

info

The completion callback of the chat method only indicates whether the send request succeeded. It does not reflect whether the message was actually processed.

Handle image sending status

A picture message is confirmed as sent when the onMessageReceiptUpdated receipt callback fires. If sending fails, the onMessageError callback fires instead. In both cases, the uuid value in the callback payload identifies the picture.

Image sent successfully

When you receive the onMessageReceiptUpdated callback, parse the JSON message in the callback as shown below to obtain the image's uuid and confirm that the image was sent successfully:

```
override fun onMessageReceiptUpdated(agentUserId: String, receipt: MessageReceipt) {
    if (receipt.chatMessageType == ChatMessageType.Image) {
        try {
            val json = JSONObject(receipt.message)
            // Check if the uuid field is included
            if (json.has("uuid")) {
                val receivedUuid = json.getString("uuid")
                // If the uuid matches, the image was sent successfully
                if (receivedUuid == "your-sent-uuid") {
                    Log.d("ImageSend", "Image sent successfully: $receivedUuid")
                    // Update the UI to show the successful sending status
                }
            }
        } catch (e: Exception) {
            Log.e("ImageSend", "Failed to parse message receipt: ${e.message}")
        }
    }
}
```

Image sending failed

If you receive the onMessageError callback, parse the JSON message in the callback as shown below to obtain the image's uuid and confirm that the image failed to send:

```
override fun onMessageError(agentUserId: String, error: MessageError) {
    if (error.chatMessageType == ChatMessageType.Image) {
        try {
            val json = JSONObject(error.message)
            // Check if it contains the "uuid" field
            if (json.has("uuid")) {
                val failedUuid = json.getString("uuid")
                // If the uuid matches, this image failed to send
                if (failedUuid == "your-sent-uuid") {
                    Log.e("ImageSend", "Image send failed: $failedUuid")
                    // Update the UI to show the failed send status
                }
            }
        } catch (e: Exception) {
            Log.e("ImageSend", "Failed to parse error message: ${e.message}")
        }
    }
}
```

Unsubscribe from the channel

After an agent session ends, unsubscribe from channel messages to release resources:

```
api.unsubscribeMessage("channelName") { error ->
    if (error != null) {
        // Handle the error
    }
}
```

Release resources

At the end of each call, use the destroy method to clean up the cache.
```
api.destroy()
```

iOS

Integrate the toolkit

Copy the ConversationalAIAPI folder to your project and import the toolkit before calling the toolkit APIs. Refer to Folder structure to understand the role of each file.

Create a toolkit instance

Create a configuration object with the Video SDK and Signaling engine instances, then use the configuration to create a toolkit instance.

```
// Create a configuration object for the RTC and RTM instances
let config = ConversationalAIAPIConfig(
    rtcEngine: rtcEngine,
    rtmEngine: rtmEngine,
    enableLog: true
)
// Create the component instance
convoAIAPI = ConversationalAIAPIImpl(config: config)
```

Subscribe to the channel

Agent-related events are delivered through Signaling channel messages. To receive these events, call subscribeMessage before starting the agent session.

```
convoAIAPI.subscribeMessage(channelName: channelName) { error in
    if let error = error {
        print("Subscription failed: \(error.message)")
    } else {
        print("Subscription successful")
    }
}
```

Register callback

Call the addHandler method to register your implementation of the callback.
```
convoAIAPI.addHandler(handler: self)
```

Add a Conversational AI agent to the channel

To start a Conversational AI agent, configure the following parameters in your POST request:

| Parameter | Description | Required |
|---|---|---|
| `advanced_features.enable_rtm: true` | Starts the Signaling service | Yes |
| `parameters.data_channel: "rtm"` | Enables Signaling as the data transmission channel | Yes |
| `parameters.enable_metrics: true` | Enables agent performance data collection | Optional |
| `parameters.enable_error_message: true` | Enables reporting of agent error events | Optional |

After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.

Send an image

Call the chat method to send a picture message. The following example sends a picture using a URL:

```
let uuid = UUID().uuidString
let imageUrl = "https://example.com/image.jpg"
let message = ImageMessage(uuid: uuid, url: imageUrl)

self.convoAIAPI.chat(agentUserId: "\(agentUid)", message: message) { [weak self] error in
    if let error = error {
        print("send image failed, error: \(error.message)")
    } else {
        print("send image success")
    }
}
```

info

The completion callback of the chat method only indicates whether the send request succeeded. It does not reflect whether the message was actually processed.

Receive image sending response

Image sent successfully

```
struct PictureInfo: Codable {
    let uuid: String
}

public func onMessageReceiptUpdated(agentUserId: String, messageReceipt: MessageReceipt) {
    // Check if the message type is Context
    if messageReceipt.type == .context {
        guard let messageData = messageReceipt.message.data(using: .utf8) else {
            print("Failed to parse message string from image info message")
            return
        }
        // Parse receipt.message as a JSON object
        do {
            // Decoding succeeds only if the uuid field exists
            let imageInfo = try JSONDecoder().decode(PictureInfo.self, from: messageData)
            let uuid = imageInfo.uuid
            // Update UI to show that the image was sent successfully
            self.messageView.viewModel.updateImageMessage(uuid: uuid, state: .success)
        } catch {
            print("Failed to decode PictureInfo: \(error)")
        }
    }
}
```

Image sending failed

```
struct ImageUploadError: Codable {
    let code: Int
    let message: String
}

struct ImageUploadErrorResponse: Codable {
    let uuid: String
    let success: Bool
    let error: ImageUploadError?
}

public func onMessageError(agentUserId: String, error: MessageError) {
    if let messageData = error.message.data(using: .utf8) {
        do {
            let errorResponse = try JSONDecoder().decode(ImageUploadErrorResponse.self, from: messageData)
            if !errorResponse.success {
                let errorMessage = errorResponse.error?.message ?? "Unknown error"
                let errorCode = errorResponse.error?.code ?? 0

                addLog("<<< [ImageUploadError] Image upload failed: \(errorMessage) (code: \(errorCode))")

                // Update the UI to show the failed send status
                DispatchQueue.main.async { [weak self] in
                    self?.messageView.viewModel.updateImageMessage(uuid: errorResponse.uuid, state: .failed)
                }
            }
        } catch {
            addLog("<<< [onMessageError] Failed to parse error message JSON: \(error)")
        }
    }
}
```

Unsubscribe from the channel

After each agent session ends, unsubscribe from channel messages to release resources.

```
// Unsubscribe from channel messages
convoAIAPI.unsubscribeMessage(channelName: channelName) { error in
    if let error = error {
        print("Unsubscription failed: \(error.message)")
    } else {
        print("Unsubscribed successfully")
    }
}
```

Release resources

At the end of each call, use the destroy method to clean up the cache.
```
convoAIAPI.destroy()
```

Web

Integrate the toolkit

Copy the conversational-ai-api folder to your project and import the toolkit before calling its API. Refer to Folder structure to understand the role of each file.

Create a toolkit instance

Before joining a Video SDK channel, create the Video SDK and Signaling engine instances and pass them to the toolkit.

```
// Initialize the component
ConversationalAIAPI.init({
    rtcEngine,
    rtmEngine,
})
// Get the API instance (singleton)
const conversationalAIAPI = ConversationalAIAPI.getInstance()
```

Subscribe to the channel

Agent-related events are delivered through Signaling messages. Before starting an agent session, call subscribeMessage to receive these events:
```
conversationalAIAPI.subscribeMessage(channel_name)
```

Register callbacks

```
// Listen for message receipt updates
conversationalAIAPI.on(EConversationalAIAPIEvents.MESSAGE_RECEIPT_UPDATED, handleMessageReceiptUpdated)
// Listen for agent error events
conversationalAIAPI.on(EConversationalAIAPIEvents.AGENT_ERROR, onAgentError)
```

Add a Conversational AI agent to the channel

To start a Conversational AI agent, configure the following parameters in your POST request:

| Parameter | Description | Required |
|---|---|---|
| `advanced_features.enable_rtm: true` | Starts the Signaling service | Yes |
| `parameters.data_channel: "rtm"` | Enables Signaling as the data transmission channel | Yes |
| `parameters.enable_metrics: true` | Enables agent performance data collection | Optional |
| `parameters.enable_error_message: true` | Enables reporting of agent error events | Optional |

After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
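The parameters in the table can be assembled as a plain object before you send the POST request. The following is a minimal sketch rather than an official client: `buildAgentFeatureFragment` and its option names are hypothetical, and the resulting fragment still has to be merged into the full request body described in the REST quickstart.

```typescript
// Hypothetical helper: builds only the fields listed in the table above.
interface AgentFeatureOptions {
  enableMetrics?: boolean;      // optional: agent performance data collection
  enableErrorMessage?: boolean; // optional: reporting of agent error events
}

interface AgentFeatureFragment {
  advanced_features: { enable_rtm: true };
  parameters: {
    data_channel: "rtm";
    enable_metrics?: boolean;
    enable_error_message?: boolean;
  };
}

function buildAgentFeatureFragment(options: AgentFeatureOptions = {}): AgentFeatureFragment {
  // The two required settings are always present.
  const fragment: AgentFeatureFragment = {
    advanced_features: { enable_rtm: true },
    parameters: { data_channel: "rtm" },
  };
  // Optional settings are added only when the caller specifies them.
  if (options.enableMetrics !== undefined) {
    fragment.parameters.enable_metrics = options.enableMetrics;
  }
  if (options.enableErrorMessage !== undefined) {
    fragment.parameters.enable_error_message = options.enableErrorMessage;
  }
  return fragment;
}

// Usage: print the fragment to merge into the request body.
console.log(JSON.stringify(buildAgentFeatureFragment({ enableErrorMessage: true }), null, 2));
```

Keeping the required flags in one helper makes it harder to start an agent that cannot deliver receipts over Signaling.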

Send an image

Call the chat method to send a picture message. The following example sends a picture using a URL:

```
import { EChatMessageType } from '@/conversational-ai-api/type'

// Send picture message
await conversationalAIAPI.chat(`${agent_rtc_uid}`, {
    messageType: EChatMessageType.IMAGE,
    url: "https://example.com/image.jpg",
    uuid: genUUID()
})
```

info

The completion callback of the chat method only indicates whether the send request succeeded. It does not reflect whether the message was actually processed.
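The send example calls `genUUID()`, which this page does not define. One possible implementation is sketched below, assuming that any unique string is acceptable as the image identifier. It prefers the standard `crypto.randomUUID()` when the runtime provides it, and otherwise falls back to a `Math.random`-based version 4 UUID, which is not cryptographically strong.

```typescript
// Hypothetical implementation of genUUID(); your project may already provide one.
function genUUID(): string {
  // Prefer the platform implementation (browsers in secure contexts, recent Node.js).
  const cryptoApi = (globalThis as any).crypto;
  if (cryptoApi && typeof cryptoApi.randomUUID === "function") {
    return cryptoApi.randomUUID();
  }
  // Fallback: version 4 UUID layout filled from Math.random().
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c: string) => {
    const r = Math.floor(Math.random() * 16);
    // "y" positions carry the variant bits (10xx), so they map to 8, 9, a, or b.
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}
```

Store the generated value before calling chat so that you can match it against the uuid reported in later receipt or error callbacks.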

Receive image sending response

Image sent successfully

When your handleMessageReceiptUpdated handler is invoked, parse the JSON message in the callback as shown below to obtain the image's uuid and confirm that the image was sent successfully:

```
import { TMessageReceipt, EModuleType, EConversationalAIAPIEvents } from '@/conversational-ai-api/type'

// Handle the send status of an image message
conversationalAIAPI.on(EConversationalAIAPIEvents.MESSAGE_RECEIPT_UPDATED, (agentUserId: string, messageReceipt: TMessageReceipt) => {
    // Check if the message type is Context
    if (messageReceipt.moduleType !== EModuleType.CONTEXT) {
        return
    }
    // Parse receipt.message as a JSON object
    try {
        const receiptMessage = JSON.parse(messageReceipt.message)
        // Check if the uuid field exists
        const uuid = receiptMessage.uuid
        if (!uuid) {
            return
        }
        // Announce that the message was sent successfully
        console.log(`Message sent successfully, UUID: ${uuid}`) // Replace this with actual voice feedback logic
    } catch (error) {
        console.error('Failed to parse message:', error)
    }
})
```

Image sending failed

If you receive the MESSAGE_ERROR event, parse the JSON message in the callback as shown below to obtain the image's uuid and confirm that the image failed to send:

```
import { EConversationalAIAPIEvents, EChatMessageType } from '@/conversational-ai-api/type';

conversationalAIAPI.on(EConversationalAIAPIEvents.MESSAGE_ERROR, (agentUserId, error) => {
    console.error(`Message error for agent ${agentUserId}:`, error);
    if (error.type === EChatMessageType.IMAGE) {
        try {
            const errorData = JSON.parse(error.message);
            if (errorData?.uuid) {
                console.warn(`Image error for agent ${agentUserId} with UUID: ${errorData.uuid}`);
            }
        } catch (e) {
            console.error(`Failed to handle image error for agent ${agentUserId}:`, e);
        }
    }
})
```
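Because the outcome of a send arrives in a different callback from the chat call, it can help to keep a small table of pending images keyed by uuid. The class below is one way to sketch this. `ImageSendTracker` and its method names are hypothetical and are not part of the toolkit. It assumes the receipt and error payloads are JSON strings with a `uuid` field, as in the handlers above, and it ignores identifiers that this client did not send.

```typescript
type ImageSendStatus = "pending" | "sent" | "failed";

// Hypothetical helper, not part of the toolkit.
class ImageSendTracker {
  private statuses = new Map<string, ImageSendStatus>();

  // Call after the chat request for this uuid has been accepted.
  markPending(uuid: string): void {
    this.statuses.set(uuid, "pending");
  }

  // Call from the receipt handler with messageReceipt.message.
  handleReceipt(message: string): string | undefined {
    return this.resolve(message, "sent");
  }

  // Call from the error handler with error.message.
  handleError(message: string): string | undefined {
    return this.resolve(message, "failed");
  }

  statusOf(uuid: string): ImageSendStatus | undefined {
    return this.statuses.get(uuid);
  }

  // Extracts the uuid from a JSON payload and updates the entry if it is known.
  private resolve(message: string, status: ImageSendStatus): string | undefined {
    let uuid: unknown;
    try {
      uuid = JSON.parse(message)?.uuid;
    } catch {
      return undefined; // Payload is not valid JSON
    }
    if (typeof uuid !== "string" || !this.statuses.has(uuid)) {
      return undefined; // No uuid, or an image this client did not send
    }
    this.statuses.set(uuid, status);
    return uuid;
  }
}

// Usage: record a send, then apply a receipt payload.
const tracker = new ImageSendTracker();
tracker.markPending("unique-image-id-123");
tracker.handleReceipt('{"uuid":"unique-image-id-123"}');
console.log(tracker.statusOf("unique-image-id-123"));
```

Both handlers return the matched uuid, so the calling code can update the corresponding message in the UI without parsing the payload a second time.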

Unsubscribe from the channel

After each agent session ends, unsubscribe from channel messages to release resources associated with callback events:
```
conversationalAIAPI.unsubscribeMessage(channel_name)
```
Release resources

At the end of each call, use the destroy method to clean up the cache.
```
conversationalAIAPI.destroy()
```

Reference

This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.

Folder structure

Android

IConversationalAIAPI.kt: API interface and related data structures and enumerations
ConversationalAIAPIImpl.kt: ConversationalAI API main implementation logic
ConversationalAIUtils.kt: Tool functions and event callback management
subRender/
- v3/: Transcription module
  - TranscriptionController.kt: Transcription Controller
  - MessageParser.kt: Message Parser

iOS

ConversationalAIAPI.swift: API interface and related data structures and enumerations
ConversationalAIAPIImpl.swift: ConversationalAI API main implementation logic
Transcription/
- TranscriptionController.swift: Transcription Controller

Web

index.ts: API Class
type.ts: API interface and related data structures and enumerations
utils/
- index.ts: API utility functions
- events.ts: Event management class, which can be extended to easily implement event monitoring and broadcasting
- sub-render.ts: Transcription module

API Reference

This section provides API reference documentation for the transcript module.
