
Send picture messages

When interacting with an agent, users may need to send images from the client to help the agent understand their intent. This page describes how to use the Conversational AI Engine toolkit to send picture messages from your app to the large language model (LLM). The LLM can then reference the image content in subsequent conversation turns and generate responses that better meet user needs.

Understand the tech

Agora provides a flexible, scalable, and standardized Conversational AI Engine toolkit. The toolkit supports iOS, Android, and Web platforms, and encapsulates scenario-based APIs. You can use these APIs together with the Agora Signaling SDK and Video SDK to enable picture messaging:

Call the component's chat API to send a picture message, and listen to the onMessageReceiptUpdated callback to receive the picture message receipt.

Prerequisites

Before you begin, ensure the following:

  • You have implemented the Conversational AI Engine REST quickstart.
  • Your app integrates Video SDK v4.5.1 or later and includes the video quickstart implementation.
  • You have enabled Signaling in the Agora Console and completed the Signaling quickstart for basic messaging.
  • You maintain active and authenticated RTC and Signaling instances that persist beyond the component's lifecycle. The toolkit does not manage the initialization, lifecycle, or authentication of RTC or Signaling.
info
  • The picture messaging feature is currently in Beta and free for a limited time.
  • Image processing depends on the capabilities of the integrated LLM. Ensure the LLM you connect to the Conversational AI Engine supports image input.

Implementation

This section explains how to send a picture message.

  1. Integrate the toolkit

    Copy the convoaiApi folder to your project and import the toolkit before calling the toolkit API. Refer to Folder structure to understand the role of each file.

  2. Create a toolkit instance

    Create a configuration object with the Video SDK and Signaling engine instances. Use the configuration to create a toolkit instance.

    // Create configuration objects for the RTC and RTM instances
    val config = ConversationalAIAPIConfig(
        rtcEngine = rtcEngineInstance,
        rtmClient = rtmClientInstance,
        enableLog = true
    )
    // Create component instance
    val api = ConversationalAIAPIImpl(config)
  3. Register callback

    Call the addHandler method to register your implementation of the callback.

    api.addHandler(covEventHandler)
  4. Subscribe to the channel

    Agent-related events are delivered through Signaling channel messages. To receive these events, call subscribeMessage before starting the agent session.

    api.subscribeMessage("channelName") { error ->
        if (error != null) {
            // Handle error
        }
    }
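
    The ordering requirement in this step can be enforced in code. The following is a minimal sketch, not part of the toolkit: ChannelSubscriber is a hypothetical stand-in for the toolkit's subscribeMessage call (the real error type is defined in the convoaiApi folder), and startAgent represents whatever code issues your agent start request.

    ```kotlin
    // Hypothetical stand-in for the toolkit's subscribe call. Throwable is an
    // assumption; the toolkit defines its own error type.
    fun interface ChannelSubscriber {
        fun subscribeMessage(channelName: String, completion: (Throwable?) -> Unit)
    }

    // Runs startAgent only after the subscription has been confirmed.
    fun subscribeThenStart(
        subscriber: ChannelSubscriber,
        channelName: String,
        startAgent: () -> Unit,
        onFailure: (Throwable) -> Unit
    ) {
        subscriber.subscribeMessage(channelName) { error ->
            if (error != null) {
                // Do not start the agent: its events could not be received
                onFailure(error)
            } else {
                // Subscription is active, so agent events will be delivered
                startAgent()
            }
        }
    }
    ```

    Starting the agent inside the success branch keeps the two steps from racing each other when the subscription takes longer than expected.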
  5. Add a Conversational AI agent to the channel

    To start a Conversational AI agent, configure the following parameters in your POST request:

    | Parameter | Description | Required |
    | --- | --- | --- |
    | advanced_features.enable_rtm: true | Starts the Signaling service | Yes |
    | parameters.data_channel: "rtm" | Enables Signaling as the data transmission channel | Yes |
    | parameters.enable_metrics: true | Enables agent performance data collection | Optional |
    | parameters.enable_error_message: true | Enables reporting of agent error events | Optional |

    After a successful response, the agent joins the specified Video SDK channel and is ready to interact with the user.
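
    To illustrate how the dotted parameter names above map to nested JSON objects, here is a minimal sketch that builds only this fragment. The function names agentStartFragment and toJson are hypothetical helpers, not toolkit APIs. Where the two objects sit inside the full request body is defined by the REST quickstart and is not shown here.

    ```kotlin
    // Builds the two objects named in the parameter list.
    // Placement inside the full POST body is not covered by this sketch.
    fun agentStartFragment(
        enableMetrics: Boolean = true,
        enableErrorMessage: Boolean = true
    ): Map<String, Any> = mapOf(
        "advanced_features" to mapOf("enable_rtm" to true),
        "parameters" to mapOf(
            "data_channel" to "rtm",
            "enable_metrics" to enableMetrics,
            "enable_error_message" to enableErrorMessage
        )
    )

    // Minimal JSON renderer for nested maps, strings, and booleans.
    // It does not escape special characters, so use a real JSON library in production.
    fun toJson(value: Any?): String = when (value) {
        is Map<*, *> -> value.entries.joinToString(",", "{", "}") {
            "\"${it.key}\":${toJson(it.value)}"
        }
        is String -> "\"$value\""
        else -> value.toString()
    }
    ```

    Calling toJson(agentStartFragment()) yields an advanced_features object and a parameters object that you can merge into the request body from the REST quickstart.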

  6. Send an image

    Call the chat method to send a picture message. The following example sends a picture using a URL:

    val uuid = "unique-image-id-123" // Generate a unique image identifier
    val imageUrl = "https://example.com/image.jpg" // HTTP/HTTPS URL of the image
    api.chat("agentUserId", ImageMessage(uuid = uuid, imageUrl = imageUrl)) { error ->
        if (error != null) {
            Log.e("Chat", "Failed to send image: ${error.errorMessage}")
        } else {
            Log.d("Chat", "Image send request successful")
        }
    }
    info

    The chat callback only indicates whether the send request succeeded. It does not reflect whether the message was actually processed.
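
    The example above hard-codes the identifier and the URL. As a minimal sketch, the two values could be prepared as follows. ImageRequest and newImageRequest are hypothetical helpers, not toolkit APIs, and using a random UUID is an assumption: the source only requires that the identifier be unique.

    ```kotlin
    import java.net.URI
    import java.net.URISyntaxException
    import java.util.UUID

    // Hypothetical holder for the two values passed to the chat call.
    data class ImageRequest(val uuid: String, val imageUrl: String)

    // Returns a request with a fresh random identifier, or null when the
    // input is not an absolute http/https URL with a host.
    fun newImageRequest(imageUrl: String): ImageRequest? {
        val uri = try {
            URI(imageUrl)
        } catch (e: URISyntaxException) {
            return null
        }
        val scheme = uri.scheme?.lowercase()
        if ((scheme != "http" && scheme != "https") || uri.host.isNullOrEmpty()) {
            return null
        }
        return ImageRequest(UUID.randomUUID().toString(), imageUrl)
    }
    ```

    Keep the returned uuid, because the receipt and error callbacks identify the picture by that value.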

  7. Handle image sending status

    The onMessageReceiptUpdated callback confirms that a picture message was sent successfully. If sending fails, the onMessageError callback fires instead. In both callbacks, the uuid value identifies the picture you sent.

    • Image sent successfully

      When you receive the onMessageReceiptUpdated callback, parse the JSON message in the callback to obtain the image's uuid and confirm that the image was sent successfully:

      override fun onMessageReceiptUpdated(agentUserId: String, receipt: MessageReceipt) {
          if (receipt.chatMessageType == ChatMessageType.Image) {
              try {
                  val json = JSONObject(receipt.message)
                  // Check if the uuid field is included
                  if (json.has("uuid")) {
                      val receivedUuid = json.getString("uuid")
                      // If the uuid matches, the image was sent successfully
                      if (receivedUuid == "your-sent-uuid") {
                          Log.d("ImageSend", "Image sent successfully: $receivedUuid")
                          // Update the UI to show the successful sending status
                      }
                  }
              } catch (e: Exception) {
                  Log.e("ImageSend", "Failed to parse message receipt: ${e.message}")
              }
          }
      }
    • Image sending failed

      If you receive the onMessageError callback, parse the JSON message in the callback to obtain the image's uuid and confirm that sending the image failed:

      override fun onMessageError(agentUserId: String, error: MessageError) {
          if (error.chatMessageType == ChatMessageType.Image) {
              try {
                  val json = JSONObject(error.message)
                  // Check if it contains the "uuid" field
                  if (json.has("uuid")) {
                      val failedUuid = json.getString("uuid")
                      // If the uuid matches, this image failed to send
                      if (failedUuid == "your-sent-uuid") {
                          Log.e("ImageSend", "Image send failed: $failedUuid")
                          // Update the UI to show the failed send status
                      }
                  }
              } catch (e: Exception) {
                  Log.e("ImageSend", "Failed to parse error message: ${e.message}")
              }
          }
      }
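
    Both handlers compare the parsed uuid against a hard-coded placeholder. When several images can be in flight, one option is to keep a small registry of pending uuids and resolve each entry from the receipt or error callback. The sketch below is one possible approach: ImageSendTracker and ImageSendStatus are hypothetical names, not toolkit types.

    ```kotlin
    enum class ImageSendStatus { PENDING, DELIVERED, FAILED }

    // Tracks each picture by the uuid passed to the chat call.
    // Methods are synchronized on the assumption that callbacks may
    // arrive on a different thread than the one that sends.
    class ImageSendTracker {
        private val statuses = mutableMapOf<String, ImageSendStatus>()

        // Call when the chat request for this uuid has been issued.
        @Synchronized
        fun markPending(uuid: String) {
            statuses[uuid] = ImageSendStatus.PENDING
        }

        // Call with the uuid parsed in the receipt or error callback.
        // Returns false for uuids this client did not send or already resolved.
        @Synchronized
        fun resolve(uuid: String, delivered: Boolean): Boolean {
            if (statuses[uuid] != ImageSendStatus.PENDING) return false
            statuses[uuid] =
                if (delivered) ImageSendStatus.DELIVERED else ImageSendStatus.FAILED
            return true
        }

        @Synchronized
        fun statusOf(uuid: String): ImageSendStatus? = statuses[uuid]
    }
    ```

    In the handlers above, the comparison against "your-sent-uuid" would become a call such as tracker.resolve(receivedUuid, delivered = true), with the UI updated only when it returns true.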
  8. Unsubscribe from the channel

    After an agent session ends, unsubscribe from channel messages to release resources:

    api.unsubscribeMessage("channelName") { error ->
        if (error != null) {
            // Handle the error
        }
    }
  9. Release resources

    At the end of each call, call the destroy method to clean up the cache.

    api.destroy()

Reference

This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.

Folder structure

  • IConversationalAIAPI.kt: API interface and related data structures and enumerations
  • ConversationalAIAPIImpl.kt: Main implementation logic of the ConversationalAI API
  • ConversationalAIUtils.kt: Utility functions and event callback management
  • subRender/
    • v3/: Subtitle module
      • TranscriptionController.kt: Subtitle Controller
      • MessageParser.kt: Message Parser

API Reference

This section provides API reference documentation for the subtitles module.