TEN Agent

TEN is an open-source framework for building real-time, multimodal conversational AI, developed by Agora and the TEN community. It integrates large language models, real-time AI services such as Gemini 2.0 Live and OpenAI Realtime, and RTC capabilities to enable agents that can see, hear, and speak. It also connects to AI workflow platforms such as Dify and Coze.

Start building with

SDK quickstart

Customize your experience from the start with our flexible SDK.

Samples

Product Features

Modular Extension Architecture

Build conversational AI with plug-and-play extensions for LLMs, speech-to-text, text-to-speech, and tools. Easily swap components without code changes to test different configurations.
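One way to picture component swapping is a shared interface with the concrete extension chosen from configuration. The sketch below is illustrative only: `TextToSpeech`, `EchoTTS`, `UppercaseTTS`, `EXTENSIONS`, and `load_tts` are hypothetical names, not part of TEN's actual API, and the "speech" they produce is just encoded text.

```python
from typing import Dict, Protocol, Type


class TextToSpeech(Protocol):
    """Hypothetical interface that every TTS extension would satisfy."""

    def synthesize(self, text: str) -> bytes: ...


class EchoTTS:
    """Stand-in extension: returns the text as UTF-8 bytes."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class UppercaseTTS:
    """Second stand-in extension with different behavior."""

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")


# Hypothetical lookup table from a config value to an extension class.
EXTENSIONS: Dict[str, Type] = {"echo": EchoTTS, "upper": UppercaseTTS}


def load_tts(config: dict) -> TextToSpeech:
    """Pick the extension named in the config; calling code stays the same."""
    return EXTENSIONS[config["tts"]]()


tts = load_tts({"tts": "echo"})
audio = tts.synthesize("hello")
```

Changing `{"tts": "echo"}` to `{"tts": "upper"}` selects a different component while the code that calls `synthesize` is untouched.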

Graph-Based Configuration

Define agent behavior through visual graphs that specify data flow between extensions. Configure a graph once and reuse the same orchestration across environments.
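As a rough illustration of a graph that specifies data flow, the sketch below models extensions as nodes and connections as directed edges, then derives the order in which data would pass through them. The `GRAPH` layout, node names, and `execution_order` function are assumptions made for this example and do not reflect TEN's real configuration schema.

```python
from collections import deque

# Hypothetical graph description; not TEN's actual configuration format.
GRAPH = {
    "nodes": ["stt", "llm", "tts"],
    "connections": [
        {"src": "stt", "dest": "llm"},
        {"src": "llm", "dest": "tts"},
    ],
}


def execution_order(graph: dict) -> list:
    """Return nodes in data-flow order using Kahn's topological sort."""
    indegree = {name: 0 for name in graph["nodes"]}
    downstream = {name: [] for name in graph["nodes"]}
    for edge in graph["connections"]:
        src, dest = edge["src"], edge["dest"]
        if src not in indegree or dest not in indegree:
            raise ValueError(f"unknown node in connection: {edge}")
        downstream[src].append(dest)
        indegree[dest] += 1

    # Start from nodes with no incoming edges.
    ready = deque(n for n in graph["nodes"] if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any node left unvisited means the edges form a cycle.
    if len(order) != len(graph["nodes"]):
        raise ValueError("graph contains a cycle")
    return order


order = execution_order(GRAPH)
```

Keeping the graph as plain data is what lets the same description be validated and reused in different environments.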

Real-time Voice Interaction

Enable natural voice conversations with low-latency processing. TEN supports both traditional STT+LLM+TTS pipelines and voice-to-voice models for responsive interactions.

Multimodal Capabilities

Process and generate text, audio, and images. Create rich experiences like storytellers that generate narrative illustrations or assistants that understand visual context.

Tool Integration

Extend AI capabilities with specialized tools for weather, image generation, search, and more. Build custom tools to access external services and APIs through a standardized interface.
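A standardized tool interface can be thought of as a name, a description the model can read, and a handler that accepts structured arguments. The following is a minimal sketch under that assumption: `Tool`, `ToolRegistry`, and `fake_weather` are hypothetical names, not TEN's API, and the weather handler is a stub that returns a canned string instead of calling a real service.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool descriptor: what it is called and how to run it."""

    name: str
    description: str
    handler: Callable[[Dict[str, Any]], str]


class ToolRegistry:
    """Keeps tools by name so an agent can dispatch calls uniformly."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def call(self, name: str, arguments: Dict[str, Any]) -> str:
        if name not in self._tools:
            raise KeyError(f"no tool named {name!r}")
        return self._tools[name].handler(arguments)


def fake_weather(arguments: Dict[str, Any]) -> str:
    # Stub handler: no network access, returns a fixed placeholder message.
    return f"Weather for {arguments['city']}: unavailable (stub)"


registry = ToolRegistry()
registry.register(Tool("get_weather", "Look up weather for a city", fake_weather))
result = registry.call("get_weather", {"city": "Paris"})
```

A custom tool for an external API would replace the stub handler with a real request while keeping the same `register` and `call` surface.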

Platform Compatibility

Connect with AI workflow platforms such as Dify and Coze. Build voice interfaces for existing chatbots or create specialized agents for your own use cases.