TEN Agent
TEN is an open-source framework for building real-time, multimodal conversational AI, developed by Agora and the TEN community. It integrates large language models, real-time AI services such as Gemini 2.0 Live and OpenAI Realtime, and real-time communication (RTC) capabilities to build agents that can see, hear, and speak. The framework also connects with popular AI workflow platforms such as Dify and Coze.
Product Features
Modular Extension Architecture
Build conversational AI with plug-and-play extensions for LLMs, speech-to-text, text-to-speech, and tools. Easily swap components without code changes to test different configurations.
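The swap-without-code-changes idea can be sketched as a registry that maps a configuration name to an implementation of a shared interface. This is a minimal sketch under stated assumptions: `TextToSpeech`, `VendorATTS`, `VendorBTTS`, `TTS_REGISTRY`, and `build_tts` are hypothetical names, and the "audio" is placeholder bytes. It is not TEN's actual extension API.

```python
from typing import Callable, Dict, Protocol


class TextToSpeech(Protocol):
    """Hypothetical interface that every TTS extension implements."""

    def synthesize(self, text: str) -> bytes: ...


class VendorATTS:
    # Hypothetical vendor plug-in; returns placeholder bytes, not real audio.
    def synthesize(self, text: str) -> bytes:
        return b"A:" + text.encode("utf-8")


class VendorBTTS:
    # A second hypothetical plug-in with the same interface.
    def synthesize(self, text: str) -> bytes:
        return b"B:" + text.encode("utf-8")


# Registry keyed by the name that appears in configuration.
TTS_REGISTRY: Dict[str, Callable[[], TextToSpeech]] = {
    "vendor_a": VendorATTS,
    "vendor_b": VendorBTTS,
}


def build_tts(config: Dict[str, str]) -> TextToSpeech:
    """Instantiate whichever TTS implementation the config names."""
    return TTS_REGISTRY[config["tts"]]()


if __name__ == "__main__":
    # Switching vendors only changes the config value, not calling code.
    for name in ("vendor_a", "vendor_b"):
        print(build_tts({"tts": name}).synthesize("hello"))
```

Because callers depend only on the `synthesize` interface, testing a different vendor means changing one configuration value.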
Graph-Based Configuration
Define agent behavior through visual graphs that specify how data flows between extensions. Configure a graph once and reuse the same orchestration across environments.
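As a rough illustration of graph-based orchestration, the sketch below describes an agent as plain data (nodes plus directed connections), derives an execution order with Python's standard `graphlib.TopologicalSorter`, and pushes a payload through the nodes. The graph format, node names, and stand-in extensions are hypothetical and do not reflect TEN's configuration schema. `run` assumes a linear chain of nodes.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical graph description: the format is illustrative only.
GRAPH = {
    "nodes": ["stt", "llm", "tts"],
    "connections": [("stt", "llm"), ("llm", "tts")],
}

# Stand-in extensions: each one transforms the payload it receives.
EXTENSIONS: Dict[str, Callable[[str], str]] = {
    "stt": lambda audio: f"text({audio})",
    "llm": lambda text: f"reply({text})",
    "tts": lambda text: f"audio({text})",
}


def execution_order(graph: dict) -> List[str]:
    """Order nodes so that each runs after all of its upstream sources."""
    deps: Dict[str, Set[str]] = {node: set() for node in graph["nodes"]}
    for src, dst in graph["connections"]:
        deps[dst].add(src)
    return list(TopologicalSorter(deps).static_order())


def run(graph: dict, payload: str) -> str:
    """Pass one payload through a linear graph in dependency order."""
    for node in execution_order(graph):
        payload = EXTENSIONS[node](payload)
    return payload


if __name__ == "__main__":
    print(run(GRAPH, "pcm"))  # audio(reply(text(pcm)))
```

Keeping the wiring in data rather than code is what lets the same graph be loaded unchanged in different environments.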
Real-time Voice Interaction
Enable natural voice conversations with low-latency processing. TEN supports both traditional cascaded pipelines (speech-to-text, then an LLM, then text-to-speech, or STT+LLM+TTS) and voice-to-voice models that consume and produce audio directly.
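One way to picture the two supported modes is as interchangeable implementations of a single "audio in, audio out" interface. This is a minimal sketch under stated assumptions: `VoiceAgent`, `CascadedAgent`, and `VoiceToVoiceAgent` are hypothetical names, and the toy stages treat bytes as UTF-8 text purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class VoiceAgent(Protocol):
    """Hypothetical common interface: audio in, audio out."""

    def respond(self, audio_in: bytes) -> bytes: ...


@dataclass
class CascadedAgent:
    """STT -> LLM -> TTS, each stage a separate callable."""
    stt: Callable[[bytes], str]
    llm: Callable[[str], str]
    tts: Callable[[str], bytes]

    def respond(self, audio_in: bytes) -> bytes:
        text = self.stt(audio_in)   # transcribe the user's speech
        reply = self.llm(text)      # generate a text reply
        return self.tts(reply)      # synthesize the reply as audio


@dataclass
class VoiceToVoiceAgent:
    """A single model maps input audio directly to output audio."""
    model: Callable[[bytes], bytes]

    def respond(self, audio_in: bytes) -> bytes:
        return self.model(audio_in)


# Toy stand-ins: bytes are treated as UTF-8 text only for illustration.
agent_a = CascadedAgent(
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: text.upper(),
    tts=lambda text: text.encode("utf-8"),
)
agent_b = VoiceToVoiceAgent(model=lambda audio: audio[::-1])

if __name__ == "__main__":
    for agent in (agent_a, agent_b):
        print(agent.respond(b"hi"))
```

The cascaded form exposes each stage for swapping or inspection; the voice-to-voice form has fewer hand-offs, which is one reason such models are attractive for responsive interactions.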
Multi-Modal Capabilities
Process and generate text, audio, and images. Create rich experiences like storytellers that generate narrative illustrations or assistants that understand visual context.
Tool Integration
Extend AI capabilities with specialized tools for weather, image generation, search, and more. Build custom tools to access external services and APIs through a standardized interface.
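A standardized tool interface usually pairs a machine-readable description (name, purpose, parameter schema) with a handler. The sketch below shows that pattern with a registry that dispatches a tool call whose arguments arrive as a JSON string, as LLM function-calling outputs commonly do. `Tool`, `ToolRegistry`, `get_weather`, and `fake_weather` are hypothetical; the weather handler returns a fixed string rather than calling any real service, and none of this is TEN's actual tool API.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    """Hypothetical standardized tool: description plus handler."""
    name: str
    description: str
    parameters: Dict[str, Any]      # JSON-Schema-style parameter spec
    handler: Callable[..., Any]


@dataclass
class ToolRegistry:
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def schemas(self) -> List[Dict[str, Any]]:
        """Descriptions an LLM could be shown when choosing a tool."""
        return [
            {"name": t.name, "description": t.description,
             "parameters": t.parameters}
            for t in self.tools.values()
        ]

    def call(self, name: str, arguments_json: str) -> Any:
        """Run the named tool with arguments decoded from JSON."""
        return self.tools[name].handler(**json.loads(arguments_json))


def fake_weather(city: str) -> str:
    # Stand-in for an external weather API; always returns a fixed answer.
    return f"Sunny in {city}"


registry = ToolRegistry()
registry.register(Tool(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters={"type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]},
    handler=fake_weather,
))

if __name__ == "__main__":
    print(registry.call("get_weather", '{"city": "Paris"}'))  # Sunny in Paris
```

Adding a custom tool under this pattern means registering one more `Tool`; the dispatch path stays the same.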
Platform Compatibility
Connect with popular AI workflow platforms such as Dify and Coze. Build voice interfaces for existing chatbots, or create specialized agents for your own use cases.