Architecture
TEN Agent uses a modular architecture that connects multiple components through real-time messaging. This page describes the system components and how they interact to enable multimodal AI conversations.
Understand the tech
The architecture follows a distributed design with three main layers:
- Client layer: Frontend applications that users interact with
- Server layer: Web server and development server for system orchestration
- Extension layer: Modular components that provide AI capabilities
The system uses WebRTC for real-time audio/video communication and HTTP for control commands. Extensions communicate through a message-passing framework, allowing flexible configuration of processing pipelines.
Core components
TEN Agent consists of the following core components:
Frontend App (Web/Native): The user interface that provides access to TEN Agent. It includes:
- HTTP client for sending control commands
- RTC Client SDK for real-time audio/video communication
- Configuration interface for managing agents and extensions
Web Server: A Go-based server that manages the system lifecycle. It handles:
- HTTP requests from frontend clients
- Agent process management (start/stop)
- Configuration parameters including graph selection
- Coordination between frontend and backend components
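As an illustration of the control path, the sketch below builds the kind of start request a frontend might send to the web server. The `/start` path, the `localhost:8080` address, and the `channel_name` and `graph_name` fields are assumptions made for this example, not confirmed API details; consult the server's API reference for the actual contract.

```python
import json
import urllib.request


def build_start_request(base_url: str, channel_name: str, graph_name: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST asking the web server to start an agent.

    The URL path and JSON field names are hypothetical placeholders.
    """
    payload = {"channel_name": channel_name, "graph_name": graph_name}
    return urllib.request.Request(
        url=f"{base_url}/start",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_start_request("http://localhost:8080", "demo_channel", "voice_assistant")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it. Building the request separately from sending it keeps the example runnable without a live server.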
TEN Agent App: Contains the core agent runtime and extension orchestration. It manages:
- Extension lifecycle and initialization
- Message routing between extensions
- Graph-based configuration execution
- Real-time data flow coordination
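To make graph-based message routing concrete, here is a minimal sketch in which a graph is a set of named nodes plus directed connections, and each node's output is forwarded to its downstream neighbors. The `Graph` class and its methods are hypothetical and written only for this illustration; they are not the TEN framework API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A hypothetical graph: named nodes plus directed connections between them."""
    handlers: dict = field(default_factory=dict)                    # node name -> callable
    edges: dict = field(default_factory=lambda: defaultdict(list))  # source -> [destinations]

    def add_node(self, name, handler):
        self.handlers[name] = handler

    def connect(self, source, destination):
        self.edges[source].append(destination)

    def send(self, node, message, trace=None):
        """Deliver a message to a node, then forward its output along outgoing edges."""
        trace = [] if trace is None else trace
        output = self.handlers[node](message)
        trace.append((node, output))
        for destination in self.edges[node]:
            self.send(destination, output, trace)
        return trace


graph = Graph()
graph.add_node("upper", str.upper)
graph.add_node("exclaim", lambda text: text + "!")
graph.connect("upper", "exclaim")

print(graph.send("upper", "hello"))
# [('upper', 'HELLO'), ('exclaim', 'HELLO!')]
```

Because the connections are plain data, changing the processing pipeline means editing the graph definition rather than the node code.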
Extension architecture
Extensions are grouped into functional processing units:
Standard Pipeline:
- ASR Extension: Converts speech to text
- LLM Extension: Processes text and generates responses
- TTS Extension: Converts text back to speech
- RTC Extension: Handles real-time communication
Realtime API Pipeline:
- Realtime API Extension: Integrates directly with services such as OpenAI Realtime
- RTC Extension: Manages audio/video transport
- TTS Extension: Provides fallback speech synthesis
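The difference between the two pipelines can be sketched with stub stages. In the example below every function is a hypothetical stand-in that transforms strings rather than real audio, and the RTC transport is left out; the point is only that the standard pipeline chains three separate stages, while the realtime pipeline delegates the same work to a single service.

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[str], str]


def asr(audio: str) -> str:
    # Stand-in for speech recognition: strip the fake "audio:" prefix.
    return audio[len("audio:"):] if audio.startswith("audio:") else audio


def llm(prompt: str) -> str:
    # Stand-in for a language model: return a canned reply.
    return f"You said: {prompt}"


def tts(text: str) -> str:
    # Stand-in for speech synthesis: tag the text as synthesized audio.
    return f"audio:{text}"


def realtime_api(audio: str) -> str:
    # Stand-in for a speech-to-speech service that covers all three steps itself.
    return tts(llm(asr(audio)))


def run_pipeline(stages: List[Stage], data: str) -> str:
    """Feed data through each stage in order."""
    return reduce(lambda value, stage: stage(value), stages, data)


standard = [asr, llm, tts]
realtime = [realtime_api]

print(run_pipeline(standard, "audio:hello"))  # audio:You said: hello
print(run_pipeline(realtime, "audio:hello"))  # audio:You said: hello
```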
Data flow
Data moves through the system in four stages:
- User Input: Audio, video, or text enters through the frontend app
- Transport: The RTC network carries real-time data using WebRTC protocols
- Processing: Extensions process data according to the configured graph
- Response: Results flow back through the same channels to the user
Communication protocols
TEN Agent uses two protocols: HTTP for control and WebRTC for real-time media. Because control traffic travels over HTTP, separately from the media path, start/stop commands and configuration changes do not interrupt real-time streams.
HTTP is used for:
- Agent lifecycle management
- Configuration updates
- Status monitoring
WebRTC is used for:
- Real-time audio streaming
- Video transmission
- Low-latency signaling
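The separation of control and media can be illustrated with two concurrent tasks. This is a simplified simulation under stated assumptions: `media_stream` stands in for the WebRTC path, `control_channel` stands in for HTTP commands, and the command names are invented for the example. No real networking is involved.

```python
import asyncio


async def media_stream(frames: list, stop: asyncio.Event) -> None:
    """Stand-in for the real-time path: emit frames until told to stop."""
    count = 0
    while not stop.is_set():
        frames.append(f"frame-{count}")
        count += 1
        await asyncio.sleep(0.01)


async def control_channel(log: list, stop: asyncio.Event) -> None:
    """Stand-in for the control path: handle commands while media keeps flowing."""
    for command in ("update_config", "status", "stop"):
        await asyncio.sleep(0.03)
        log.append(command)
        if command == "stop":
            stop.set()


async def main():
    frames, log = [], []
    stop = asyncio.Event()
    # Both paths run concurrently; neither waits on the other except for "stop".
    await asyncio.gather(media_stream(frames, stop), control_channel(log, stop))
    return frames, log


frames, log = asyncio.run(main())
print(f"{len(frames)} frames sent while handling {log}")
```

The frame count varies from run to run, but frames continue to accumulate while the configuration and status commands are handled, and only the stop command ends the stream.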