Architecture

TEN Agent uses a modular architecture that connects multiple components through real-time messaging. This page describes the system components and how they interact to enable multimodal AI conversations.

Understand the tech

The architecture follows a distributed design with three main layers:

Client layer: Frontend applications that users interact with
Server layer: Web server and development server for system orchestration
Extension layer: Modular components that provide AI capabilities

The system uses WebRTC for real-time audio/video communication and HTTP for control commands. Extensions communicate through a message-passing framework, allowing flexible configuration of processing pipelines.

[Figure: Architecture flow]

Core components

TEN Agent consists of the following core components:

Frontend App (Web/Native): The user interface that provides access to TEN Agent. It includes:

HTTP client for sending control commands
RTC Client SDK for real-time audio/video communication
Configuration interface for managing agents and extensions

Web Server: A Go-based server that manages the system lifecycle. It handles:

HTTP requests from frontend clients
Agent process management (start/stop)
Configuration parameters including graph selection
Coordination between frontend and backend components
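As a rough illustration of the control commands the web server receives, the sketch below builds JSON bodies for starting and stopping an agent. The field names (`request_id`, `channel_name`, `graph_name`) and the two helper functions are assumptions made for this example, not a documented API; check the server's own reference before relying on them.

```python
import json
import uuid


def build_start_request(channel: str, graph: str) -> dict:
    """Build a hypothetical 'start agent' command body.

    Field names are illustrative assumptions, not the documented API.
    """
    return {
        "request_id": str(uuid.uuid4()),  # lets the server correlate log entries
        "channel_name": channel,          # RTC channel the agent should join
        "graph_name": graph,              # which extension graph to run
    }


def build_stop_request(channel: str) -> dict:
    """Build a hypothetical 'stop agent' command body."""
    return {"request_id": str(uuid.uuid4()), "channel_name": channel}


# Example: the frontend's HTTP client would send this body to the web server.
body = build_start_request("demo-channel", "voice_assistant")
print(json.dumps(body, indent=2))
```

Keeping graph selection in the start request matches the description above: the web server receives the configuration parameters and passes them on when it launches the agent process.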

TEN Agent app: Contains the core agent runtime and extension orchestration. It manages:

Extension lifecycle and initialization
Message routing between extensions
Graph-based configuration execution
Real-time data flow coordination
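Message routing over a configured graph can be pictured with a small sketch. The `Router` class, the node names, and the connection list below are hypothetical and exist only to show the idea of graph-driven delivery; the actual runtime's classes and configuration format may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical graph: each pair means "messages from src go to dst".
CONNECTIONS: List[Tuple[str, str]] = [
    ("asr", "llm"),
    ("llm", "tts"),
    ("tts", "rtc"),
]


class Router:
    """Delivers messages between named extensions according to a graph."""

    def __init__(self, connections: List[Tuple[str, str]]) -> None:
        self.routes: Dict[str, List[str]] = defaultdict(list)
        for src, dst in connections:
            self.routes[src].append(dst)
        self.handlers: Dict[str, Callable[[str], None]] = {}

    def register(self, name: str, handler: Callable[[str], None]) -> None:
        """Attach the handler that runs when `name` receives a message."""
        self.handlers[name] = handler

    def emit(self, src: str, message: str) -> None:
        """Deliver a message from `src` to every downstream node."""
        for dst in self.routes[src]:
            self.handlers[dst](message)


router = Router(CONNECTIONS)
delivered: List[str] = []

# Stub extensions: each transforms its input and emits the result onward.
router.register("llm", lambda text: router.emit("llm", f"reply to: {text}"))
router.register("tts", lambda text: router.emit("tts", f"<audio:{text}>"))
router.register("rtc", delivered.append)

router.emit("asr", "hello")  # pretend the ASR extension produced a transcript
print(delivered)
```

Because the extensions only know their own names, changing the connection list changes the pipeline without touching any handler code.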

Extension architecture

Extensions are grouped into functional processing units:

Standard Pipeline:

ASR Extension: Converts speech to text
LLM Extension: Processes text and generates responses
TTS Extension: Converts text back to speech
RTC Extension: Handles real-time communication

Realtime API Pipeline:

Realtime API Extension: Direct integration with services like OpenAI Realtime
RTC Extension: Manages audio/video transport
TTS Extension: Provides fallback speech synthesis
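The difference between the two pipelines is mostly about what type of data crosses each stage boundary. The sketch below uses stub functions and two hypothetical data types (`Audio`, `Text`) to contrast them; none of these names come from the TEN Agent codebase, and the stubs do no real speech or language processing.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    samples: bytes


@dataclass
class Text:
    content: str


def asr(audio: Audio) -> Text:
    # Stub: treat the bytes as a UTF-8 transcript instead of real speech.
    return Text(audio.samples.decode("utf-8"))


def llm(prompt: Text) -> Text:
    # Stub: echo the prompt instead of calling a language model.
    return Text(f"You said: {prompt.content}")


def tts(reply: Text) -> Audio:
    # Stub: encode the reply text instead of synthesizing speech.
    return Audio(reply.content.encode("utf-8"))


def standard_pipeline(audio: Audio) -> Audio:
    """Audio -> text -> text -> audio, one extension per step."""
    return tts(llm(asr(audio)))


def realtime_pipeline(audio: Audio) -> Audio:
    """Audio in, audio out: a single stub stands in for a realtime API."""
    return Audio(b"You said: " + audio.samples)


print(standard_pipeline(Audio(b"hello")).samples)
print(realtime_pipeline(Audio(b"hello")).samples)
```

In both cases the RTC extension sits on either side of these functions, carrying audio to and from the user.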

Data flow

User Input: Audio/video/text enters through the frontend app
Transport: RTC Network carries real-time data using WebRTC protocols
Processing: Extensions process data according to the configured graph
Response: Results flow back through the same channels to the user

Communication protocols

TEN Agent uses two protocols: HTTP for control and WebRTC for real-time media. Because control traffic travels over HTTP, separately from the media streams, start/stop commands and configuration changes do not interrupt real-time audio or video.

HTTP is used for:

Agent lifecycle management
Configuration updates
Status monitoring

WebRTC protocol is used for:

Real-time audio streaming
Video transmission
Low-latency signaling
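The separation of control and media traffic can be sketched with two independent coroutines. This is only an analogy under stated assumptions: plain `asyncio` tasks stand in for the HTTP control channel and the WebRTC media stream, and the command names are made up for the example.

```python
import asyncio
from typing import List, Tuple


async def media_stream(frames: List[int], stop: asyncio.Event) -> None:
    """Stand-in for the real-time stream: emits a frame every 10 ms."""
    n = 0
    while not stop.is_set():
        frames.append(n)
        n += 1
        await asyncio.sleep(0.01)


async def control_channel(
    commands: List[str], applied: List[str], stop: asyncio.Event
) -> None:
    """Stand-in for HTTP control: applies commands, then stops the stream."""
    for cmd in commands:
        await asyncio.sleep(0.03)  # simulated request latency
        applied.append(cmd)
    stop.set()


async def main() -> Tuple[List[int], List[str]]:
    frames: List[int] = []
    applied: List[str] = []
    stop = asyncio.Event()
    # Both tasks run concurrently; neither blocks the other.
    await asyncio.gather(
        media_stream(frames, stop),
        control_channel(["update_config", "status"], applied, stop),
    )
    return frames, applied


frames, applied = asyncio.run(main())
print(f"{len(frames)} frames sent while applying {applied}")
```

The frame counter keeps advancing while the control commands are in flight, which is the property the dual-protocol design is meant to preserve.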
