Architecture

TEN Agent uses a modular architecture that connects multiple components through real-time messaging. This page describes the system components and how they interact to enable multimodal AI conversations.

Understand the tech

The architecture follows a distributed design with three main layers:

  1. Client layer: Frontend applications that users interact with
  2. Server layer: Web server and development server for system orchestration
  3. Extension layer: Modular components that provide AI capabilities

The system uses WebRTC for real-time audio/video communication and HTTP for control commands. Extensions communicate through a message-passing framework, allowing flexible configuration of processing pipelines.
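As a rough illustration of the message-passing idea, the sketch below models a tiny in-process bus in Python. The `MessageBus` class, its method names, and the message shape are all hypothetical and are not taken from the TEN framework; the sketch only shows how named destinations let extensions be wired together without knowing about each other.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Toy in-process bus (hypothetical): handlers subscribe to a named destination."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, dest: str, handler: Handler) -> None:
        # Register a handler that receives every message sent to `dest`.
        self._handlers[dest].append(handler)

    def send(self, dest: str, message: dict) -> int:
        # Deliver the message to each subscriber and return the delivery count.
        handlers = self._handlers.get(dest, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()
received: List[dict] = []

# A stand-in "LLM extension" that simply records what it receives.
bus.subscribe("llm", received.append)
delivered = bus.send("llm", {"type": "text", "text": "hello"})
```

Because senders address a destination name rather than a concrete object, swapping one extension for another only changes who subscribes, not who sends.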

Figure: Architecture flow

Core components

TEN Agent consists of the following core components:

Frontend App (Web/Native): The user interface that provides access to TEN Agent. It includes:

  • HTTP client for sending control commands
  • RTC Client SDK for real-time audio/video communication
  • Configuration interface for managing agents and extensions

Web Server: A Go-based server that manages the system lifecycle. It handles:

  • HTTP requests from frontend clients
  • Agent process management (start/stop)
  • Configuration parameters including graph selection
  • Coordination between frontend and backend components
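The lifecycle responsibilities listed above can be sketched as a small registry that tracks one agent per channel. This is an illustration only, written in Python although the actual server is Go-based; `AgentManager`, `AgentRecord`, and the channel and graph names are assumptions made for the example, and a real server would spawn and terminate processes rather than update a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AgentRecord:
    channel: str  # hypothetical: the session an agent is attached to
    graph: str    # hypothetical: name of the graph selected at start time


class AgentManager:
    """In-memory stand-in for start/stop bookkeeping (no real processes)."""

    def __init__(self) -> None:
        self._agents: Dict[str, AgentRecord] = {}

    def start(self, channel: str, graph: str) -> AgentRecord:
        # Refuse to start a second agent on a channel that already has one.
        if channel in self._agents:
            raise ValueError(f"agent already running on channel {channel!r}")
        record = AgentRecord(channel=channel, graph=graph)
        self._agents[channel] = record
        return record

    def stop(self, channel: str) -> bool:
        # Return True if an agent was running and has been removed.
        return self._agents.pop(channel, None) is not None

    def active_channels(self) -> List[str]:
        return sorted(self._agents)


manager = AgentManager()
manager.start("demo-room", graph="voice_assistant")
```

Keying agents by channel makes duplicate start requests easy to reject and keeps stop requests idempotent.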

TEN Agent App: Contains the core agent runtime and extension orchestration. It manages:

  • Extension lifecycle and initialization
  • Message routing between extensions
  • Graph-based configuration execution
  • Real-time data flow coordination
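To make graph-based configuration more concrete, the sketch below turns a graph description into a routing table. The `GRAPH` dictionary layout and the `build_routes` helper are hypothetical and do not reproduce the TEN framework's actual graph schema; they only illustrate how a declarative list of nodes and connections can drive message routing.

```python
from typing import Dict, List

# Hypothetical graph description: extensions as nodes, message paths as edges.
GRAPH = {
    "nodes": ["asr", "llm", "tts"],
    "connections": [("asr", "llm"), ("llm", "tts")],
}


def build_routes(graph: dict) -> Dict[str, List[str]]:
    """Map each node to the list of nodes that receive its output."""
    known = set(graph["nodes"])
    routes: Dict[str, List[str]] = {name: [] for name in graph["nodes"]}
    for src, dst in graph["connections"]:
        # Fail early when a connection names an extension that is not declared.
        if src not in known or dst not in known:
            raise KeyError(f"connection {src!r} -> {dst!r} uses an undeclared node")
        routes[src].append(dst)
    return routes


routes = build_routes(GRAPH)
```

Validating connections when the graph is loaded surfaces configuration mistakes before any real-time data starts flowing.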

Extension architecture

Extensions are grouped into pipelines, each made up of functional processing units. Two common configurations are:

Standard Pipeline:

  • ASR Extension: Converts speech to text
  • LLM Extension: Processes text and generates responses
  • TTS Extension: Converts text back to speech
  • RTC Extension: Handles real-time communication
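The standard pipeline can be pictured as three stages chained together. In the sketch below every stage is a stub: `fake_asr`, `fake_llm`, and `fake_tts` are hypothetical stand-ins that perform no real speech recognition, language modeling, or synthesis, and the RTC extension is represented only by the bytes going in and coming out.

```python
def fake_asr(audio: bytes) -> str:
    # Stand-in for speech-to-text: treats the bytes as already-transcribed UTF-8.
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    # Stand-in for response generation: echoes the input in a fixed template.
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    # Stand-in for text-to-speech: encodes the text instead of producing audio.
    return text.encode("utf-8")


def run_standard_pipeline(audio: bytes) -> bytes:
    """Chain the stages in order: ASR, then LLM, then TTS."""
    transcript = fake_asr(audio)
    reply = fake_llm(transcript)
    return fake_tts(reply)


output = run_standard_pipeline(b"hello")
```

Each stage only depends on the type of data it receives, which is what allows one provider to be replaced by another without touching the rest of the chain.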

Realtime API Pipeline:

  • Realtime API Extension: Direct integration with services like OpenAI Realtime
  • RTC Extension: Manages audio/video transport
  • TTS Extension: Provides fallback speech synthesis

Data flow

  1. User Input: Audio/video/text enters through the frontend app
  2. Transport: RTC Network carries real-time data using WebRTC protocols
  3. Processing: Extensions process data according to the configured graph
  4. Response: Results flow back to the user through the same transport and frontend app

Communication protocols

TEN Agent separates control traffic from media traffic. Control commands travel over HTTP, so agents can be started, stopped, and reconfigured without interrupting the real-time streams, which are carried over WebRTC.

HTTP is used for:

  • Agent lifecycle management
  • Configuration updates
  • Status monitoring
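A control command of this kind might be assembled as in the sketch below, which builds an HTTP request with Python's standard library without sending it. The base URL, the `/start` and `/stop` paths, and the `channel_name` and `graph_name` fields are assumptions made for illustration; consult the server's API reference for the real endpoints and payloads.

```python
import json
import urllib.request


def build_control_request(base_url: str, action: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a lifecycle action."""
    # Hypothetical action names; the real server may expose different ones.
    if action not in {"start", "stop"}:
        raise ValueError(f"unsupported action: {action!r}")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{action}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_control_request(
    "http://localhost:8080",  # assumed local address, for illustration only
    "start",
    {"channel_name": "demo-room", "graph_name": "voice_assistant"},
)
```

Passing the request to `urllib.request.urlopen` would send it. Since this traffic is independent of the media path, a slow or failed control call does not stall audio or video that is already streaming.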

WebRTC is used for:

  • Real-time audio streaming
  • Video transmission
  • Low-latency signaling