
Core concepts

TEN Agent is a conversational AI agent powered by the TEN framework. It integrates large language models (LLMs) such as Gemini and OpenAI with real-time audio and video capabilities, enabling multimodal human-AI interaction. It is fully compatible with popular workflow platforms such as Dify and Coze. This document introduces the core concepts related to TEN Agent and provides an overview of its architecture and components.

Project structure

The TEN Agent project is organized into the following major components. Each component serves a specific purpose in the ecosystem:

  • Agents: The core of the project, containing runtime logic, binaries, and agent examples. Within this folder, the ten_packages subfolder provides a collection of ready-to-use extensions that developers can leverage to build customized agents for specific tasks.

  • Dev Server: A backend service responsible for orchestrating agents and managing extension lifecycles.

  • Web Server: The frontend HTTP server running on port 8080, handling client requests and serving the user interface.

  • Extensions: Modular integrations that connect TEN Agent to external services including LLMs, TTS/STT providers, and custom APIs. These components enable flexible customization of agent capabilities.

  • Playground: An interactive web interface for configuring agents, selecting extensions, and testing functionality in real time.

  • Demo: A deployment-ready environment showcasing complete agent implementations and deployment patterns for real-world use cases.

Docker containers

TEN Agent uses three Docker containers:

ten_agent_dev: The main development container that powers TEN Agent. It contains the core runtime environment, development tools, and dependencies needed to build and run agents. You can execute commands like task use to build agents and task run to start the web server.

ten_agent_playground: Runs on port 3000 and serves the web frontend interface. It provides an interactive environment where you can configure modules, select extensions, and test agents. The playground UI allows you to visually select graph types (Voice Agent or Realtime Agent), choose modules, and configure API settings.

ten_agent_demo: Runs on port 3002 and provides a production-ready sample setup. It demonstrates how you can deploy configured agents in real-world scenarios with all necessary components packaged together.
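As a quick reference, the ports mentioned in this document can be collected in one place. The following is a minimal sketch, assuming the containers publish their ports unchanged to the local machine; the localhost host name, the web_server key, and the local_url helper are illustrative assumptions and not part of TEN Agent.

```python
# Ports come from this document; the "localhost" host is an assumption.
SERVICE_PORTS = {
    "web_server": 8080,            # HTTP server listed under Project structure
    "ten_agent_playground": 3000,  # Playground frontend
    "ten_agent_demo": 3002,        # Demo frontend
}


def local_url(service: str, host: str = "localhost") -> str:
    """Build the URL where a service would be reachable on a local machine.

    Hypothetical helper for illustration; raises KeyError for unknown names.
    """
    return f"http://{host}:{SERVICE_PORTS[service]}"


if __name__ == "__main__":
    for name in SERVICE_PORTS:
        print(f"{name}: {local_url(name)}")
```

If your setup maps the containers to different host ports, adjust the dictionary to match.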

Agents

The Agents folder provides components you use to build agents tailored to specific applications. It contains:

  • Core binaries and examples that define agent behaviors
  • Scripts and outputs for flexible configurations
  • Tools to create, modify, and enhance AI agents
  • The ten_packages subfolder with pre-built extensions

Demo

The Demo folder provides a deployment-ready environment for showcasing TEN Agent. It includes:

  • Sample configurations for running agents in production
  • Pre-built agents and workflows that demonstrate framework capabilities
  • Tools for showcasing real-world applications

Playground

The Playground enables you to:

  • Select and configure extensions from pre-built modules
  • Experiment with different AI models, TTS/STT systems, and real-time communication tools
  • Test agent behaviors in an interactive environment
  • Configure agents without writing code

The Playground serves as your testing and configuration hub for exploring and fine-tuning AI systems.

Extensions

Extensions are modular components that add specific capabilities to TEN Agent:

  • LLM extensions: Connect to language models like OpenAI, Gemini, or Claude
  • STT extensions: Convert speech to text with services like Whisper or Deepgram
  • TTS extensions: Convert text to speech with services like ElevenLabs or Azure
  • Communication extensions: Provide real-time transport through protocols like Agora RTC
  • Tool extensions: Connect to custom APIs, weather services, or databases
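One way to picture these categories is as a small catalog that groups extensions by the capability they add. The sketch below is purely illustrative: the extension names (openai_llm, deepgram_stt, and so on), the Category enum, and the by_category helper are hypothetical and do not reflect the actual package names or APIs in ten_packages.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The five extension categories listed above."""
    LLM = "llm"
    STT = "stt"
    TTS = "tts"
    COMMUNICATION = "communication"
    TOOL = "tool"


@dataclass(frozen=True)
class ExtensionInfo:
    name: str           # hypothetical identifier, not a real package name
    category: Category


# Illustrative catalog; entries are assumptions chosen to mirror the list above.
CATALOG = [
    ExtensionInfo("openai_llm", Category.LLM),
    ExtensionInfo("gemini_llm", Category.LLM),
    ExtensionInfo("deepgram_stt", Category.STT),
    ExtensionInfo("elevenlabs_tts", Category.TTS),
    ExtensionInfo("agora_rtc", Category.COMMUNICATION),
    ExtensionInfo("weather_tool", Category.TOOL),
]


def by_category(category: Category) -> list[str]:
    """Return the names of all catalog entries in the given category."""
    return [ext.name for ext in CATALOG if ext.category is category]


if __name__ == "__main__":
    for category in Category:
        print(category.value, by_category(category))
```

Grouping by category makes it easy to see which slot an extension fills, for example when swapping one TTS provider for another.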

Graphs

Graphs define how components connect and interact:

  • Specify which extensions to use
  • Define data flow between components
  • Configure agent behavior without code changes
  • Enable different agent types (Voice Agent, Realtime Agent, Text Agent)

Each graph is a blueprint that tells TEN Agent how to wire extensions together for specific use cases.
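To make the blueprint idea concrete, a graph can be thought of as a list of nodes (extensions) plus a list of connections (data flow between them). The following is a minimal sketch under that assumption; the node names, the dictionary layout, and the validate and flow_order helpers are hypothetical and are not the configuration schema that TEN Agent itself uses.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical voice-agent graph: audio in, transcription, LLM reply, speech out.
VOICE_AGENT_GRAPH = {
    "nodes": ["audio_in", "stt", "llm", "tts", "audio_out"],
    "connections": [
        ("audio_in", "stt"),
        ("stt", "llm"),
        ("llm", "tts"),
        ("tts", "audio_out"),
    ],
}


def validate(graph: dict) -> list[tuple[str, str]]:
    """Return every connection that references a node not declared in the graph."""
    declared = set(graph["nodes"])
    return [
        (src, dst)
        for src, dst in graph["connections"]
        if src not in declared or dst not in declared
    ]


def flow_order(graph: dict) -> list[str]:
    """Return the nodes in an order that respects the direction of data flow."""
    sorter = TopologicalSorter({node: set() for node in graph["nodes"]})
    for src, dst in graph["connections"]:
        sorter.add(dst, src)  # dst receives data from src, so src comes first
    return list(sorter.static_order())


if __name__ == "__main__":
    print("invalid connections:", validate(VOICE_AGENT_GRAPH))
    print("data flow:", " -> ".join(flow_order(VOICE_AGENT_GRAPH)))
```

Swapping one extension for another, or changing the agent type, then amounts to editing the data in the graph while the code that interprets it stays the same.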