Voice assistant

This guide shows you how to configure voice assistants in the TEN Agent playground. You can build assistants using traditional speech recognition, language processing, and speech synthesis (STT + LLM + TTS) pipelines or modern real-time voice-to-voice (V2V) models.

Prerequisites

Before starting, ensure you have:

  • TEN Agent playground running. Refer to the Agent quickstart.
  • Agora RTC credentials from Agora Console
  • API keys for your chosen services
    • For the traditional pipeline:
      • STT, LLM, and TTS API keys from your chosen providers
    • For real-time V2V:
      • Realtime API key from your V2V provider
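
If you keep these credentials in environment variables, a quick preflight check can catch a missing key before you open the playground. The variable names below are illustrative assumptions only, not names required by TEN Agent; substitute whatever your deployment uses.

```python
import os
import sys

# Illustrative variable names -- substitute whatever your deployment uses.
REQUIRED_VARS = [
    "AGORA_APP_ID",
    "AGORA_APP_CERTIFICATE",
    "STT_API_KEY",
    "LLM_API_KEY",
    "TTS_API_KEY",
    # "REALTIME_API_KEY",  # for the real-time V2V path
]

missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    sys.exit("Missing environment variables: " + ", ".join(missing))
print("All expected credentials are set.")
```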

Traditional voice assistant

This configuration uses separate services for speech recognition, language processing, and speech synthesis (STT + LLM + TTS).
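
Conceptually, the graph chains the three stages in sequence for each conversational turn. The sketch below is only a mental model of that flow, not TEN Agent code; `transcribe`, `chat`, and `synthesize` are stand-ins for whichever STT, LLM, and TTS providers you select.

```python
def transcribe(audio_chunk: bytes) -> str:
    # STT stage -- replace with your chosen provider's SDK call.
    return "what's the weather in London?"

def chat(user_text: str) -> str:
    # LLM stage -- replace with your chosen provider's SDK call.
    return f"You asked: {user_text}"

def synthesize(reply_text: str) -> bytes:
    # TTS stage -- replace with your chosen provider's SDK call.
    return reply_text.encode("utf-8")

def handle_turn(audio_chunk: bytes) -> bytes:
    """One conversational turn: STT -> LLM -> TTS."""
    return synthesize(chat(transcribe(audio_chunk)))

print(handle_turn(b"<audio frame>"))
```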

Configuration steps

Follow these steps to set up your traditional voice assistant pipeline:

  1. Open the playground at http://localhost:3000
  2. Select the voice_assistant graph type
  3. Configure modules:
    1. Click Module Picker
    2. Select your preferred STT, LLM, and TTS modules
    3. Click Save Changes
  4. Configure properties:
    1. Click the settings button next to the graph selector
    2. Enter API keys and settings for each module
    3. Click Save Changes
  5. Connect and test:
    1. Click Connect to start the assistant
    2. Wait for the initialization to complete
    3. Begin speaking to interact with the agent
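
Before clicking Connect in step 5, it can help to confirm the playground is actually reachable. The check below is a generic sketch, not part of TEN Agent; the URL matches the default playground address from step 1.

```python
import urllib.request

PLAYGROUND_URL = "http://localhost:3000"

try:
    with urllib.request.urlopen(PLAYGROUND_URL, timeout=5) as response:
        print(f"Playground reachable, HTTP {response.status}")
except OSError as exc:
    print(f"Playground not reachable at {PLAYGROUND_URL}: {exc}")
```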

Azure STT integration

To use Azure STT integrated within the RTC extension module:

  1. Select the voice_assistant_integrated_stt graph type
  2. Configure Azure credentials in the RTC module properties
  3. Follow the remaining configuration steps described above
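
If transcription does not start after these steps, you can verify the Azure credentials themselves outside the playground with the Azure Speech SDK (`pip install azure-cognitiveservices-speech`). The key and region values below are placeholders; use the same ones you enter in the RTC module properties.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials -- replace with your Azure Speech key and region.
speech_config = speechsdk.SpeechConfig(subscription="<your-azure-speech-key>",
                                       region="<your-region>")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something into your microphone...")
result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)
else:
    print("Recognition failed:", result.reason)
```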

Real-time voice assistant

Modern voice-to-voice (V2V) models provide lower latency and more natural interactions.
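
In contrast to the three-stage pipeline sketched earlier, a V2V model maps audio in to audio out in a single step. Again, this is only a mental model; `voice_to_voice` stands in for whichever real-time provider you select.

```python
def voice_to_voice(audio_chunk: bytes) -> bytes:
    # A single model handles understanding and speech generation in one step
    # (stub -- replace with your real-time provider's streaming API).
    return b"<assistant audio reply>"

print(voice_to_voice(b"<user audio frame>"))
```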

Configuration steps

Set up your real-time voice assistant with these steps:

  1. Open the playground at http://localhost:3000
  2. Select the voice_assistant_realtime graph type
  3. Configure the V2V module:
    1. Click Module Picker
    2. Select your V2V provider
    3. Click Save Changes
  4. Add API credentials:
    1. Click the settings button
    2. Enter your Realtime API key
    3. Click Save Changes
  5. Connect and interact:
    1. Click Connect to start
    2. Speak naturally with the assistant
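
If Connect fails with an authentication error, you can sanity-check the Realtime API key outside the playground. The sketch below assumes an OpenAI-compatible provider and simply lists the models the key can access; adjust the URL for other providers.

```python
import os
import urllib.error
import urllib.request

# Assumes an OpenAI-compatible provider; adjust the URL for other providers.
MODELS_URL = "https://api.openai.com/v1/models"
api_key = os.environ.get("OPENAI_API_KEY", "<your-realtime-api-key>")

request = urllib.request.Request(MODELS_URL,
                                 headers={"Authorization": f"Bearer {api_key}"})
try:
    with urllib.request.urlopen(request, timeout=10) as response:
        print("Key accepted, HTTP", response.status)
except urllib.error.HTTPError as exc:
    print("Key rejected:", exc.code, exc.reason)
```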

Add tool capabilities

Follow these steps to enhance your assistant with tools:

  1. Open Module Picker while your agent is running
  2. Find your LLM or V2V module
  3. Click the tool button next to the module
  4. Select a tool. For example, choose the Weather Tool from the list
  5. Click Save Changes

Your assistant can now answer weather-related questions. Try asking "What's the weather in London?"
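
Under the hood, tools like this are exposed to the model through a function-calling schema. The snippet below is a generic illustration of such a schema and handler, not the actual definition TEN Agent's Weather Tool registers; the names and parameters are made up for the example.

```python
# A generic tool-calling schema of the kind an LLM or V2V model consumes.
# Illustrative only -- not the schema TEN Agent's Weather Tool uses.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. London"},
        },
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    # Stub implementation; a real tool would call a weather API here.
    return f"It is 15°C and cloudy in {city}."

# When the model emits a call such as get_weather(city="London"),
# the agent runs the function and returns the result to the model.
print(get_weather("London"))
```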

Reference

This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.

Best practices

Follow these guidelines for optimal performance:

  • Test each module individually before combining them
  • Choose models based on your use case:
    • For low latency, use real-time V2V models
    • For high accuracy, use a traditional STT + LLM + TTS pipeline
  • Monitor API usage to control costs
  • Configure appropriate timeouts for each service (see the sketch after this list)
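
For the last point, most HTTP clients let you bound how long the agent waits on each provider call. A minimal illustration with Python's standard library; the URL and the 10-second limit are examples, not recommended values.

```python
import urllib.request

# Bound how long a single provider call may take; the value and URL are illustrative.
SERVICE_TIMEOUT_SECONDS = 10

try:
    with urllib.request.urlopen("https://api.example.com/health",
                                timeout=SERVICE_TIMEOUT_SECONDS) as response:
        print("Service responded with HTTP", response.status)
except OSError as exc:
    print(f"Service call failed or timed out: {exc}")
```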

Troubleshooting

Common issues and solutions:

Issue: No audio input
Solutions:
  • Check browser microphone permissions
  • Verify RTC credentials are correct
  • Ensure the microphone is not muted

Issue: High latency
Solutions:
  • Switch to a V2V model for lower latency
  • Choose geographically closer API endpoints
  • Reduce model complexity settings

Issue: Transcription errors
Solutions:
  • Speak clearly and at a moderate pace
  • Reduce background noise
  • Try different STT providers