Voice assistant
This guide shows you how to configure voice assistants in the TEN Agent playground. You can build assistants using traditional speech recognition, language processing, and speech synthesis (STT + LLM + TTS) pipelines or modern real-time voice-to-voice (V2V) models.
Prerequisites
Before starting, ensure you have:
- TEN Agent playground running. Refer to the Agent quickstart.
- Agora RTC credentials from Agora Console
- API keys for your chosen services
- For the traditional pipeline:
  - An STT service such as Deepgram
  - An LLM such as OpenAI
  - A TTS service such as Fish.Audio
- For real-time V2V:
  - A Realtime API key from your V2V provider
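Before opening the playground, it can help to confirm the API keys for your chosen pipeline are available in your environment. The variable names below are illustrative assumptions for this sketch; TEN Agent does not mandate these exact names:

```python
import os

# Hypothetical environment variable names -- adjust to match your setup.
# TEN Agent itself does not require these exact names.
PIPELINES = {
    "traditional": ["DEEPGRAM_API_KEY", "OPENAI_API_KEY", "FISH_AUDIO_API_KEY"],
    "realtime": ["REALTIME_API_KEY"],
}

def missing_keys(pipeline: str) -> list[str]:
    """Return the env vars for the chosen pipeline that are not set."""
    return [k for k in PIPELINES[pipeline] if not os.environ.get(k)]

print("Missing:", missing_keys("traditional"))
```

If the printed list is non-empty, set those keys before continuing with the configuration steps below.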
Traditional voice assistant
This configuration uses separate services for speech recognition, language processing, and speech synthesis (STT + LLM + TTS).
Configuration steps
Follow these steps to set up your traditional voice assistant pipeline:
1. Open the playground at http://localhost:3000.
2. Select the voice_assistant graph type.
3. Configure modules:
   - Click Module Picker
   - Select your preferred STT, LLM, and TTS modules
   - Click Save Changes
4. Configure properties:
   - Click the settings button next to the graph selector
   - Enter API keys and settings for each module
   - Click Save Changes
5. Connect and test:
   - Click Connect to start the assistant
   - Wait for initialization to complete
   - Begin speaking to interact with the agent
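Conceptually, the properties you save in step 4 amount to a per-module key/value map, one entry per pipeline stage. This sketch is illustrative only; the field names are assumptions, not the exact TEN property schema:

```python
# Illustrative per-module properties for an STT + LLM + TTS graph.
# Field names are assumptions for this sketch, not the exact TEN schema.
properties = {
    "stt": {"vendor": "deepgram", "api_key": "<DEEPGRAM_API_KEY>"},
    "llm": {"vendor": "openai", "api_key": "<OPENAI_API_KEY>"},
    "tts": {"vendor": "fish.audio", "api_key": "<FISH_AUDIO_API_KEY>"},
}

def unconfigured(props: dict) -> list[str]:
    """List modules whose API key is still a placeholder."""
    return [name for name, cfg in props.items()
            if cfg.get("api_key", "").startswith("<")]

print(unconfigured(properties))  # → ['stt', 'llm', 'tts']
```

Each module needs its own credentials: a missing or placeholder key in any one stage breaks the whole pipeline.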
Azure STT integration
To use Azure STT integrated within the RTC extension module:
1. Select the voice_assistant_integrated_stt graph type.
2. Configure Azure credentials in the RTC module properties.
3. Follow the remaining steps as above.
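Azure speech services authenticate with a subscription key plus a service region, so the RTC module properties need at least those two values. A minimal sketch; the property names here are illustrative, not the exact TEN RTC module schema:

```python
# Minimal shape of Azure STT credentials: Azure Speech authenticates
# with a subscription key + region pair. Property names are illustrative.
azure_stt = {
    "azure_stt_key": "<AZURE_SPEECH_KEY>",
    "azure_stt_region": "eastus",   # the region your Speech resource lives in
    "language": "en-US",
}

# Both key and region must be present for authentication to succeed.
required = {"azure_stt_key", "azure_stt_region"}
print(required <= set(azure_stt))  # → True
```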
Real-time voice assistant
Modern voice-to-voice (V2V) models provide lower latency and more natural interactions.
Configuration steps
Set up your real-time voice assistant with these steps:
1. Open the playground at http://localhost:3000.
2. Select the voice_assistant_realtime graph type.
3. Configure the V2V module:
   - Click Module Picker
   - Select your V2V provider
   - Click Save Changes
4. Add API credentials:
   - Click the settings button
   - Enter your Realtime API key
   - Click Save Changes
5. Connect and interact:
   - Click Connect to start
   - Speak naturally with the assistant
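Because a V2V model handles speech in and speech out itself, the property map collapses from three entries to one. As before, this is a sketch with assumed field names, not the exact TEN schema:

```python
# A V2V graph replaces the three-stage pipeline with a single module,
# so only one set of credentials is needed. Field names are illustrative.
realtime_properties = {
    "v2v": {
        "api_key": "<REALTIME_API_KEY>",
        "temperature": 0.8,   # assumed generation parameter
    }
}

print(list(realtime_properties))  # → ['v2v']
```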
Add tool capabilities
Follow these steps to enhance your assistant with tools:
- Open Module Picker while your agent is running
- Find your LLM or V2V module
- Click the tool button next to the module
- Select a tool. For example, choose the Weather Tool from the list
- Click Save Changes
Your assistant can now answer weather-related questions. Try asking "What's the weather in London?"
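Under the hood, tools like the Weather Tool are exposed to the LLM or V2V module as function declarations the model can call. The sketch below uses the JSON-schema style common to tool-calling LLM APIs; the exact TEN tool format may differ, and the handler is a hypothetical stub:

```python
# Illustrative tool declaration in the JSON-schema style used by most
# tool-calling LLM APIs; the exact TEN tool format may differ.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. London"},
        },
        "required": ["city"],
    },
}

def handle_tool_call(name: str, args: dict) -> str:
    """Dispatch a tool call; a real handler would query a weather API."""
    if name == "get_weather":
        return f"Weather for {args['city']}: (stubbed result)"
    raise ValueError(f"unknown tool: {name}")

print(handle_tool_call("get_weather", {"city": "London"}))
# → Weather for London: (stubbed result)
```

When the model decides a user question needs the tool, it emits a call with structured arguments, the runtime executes the handler, and the result is fed back for the model to phrase as speech.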
Reference
This section contains supplementary information that completes this page or points you to documentation covering other aspects of this product.
Best practices
Follow these guidelines for optimal performance:
- Test each module individually before combining them
- Choose models based on your use case:
- For low latency, use real-time V2V models
- For high accuracy, use traditional STT + LLM + TTS pipeline
- Monitor API usage to control costs
- Configure appropriate timeouts for each service
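When setting per-service timeouts, it helps to think of them as a latency budget: the worst-case turn time is roughly the sum across stages. The values below are illustrative starting points for this sketch, not TEN defaults:

```python
# Example per-service timeout budget (seconds). Values are illustrative
# starting points, not TEN defaults; tune them against observed latency.
timeouts = {
    "stt": 5.0,    # speech recognition should respond quickly
    "llm": 30.0,   # generation can take longer for long replies
    "tts": 10.0,
}

# Worst-case turn latency is bounded by the sum of the stage timeouts.
print(sum(timeouts.values()))  # → 45.0
```

If the total exceeds what your users will tolerate, tighten the slowest stage first or switch to a real-time V2V graph.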
Troubleshooting
Common issues and solutions:
| Issue | Solution |
|---|---|
| No audio input | Check the browser's microphone permissions and confirm the correct input device is selected |
| High latency | Switch to a real-time V2V graph, or choose faster STT/TTS models and tighten service timeouts |
| Transcription errors | Verify the STT module's language setting and API key, and test with a clearer audio source |