Voice assistant

This guide shows you how to configure voice assistants in the TEN Agent playground. You can build assistants using traditional speech recognition, language processing, and speech synthesis (STT + LLM + TTS) pipelines or modern real-time voice-to-voice (V2V) models.

Prerequisites

Before starting, ensure you have:

  • TEN Agent playground running. Refer to the Agent quickstart.
  • Agora RTC credentials from Agora Console
  • API keys for your chosen services
    • For the traditional pipeline:
      • STT, LLM, and TTS API keys from your chosen providers
    • For real-time V2V:
      • Realtime API key from your V2V provider
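
If you keep these credentials in environment variables, a quick preflight check can catch a missing key before you open the playground. The variable names below are illustrative assumptions only, not names required by TEN Agent; substitute whatever your deployment uses.

```python
import os
import sys

# Illustrative variable names -- substitute whatever your deployment uses.
REQUIRED_VARS = [
    "AGORA_APP_ID",
    "AGORA_APP_CERTIFICATE",
    "STT_API_KEY",
    "LLM_API_KEY",
    "TTS_API_KEY",
    # "REALTIME_API_KEY",  # for the real-time V2V path
]

missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    sys.exit("Missing environment variables: " + ", ".join(missing))
print("All expected credentials are set.")
```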

Traditional voice assistant

This configuration uses separate services for speech recognition, language processing, and speech synthesis (STT + LLM + TTS).
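
Conceptually, the graph chains the three stages in sequence for each conversational turn. The sketch below is only a mental model of that flow, not TEN Agent code; `transcribe`, `chat`, and `synthesize` are stand-ins for whichever STT, LLM, and TTS providers you select.

```python
def transcribe(audio_chunk: bytes) -> str:
    # STT stage -- replace with your chosen provider's SDK call.
    return "what's the weather in London?"

def chat(user_text: str) -> str:
    # LLM stage -- replace with your chosen provider's SDK call.
    return f"You asked: {user_text}"

def synthesize(reply_text: str) -> bytes:
    # TTS stage -- replace with your chosen provider's SDK call.
    return reply_text.encode("utf-8")

def handle_turn(audio_chunk: bytes) -> bytes:
    """One conversational turn: STT -> LLM -> TTS."""
    return synthesize(chat(transcribe(audio_chunk)))

print(handle_turn(b"<audio frame>"))
```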

Configuration steps

Follow these steps to set up your traditional voice assistant pipeline:

  1. Open the playground at http://localhost:3000
  2. Select the voice_assistant graph type
  3. Configure modules:
    1. Click Module Picker
    2. Select your preferred STT, LLM, and TTS modules
    3. Click Save Changes
  4. Configure properties:
    1. Click the settings button next to the graph selector
    2. Enter API keys and settings for each module
    3. Click Save Changes
  5. Connect and test:
    1. Click Connect to start the assistant
    2. Wait for the initialization to complete
    3. Begin speaking to interact with the agent
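
Before clicking Connect in step 5, it can help to confirm the playground is actually reachable. The check below is a generic sketch, not part of TEN Agent; the URL matches the default playground address from step 1.

```python
import urllib.request

PLAYGROUND_URL = "http://localhost:3000"

try:
    with urllib.request.urlopen(PLAYGROUND_URL, timeout=5) as response:
        print(f"Playground reachable, HTTP {response.status}")
except OSError as exc:
    print(f"Playground not reachable at {PLAYGROUND_URL}: {exc}")
```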

Azure STT integration

To use Azure STT integrated within the RTC extension module:

  1. Select the voice_assistant_integrated_stt graph type
  2. Configure Azure credentials in the RTC module properties
  3. Follow the remaining configuration steps described above
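
If transcription does not start after these steps, you can verify the Azure credentials themselves outside the playground with the Azure Speech SDK (`pip install azure-cognitiveservices-speech`). The key and region values below are placeholders; use the same ones you enter in the RTC module properties.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials -- replace with your Azure Speech key and region.
speech_config = speechsdk.SpeechConfig(subscription="<your-azure-speech-key>",
                                       region="<your-region>")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something into your microphone...")
result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)
else:
    print("Recognition failed:", result.reason)
```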

Real-time voice assistant

Modern voice-to-voice (V2V) models provide lower latency and more natural interactions.
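
In contrast to the three-stage pipeline sketched earlier, a V2V model maps audio in to audio out in a single step. Again, this is only a mental model; `voice_to_voice` stands in for whichever real-time provider you select.

```python
def voice_to_voice(audio_chunk: bytes) -> bytes:
    # A single model handles understanding and speech generation in one step
    # (stub -- replace with your real-time provider's streaming API).
    return b"<assistant audio reply>"

print(voice_to_voice(b"<user audio frame>"))
```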

Configuration steps

Set up your real-time voice assistant with these steps:

  1. Open the playground at http://localhost:3000
  2. Select the voice_assistant_realtime graph type
  3. Configure the V2V module:
    1. Click Module Picker
    2. Select your V2V provider
    3. Click Save Changes
  4. Add API credentials:
    1. Click the settings button
    2. Enter your Realtime API key
    3. Click Save Changes
  5. Connect and interact:
    1. Click Connect to start
    2. Speak naturally with the assistant
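
If Connect fails with an authentication error, you can sanity-check the Realtime API key outside the playground. The sketch below assumes an OpenAI-compatible provider and simply lists the models the key can access; adjust the URL for other providers.

```python
import os
import urllib.error
import urllib.request

# Assumes an OpenAI-compatible provider; adjust the URL for other providers.
MODELS_URL = "https://api.openai.com/v1/models"
api_key = os.environ.get("OPENAI_API_KEY", "<your-realtime-api-key>")

request = urllib.request.Request(MODELS_URL,
                                 headers={"Authorization": f"Bearer {api_key}"})
try:
    with urllib.request.urlopen(request, timeout=10) as response:
        print("Key accepted, HTTP", response.status)
except urllib.error.HTTPError as exc:
    print("Key rejected:", exc.code, exc.reason)
```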

Add tool capabilities

Follow these steps to enhance your assistant with tools:

  1. Open Module Picker while your agent is running
  2. Find your LLM or V2V module
  3. Click the tool button next to the module
  4. Select a tool. For example, choose the Weather Tool from the list
  5. Click Save Changes

Your assistant can now answer weather-related questions. Try asking "What's the weather in London?"
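
Under the hood, tools like this are exposed to the model through a function-calling schema. The snippet below is a generic illustration of such a schema and handler, not the actual definition TEN Agent's Weather Tool registers; the names and parameters are made up for the example.

```python
# A generic tool-calling schema of the kind an LLM or V2V model consumes.
# Illustrative only -- not the schema TEN Agent's Weather Tool uses.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. London"},
        },
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    # Stub implementation; a real tool would call a weather API here.
    return f"It is 15°C and cloudy in {city}."

# When the model emits a call such as get_weather(city="London"),
# the agent runs the function and returns the result to the model.
print(get_weather("London"))
```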

Reference

This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.

Best practices

Follow these guidelines for optimal performance:

  • Test each module individually before combining them
  • Choose models based on your use case:
    • For low latency, use real-time V2V models
    • For high accuracy, use a traditional STT + LLM + TTS pipeline
  • Monitor API usage to control costs
  • Configure appropriate timeouts for each service (see the sketch after this list)
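
For the last point, most HTTP clients let you bound how long the agent waits on each provider call. A minimal illustration with Python's standard library; the URL and the 10-second limit are examples, not recommended values.

```python
import urllib.request

# Bound how long a single provider call may take; the value and URL are illustrative.
SERVICE_TIMEOUT_SECONDS = 10

try:
    with urllib.request.urlopen("https://api.example.com/health",
                                timeout=SERVICE_TIMEOUT_SECONDS) as response:
        print("Service responded with HTTP", response.status)
except OSError as exc:
    print(f"Service call failed or timed out: {exc}")
```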

Troubleshooting

Common issues and solutions:

Issue: No audio input
Solutions:
  • Check browser microphone permissions
  • Verify RTC credentials are correct
  • Ensure the microphone is not muted

Issue: High latency
Solutions:
  • Switch to a V2V model for lower latency
  • Choose geographically closer API endpoints
  • Reduce model complexity settings

Issue: Transcription errors
Solutions:
  • Speak clearly and at a moderate pace
  • Reduce background noise
  • Try different STT providers