Use a Local LLM with Ollama
TEN Agent supports any LLM that provides OpenAI-compatible APIs. This guide shows you how to run models locally using Ollama, eliminating the need for external API calls while maintaining the same interface.
Prerequisites
Before starting, ensure you have:
- Completed the TEN Agent quickstart
- Sufficient hardware for your chosen model:
  - CPU: Modern multi-core processor
  - RAM: 8GB minimum (16GB+ recommended for larger models)
  - GPU: Optional but recommended for better performance
Implementation
Install Ollama
Begin by installing Ollama and downloading your preferred language model:

- Download and install Ollama from ollama.com
- Download your desired model using `ollama pull {model name}`
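For example, to pull the `llama3.2` model used in the configuration step later in this guide:

```bash
ollama pull llama3.2
```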
Start the Ollama server
Launch the Ollama server with network access enabled to allow Docker containers to connect:
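By default, Ollama listens only on localhost, which containers cannot reach. One common way to expose it (assuming a standard Ollama install) is to set the `OLLAMA_HOST` environment variable when starting the server:

```bash
# Bind Ollama to all network interfaces so Docker containers can reach it
# (the server listens on port 11434 by default)
OLLAMA_HOST=0.0.0.0 ollama serve
```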
Since TEN Agent runs in Docker, you'll need your machine's private IP address:
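The exact command depends on your operating system; the options below are common ones, and the interface name may differ on your machine:

```sh
# macOS: IP of the active interface (replace en0 if yours differs)
ipconfig getifaddr en0

# Linux: list the machine's IP addresses
hostname -I

# Windows: look for the IPv4 Address of your active adapter
ipconfig
```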
Configure TEN Agent
Connect TEN Agent to your local Ollama instance through the playground interface:
- Open the playground at http://localhost:3000
- Select the `voice_assistant` graph
- In Module Picker, choose OpenAI as the LLM provider
- In Settings, configure the llm extension as follows (a filled-in example follows this list):
  - Base URL: `http://<your-private-ip>:11434/v1/`
  - Model: Your downloaded model name. For example, `llama3.2`
  - API Key: Any non-empty string. Ollama doesn't require authentication.
  - Prompt: Your system prompt. For example, "You're a helpful assistant."
- Save your configuration in the playground
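For illustration only, assuming your machine's private IP is 192.168.1.100 (substitute your own) and you pulled `llama3.2`, the settings might look like:

```
Base URL: http://192.168.1.100:11434/v1/
Model:    llama3.2
API Key:  ollama
Prompt:   You're a helpful assistant.
```

The API key value here is arbitrary; Ollama ignores it.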
Test the integration
Verify your setup by running a test conversation:
- Start a conversation to test the connection
- Monitor Ollama logs for requests
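As an optional sanity check, you can also call Ollama's OpenAI-compatible endpoint directly; the IP and model name below are placeholders from the earlier steps:

```bash
# Send a chat completion request to the local Ollama server
curl http://<your-private-ip>:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}]
      }'
```

A JSON chat-completion response confirms the server is reachable from outside localhost and the model is loaded; the request should also appear in the Ollama logs.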
Reference
OpenAI compatibility
For more information on Ollama's OpenAI compatibility, refer to the official blog post at https://ollama.com/blog/openai-compatibility.
Troubleshooting
If you encounter issues while setting up Ollama with TEN Agent, refer to these common problems and their solutions:
Issue | Possible solutions |
---|---|
Connection refused | Ensure the Ollama server was started with `OLLAMA_HOST=0.0.0.0` and that the Base URL uses your machine's private IP (not `localhost`) with port 11434 |
Model not found | Confirm the model name in Settings matches a model you have downloaded; run `ollama list` to see available models |
Performance issues | Try a smaller model, free up RAM, or run Ollama on a machine with a GPU |