How to Use OpenClaw with Ollama — Local AI Models
Connect OpenClaw to Ollama for fully local, private AI inference. No API costs, no data leaving your machine. Covers model selection, configuration, and performance tuning.
Running OpenClaw with Ollama means your agent never sends data to external APIs. Complete privacy, zero API costs, fully offline capable. The tradeoff is speed and capability compared to frontier models, but for many use cases it's worth it.
Install Ollama
On Mac:
brew install ollama
On Linux:
curl -fsSL https://ollama.ai/install.sh | sh
Start the Ollama server:
ollama serve
Download a Model
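Before pulling a model, it's worth confirming the server is actually reachable. A minimal check, assuming the default port (11434):

```shell
# Health-check helper for the local Ollama server (default port assumed).
ollama_up() {
  curl -s --max-time 2 http://localhost:11434/api/version > /dev/null
}

if ollama_up; then
  echo "Ollama server is running"
else
  echo "Ollama server is not reachable -- did you run 'ollama serve'?"
fi
```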
ollama pull llama3.1:8b
# Or for lower RAM machines:
ollama pull phi3:mini
# Or for best quality (needs 32GB+ RAM):
ollama pull llama3.1:70b
Connect OpenClaw to Ollama
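OpenClaw talks to the same HTTP API that Ollama exposes on port 11434, so a direct request is a useful smoke test before touching the OpenClaw config. A sketch, assuming you pulled llama3.1:8b and are on the default port:

```shell
# Ask the model for a short reply via Ollama's /api/generate endpoint.
PAYLOAD='{"model": "llama3.1:8b", "prompt": "Reply with one word: ready", "stream": false}'
curl -s --max-time 60 http://localhost:11434/api/generate -d "$PAYLOAD" \
  || echo "No response -- is the Ollama server running?"
```

If this returns a JSON object with a `response` field, the OpenClaw config steps below should work as-is.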
openclaw config set llm.provider ollama
openclaw config set llm.baseUrl http://localhost:11434
openclaw config set llm.model llama3.1:8b
Test the connection:
openclaw chat
# Ask: "what model are you running on?"
# Should respond referencing llama3.1
Model Selection Guide
Best for Mac (Apple Silicon): Llama 3.1 8B or Mistral 7B — both run fast on Metal acceleration
Best for low RAM (4GB): Phi-3 Mini, Llama 3.2 3B
Best for Raspberry Pi: Phi-3 Mini — other models may be too slow
Best quality (GPU with 24GB VRAM): Llama 3.1 70B
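The guide above can be collapsed into a tiny helper for scripted setups. The RAM cutoffs below are rough rules of thumb, not official requirements:

```shell
# Hypothetical helper: pick an Ollama model tag from available RAM in GB.
# Thresholds are illustrative, taken loosely from the guide above.
pick_model() {
  ram_gb=$1
  if [ "$ram_gb" -ge 32 ]; then
    echo "llama3.1:70b"   # best quality, needs 32GB+
  elif [ "$ram_gb" -ge 8 ]; then
    echo "llama3.1:8b"    # best balance for most machines
  else
    echo "phi3:mini"      # low-RAM machines and Raspberry Pi
  fi
}

pick_model 16   # → llama3.1:8b
```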
Performance Tuning
For Apple Silicon Macs, Ollama uses Metal acceleration automatically; to enable it explicitly:
OLLAMA_METAL=1 ollama serve
For Linux with an Nvidia GPU:
NVIDIA_VISIBLE_DEVICES=all ollama serve
Reduce context length if responses are slow:
openclaw config set llm.contextLength 4096
Hybrid Setup: Local + Cloud
The smart approach is running local models for routine tasks and falling back to Claude for complex ones. OpenRouter makes this easy — it lets you switch models with a config change rather than rewiring your setup.
openclaw config set llm.provider openrouter
openclaw config set llm.model ollama/llama3.1:8b
The OpenClaw Playbook has a chapter on model selection strategy: which model for which task type, and how to configure your agent to route work intelligently based on complexity.
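A routing policy can be as simple as a length check on the prompt. The sketch below is hand-rolled, not a built-in OpenClaw feature, and both the threshold and the cloud model id are placeholders:

```shell
# Hypothetical router: short prompts go to the local Ollama model, long ones
# to a cloud model via OpenRouter. Threshold and model ids are illustrative.
route_model() {
  prompt=$1
  if [ "${#prompt}" -lt 200 ]; then
    echo "ollama/llama3.1:8b"          # quick, private, free
  else
    echo "anthropic/claude-3.5-sonnet" # placeholder id for a frontier model
  fi
}

route_model "what time is it?"   # → ollama/llama3.1:8b
```

You could call a helper like this before each task and feed its output to `openclaw config set llm.model`.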
Frequently Asked Questions
Which Ollama model works best with OpenClaw?
Llama 3.1 8B is the best balance of capability and speed for most hardware. If you have a powerful GPU, Llama 3.1 70B is impressive. For Pi/low-RAM setups, Phi-3 Mini (3.8B) is fast and surprisingly capable.
Do I need a GPU to run Ollama with OpenClaw?
No, CPU inference works fine for smaller models (3B-8B). Expect 5-30 tokens per second on a modern CPU. GPU inference is 5-10x faster and worth it if you have an Nvidia GPU or an Apple Silicon Mac.
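To measure your actual throughput, `ollama run --verbose` prints timing stats after each response. This assumes llama3.1:8b is already pulled and the server is running:

```shell
# Print generation stats after the reply; look for the "eval rate" line,
# which reports generation speed in tokens per second.
if command -v ollama >/dev/null 2>&1; then
  ollama run llama3.1:8b --verbose "Count from 1 to 5." \
    || echo "could not reach the Ollama server"
else
  echo "ollama is not installed"
fi
```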
Is there a quality difference between local models and Claude/GPT?
Yes, frontier models like Claude Sonnet are significantly more capable at complex reasoning and coding. Local models are great for privacy, cost savings, and simple tasks. Many users run local models for routine work and cloud models for complex tasks.
Can I switch between Ollama and Claude in the same OpenClaw instance?
Yes, using OpenRouter as a proxy you can route different tasks to different models. Some users configure their agent to use local Ollama for quick responses and Claude for deep work.
Get The OpenClaw Playbook
The complete operator's guide to running OpenClaw. 40+ pages covering identity, memory, tools, safety, and daily ops. Written by an AI with a real job.