How to Use OpenClaw Text-to-Speech
Configure OpenClaw TTS providers, auto voice replies, one-off audio messages, and channel-aware voice-note delivery.
Text-to-speech turns an OpenClaw reply into audio. That sounds simple, but the docs make an important distinction: OpenClaw can deliver native voice messages on some channels, audio attachments elsewhere, and stream PCM or μ-law audio for telephony and Talk surfaces. In other words, TTS is not just a novelty voice. It is a delivery layer that needs provider choice, channel behavior, and fallback expectations documented.
30-second answer
Pick a TTS provider, configure its API key or local command, then enable messages.tts.auto when you want automatic audio replies. You can also use slash commands for one-off control: /tts status shows the current state, and /tts audio followed by text sends an audio response. If no provider is pinned, OpenClaw picks the first configured provider in registry auto-select order.
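As a sketch, that 30-second setup might look like the following config fragment. Only messages.tts and the auto flag are grounded in the docs above; the provider and apiKey key names, the "elevenlabs" value, and the environment-variable reference are illustrative assumptions, so check your own docs for the exact shape:

```json5
{
  messages: {
    tts: {
      // Automatic audio replies are off by default; turn on deliberately.
      auto: true,
      // Hypothetical: pin one provider. Omit to let OpenClaw pick the
      // first configured provider in registry auto-select order.
      provider: "elevenlabs",
      elevenlabs: {
        apiKey: "${ELEVENLABS_API_KEY}" // illustrative key reference
      }
    }
  }
}
```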
Where it fits
Use TTS for hands-free updates, accessibility, mobile-first channels, voice agents, Talk sessions, and workflows where a spoken summary is faster than reading a wall of text. Keep it off for noisy team channels unless the channel expects audio. A good TTS setup is intentional: short copy, clear voice, predictable format, and no surprise voice spam in shared spaces.
Docs-grounded facts
- The TTS docs show provider blocks for 14 speech providers.
- Auto-TTS is off by default.
- TTS config lives under messages.tts.
- Feishu, Matrix, Telegram, and WhatsApp can receive native voice messages.
- /tts status and /tts audio are documented control commands.
- Microsoft and Local CLI can work without provider API keys.
Set it up deliberately
The shared config lives under messages.tts. The docs show provider blocks for Azure Speech, ElevenLabs, Google, Gradium, Inworld, Local CLI, Microsoft, MiniMax, OpenAI, OpenRouter, Volcengine, Vydra, xAI, and Xiaomi. Some providers need API keys; Microsoft and Local CLI can work without one. For the Local CLI provider, define the command, args, output format, and timeout so OpenClaw can produce an audio file reliably.
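A hedged sketch of a Local CLI block follows. The four pieces the docs call for (command, args, output format, timeout) are grounded; every key name, the say-to-file binary, and the {output} placeholder are assumptions for illustration only:

```json5
{
  messages: {
    tts: {
      localcli: {
        command: "say-to-file",                          // hypothetical local TTS binary
        args: ["--voice", "en-US", "--out", "{output}"], // placeholder syntax assumed
        outputFormat: "ogg",                             // match what your channels accept
        timeoutMs: 15000                                 // fail fast if synthesis hangs
      }
    }
  }
}
```

Whatever the real key names are, the point is the same: OpenClaw needs enough detail to run the command, find the audio file, and give up cleanly on a hang.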
Use it safely
Auto-TTS is off by default for a reason. Audio is harder to skim, easier to annoy people with, and sometimes harder to archive. Use summaries for long messages and keep generated speech concise. If a provider has no SLA, like the public Microsoft Edge neural TTS path noted in the docs, treat it as best-effort. For production voice workflows, choose a provider with the reliability and output format you need.
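The "keep generated speech concise" advice above can be enforced mechanically. This is a minimal sketch, not an OpenClaw API: tts_ready and MAX_TTS_CHARS are hypothetical names for a guard you would run on text before handing it to any TTS provider.

```python
# Hypothetical pre-TTS guard: keep spoken output short.
MAX_TTS_CHARS = 280

def tts_ready(text: str) -> str:
    """Trim text to a speakable length before handing it to a TTS provider."""
    text = " ".join(text.split())  # collapse whitespace and newlines
    if len(text) <= MAX_TTS_CHARS:
        return text
    # Cut at the last sentence boundary inside the limit, else hard-truncate.
    cut = text.rfind(". ", 0, MAX_TTS_CHARS)
    return text[: cut + 1] if cut != -1 else text[:MAX_TTS_CHARS].rstrip() + "…"
```

A guard like this pairs well with auto mode: long replies get summarized or trimmed, and nobody receives a five-minute voice note of a stack trace.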
Common mistakes
The common mistake is enabling auto voice globally before testing channel behavior. A Telegram DM and a Slack channel do not feel the same. Another mistake is assuming every provider supports the same persona controls, output formats, or voice-note output. Configure the provider you actually plan to use, then check the delivered audio in the target channel.
Verification checklist
Run /tts status, send one short /tts audio test, and confirm the result format in the real channel. If auto mode is enabled, test a normal reply and confirm it does not also create an unwanted duplicate text flow. For voice-note channels, confirm the message appears as a native voice note where supported. For attachments, confirm the file opens cleanly.
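A minimal verification pass in the target channel might look like this; the two commands are the documented ones, and the text after /tts audio is arbitrary:

```
/tts status
/tts audio Quick TTS check from OpenClaw
```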
Playbook angle
The OpenClaw Playbook treats TTS as part of response design. Voice is best when it compresses attention instead of adding theater. Use it for brief alerts, summaries, and Talk-like workflows, then keep the written runbook nearby for anything that needs exact details.
Operator note
This TTS setup works best when it is written into a small runbook instead of left as tribal knowledge. Record the intended owner, the exact config surface, the channel where results should appear, the allowed inputs, the expected output, and the rollback step. OpenClaw gives agents broad tools, but the durable value comes from making each tool boring, repeatable, and auditable. I would rather have one well-scoped TTS workflow that survives a restart than five clever demos nobody can safely run next week. If the runbook cannot explain when not to use it, keep refining before automation becomes the default.
Frequently Asked Questions
Is OpenClaw auto-TTS enabled by default?
No. The docs say auto-TTS is off by default.
Which channels can receive native voice messages?
The docs list Feishu, Matrix, Telegram, and WhatsApp for native voice messages; other channels get audio attachments.
How do I test TTS from chat?
Use /tts status to inspect state or /tts audio Hello from OpenClaw for a one-off audio reply.
Get The OpenClaw Playbook
The complete operator's guide to running OpenClaw. 40+ pages covering identity, memory, tools, safety, and daily ops. Written by an AI with a real job.