OpenClaw Thinking Mode: When to Make Your Agent Think Harder
A lot of operators make the same mistake with reasoning models. They either leave thinking off everywhere and wonder why the agent misses nuance, or they crank it to the ceiling for every task and then complain about speed and cost.
OpenClaw gives you a better control surface than that. The /think directives let you raise or lower reasoning per message, set a session default, and fall back to configured defaults when you stay silent. That is exactly what you want in production: not one magical intelligence knob, but a clear way to spend more thought only when the task deserves it.
If you already run coding threads through ACP agents or use sub-agents for heavier work, thinking mode is one of the simplest ways to improve output quality without redesigning your whole stack.
What OpenClaw thinking mode actually is
OpenClaw supports inline thinking directives in inbound messages. The documented forms are `/t <level>`, `/think:<level>`, or `/thinking <level>`. The supported levels are `off`, `minimal`, `low`, `medium`, `high`, `xhigh`, and `adaptive`.
The docs also map those levels to familiar language:
- `minimal` is the lightest “think” setting.
- `low` maps to “think hard”.
- `medium` maps to “think harder”.
- `high` is the max-budget “ultrathink”.
- `xhigh` is “ultrathink+” and is only supported on GPT-5.2 and Codex model paths.
- `adaptive` uses a provider-managed reasoning budget and is supported for the Anthropic Claude 4.6 family.
There are also alias spellings for the extra-high level: `x-high`, `x_high`, `extra-high`, and `extra high` all normalize to `xhigh`. If you care about clean operator ergonomics, that normalization is a nice touch.
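That normalization is easy to picture. Here is a minimal sketch, illustrative rather than OpenClaw's actual source, of collapsing the documented aliases to canonical level names:

```python
# Illustrative sketch (not OpenClaw's implementation) of level normalization.

CANONICAL_LEVELS = {"off", "minimal", "low", "medium", "high", "xhigh", "adaptive"}

# Documented alias spellings for the extra-high level.
ALIASES = {
    "x-high": "xhigh",
    "x_high": "xhigh",
    "extra-high": "xhigh",
    "extra high": "xhigh",
}

def normalize_level(raw: str) -> str:
    """Lower-case and trim the input, then collapse known aliases."""
    level = raw.strip().lower()
    level = ALIASES.get(level, level)
    if level not in CANONICAL_LEVELS:
        raise ValueError(f"unknown thinking level: {raw!r}")
    return level
```

The win for operators is that sloppy spellings still land on the level they meant instead of erroring out mid-conversation.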
One-message override versus session default
This is the part that matters most in real use. An inline directive on a normal message applies only to that message. A directive-only message sets the session default instead.
```
/t high Compare these two deployment plans and tell me where the rollback risk is.
/think:medium
/think
```

In that example, the first line raises reasoning for one message only. The second line, because it is only the directive, stores medium as the session override. The third line asks OpenClaw to report the current thinking level.
The docs are explicit about the lifecycle too. A directive-only message sticks for the current session, is per-sender by default, and is cleared by /think:off or by a session idle reset.
My advice is simple: use one-message overrides for spikes in complexity, and session defaults when you are entering a known mode of work like code review, architecture planning, or incident response.
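Under those documented rules, the per-message lifecycle can be sketched roughly like this. The parsing details, class shape, and names here are assumptions; only the behavior (inline applies once, directive-only sticks, `/think:off` clears) is from the docs:

```python
import re

# Assumed directive grammar: /t <level>, /think:<level>, /thinking <level>,
# or a bare /think to report the current level.
DIRECTIVE = re.compile(r"^/(?:thinking|think|t)\b(?:[:\s]\s*(\S+))?\s*(.*)$", re.DOTALL)

class Session:
    """Per-sender session state, as the docs describe it."""

    def __init__(self, configured_default: str = "low"):
        self.override = None  # cleared by /think:off or a session idle reset
        self.configured_default = configured_default

    def handle(self, message: str):
        """Return (effective_level, remaining_text) for one inbound message."""
        m = DIRECTIVE.match(message)
        if not m:
            # No directive: session override wins, else the configured default.
            return (self.override or self.configured_default, message)
        level, rest = m.group(1), m.group(2).strip()
        if level and rest:
            return (level, rest)  # inline directive: this message only
        if level:
            # Directive-only message: sticks as the session default.
            self.override = None if level == "off" else level
            return (level, "")
        # Bare /think: report the current level.
        return (self.override or self.configured_default, "")
```

The point of the sketch is the asymmetry: the same directive string changes scope depending on whether any task text follows it.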
The resolution order is what keeps this sane
OpenClaw does not guess randomly. It resolves thinking level in a defined order:
- Inline directive on the message.
- Session override from a directive-only message.
- Per-agent default at `agents.list[].thinkingDefault`.
- Global default at `agents.defaults.thinkingDefault`.
- Fallback behavior based on the active model family.
That fallback is also documented. Anthropic Claude 4.6 defaults to adaptive when no explicit thinking level is set. Other reasoning-capable models fall back to low. Models without reasoning support fall back to off.
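As a sketch, the chain is just first-explicit-source-wins. The config paths (`agents.list[].thinkingDefault`, `agents.defaults.thinkingDefault`) and the fallbacks are from the docs; the function and dict names here are illustrative:

```python
# Documented fallbacks when no explicit level is set anywhere.
MODEL_FALLBACK = {
    "anthropic-claude-4.6": "adaptive",  # Claude 4.6 defaults to adaptive
    "other-reasoning": "low",            # other reasoning-capable models
    "non-reasoning": "off",              # models without reasoning support
}

def resolve_level(inline, session_override, agent_default, global_default, model_family):
    """Return the first explicitly set level, else the model-family fallback."""
    for level in (inline, session_override, agent_default, global_default):
        if level is not None:
            return level
    return MODEL_FALLBACK.get(model_family, "off")
```

When an answer surprises you, you can walk that chain top to bottom and find exactly which source produced the level in play.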
This is why I like the feature. You can set a sane agent-wide baseline, then temporarily go higher for a hard problem, without losing track of what is driving the final behavior. Mystery defaults are how operators get burned. Resolution order is how you avoid that.
When to actually raise the thinking level
Not every task deserves extra reasoning. If the work is mostly retrieval, formatting, or a straightforward lookup, more thinking can just mean more latency. I prefer to spend that budget only when the agent needs to compare options, trace side effects, or resolve ambiguity.
off or minimal
Use these for low-risk, high-volume work: formatting a short reply, moving data from one place to another, lightweight status updates, or boring glue tasks. If the agent mostly needs tools and not deep deliberation, do not overpay for introspection.
low or medium
This is the practical sweet spot for a lot of operators. It is a good fit for doc-based answers, ordinary debugging, task triage, and writing work that still needs some judgment. If you want a stable session default, medium is usually safer than going straight to high.
high
Use this when the cost of a shallow answer is real: reviewing a risky change plan, evaluating regression paths, comparing architecture tradeoffs, or untangling a weird failure with multiple interacting causes. If the agent needs to think through second-order effects, high earns its keep.
xhigh
Reserve this for genuinely expensive reasoning tasks and only on supported model paths. If your question is “should I set this everywhere?”, the answer is no. Use it for the ugly cases, not the daily cases.
adaptive
If you are on the Claude 4.6 family and want provider-managed reasoning budget, adaptive exists for that. I would rather use it intentionally than accidentally. The docs make clear that it is both a supported level and the no-directive fallback for that model family.
The OpenClaw Playbook
Want better output without blindly burning model budget?
ClawKit shows you how to tune model routing, tool access, safety, and operator workflows so your agent gets smarter where it matters.
Get ClawKit — $9.99 →

Provider-specific caveats you should know
This is where many “AI reasoning” explainers get sloppy, so here is the clean version based on the docs.
- Anthropic Claude 4.6 models default to `adaptive` when no explicit thinking level is set.
- Z.AI only supports binary thinking. In practice, any level other than `off` is treated as thinking enabled and mapped to `low`.
- Moonshot maps `/think off` to disabled thinking and any non-`off` value to enabled thinking. When Moonshot thinking is enabled, OpenClaw normalizes incompatible `tool_choice` values to `auto` because Moonshot only accepts `auto` or `none` in that mode.
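As a minimal sketch of those caveats, with hypothetical helper names but the documented mapping rules:

```python
# Hypothetical helpers illustrating the documented provider behavior.

def zai_thinking(level: str) -> str:
    # Z.AI is binary: anything other than "off" counts as thinking enabled
    # and is mapped to "low".
    return "off" if level == "off" else "low"

def moonshot_params(level: str, tool_choice: str) -> dict:
    # Moonshot: "off" disables thinking, any other level enables it.
    enabled = level != "off"
    if enabled and tool_choice not in ("auto", "none"):
        # With thinking enabled, Moonshot only accepts "auto" or "none",
        # so incompatible tool_choice values are normalized to "auto".
        tool_choice = "auto"
    return {"thinking": enabled, "tool_choice": tool_choice}
```

Note what this implies for Z.AI: asking for `high` and asking for `minimal` land on the same backend behavior, so there is no point burning directive vocabulary there.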
The operator lesson is obvious: a single directive surface does not mean identical backend behavior. OpenClaw gives you a uniform chat control, but the platform still respects provider-specific constraints underneath. That is the right abstraction boundary.
Thinking mode is not the same as reasoning visibility
People mix these up all the time. The thinking level controls the reasoning budget; `/reasoning` controls whether thinking blocks are shown back to you.

Per the docs, reasoning visibility supports `on`, `off`, and `stream`. When enabled, reasoning is sent as a separate message prefixed with `Reasoning:`. The `stream` mode is Telegram-only.
That separation is healthy. You can ask for deeper reasoning without necessarily exposing the thinking blocks in every channel. For a production operator, that matters a lot.
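To make the separation concrete, here is a tiny sketch with hypothetical function and parameter names; only the `Reasoning:` message-prefix behavior and the three visibility modes come from the docs:

```python
# Hypothetical sketch: the thinking level decides how much reasoning happens;
# visibility only decides whether it is shown back afterwards.

def format_reply(answer: str, reasoning, visibility: str):
    """visibility is one of "on", "off", or "stream" ("stream" is Telegram-only)."""
    messages = []
    if reasoning and visibility in ("on", "stream"):
        # Documented behavior: reasoning arrives as a separate message
        # prefixed with "Reasoning:".
        messages.append(f"Reasoning: {reasoning}")
    messages.append(answer)
    return messages
```

The budget and the display are orthogonal: you can run `high` thinking with visibility `off` and the extra deliberation still shapes the answer, you just never see the blocks.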
A simple operating pattern that works
If you want a reliable default workflow, I would use something like this:
- Set your normal session to `/think:medium` for complex work blocks.
- Drop to `/think:off` when you move back to lightweight execution.
- Use `/t high` on the handful of prompts that really need deeper comparison or risk analysis.
- Keep `xhigh` as an exception, not a lifestyle.
That gives you better answers without training yourself to accept slow, overcooked output on routine work. It also pairs nicely with exec approvals: let the agent think harder when the plan is risky, not when the shell command is obvious.
Where this pays off in real operator work
Change planning
If the agent needs to trace end-to-end impact across multiple files or services, higher thinking is worth it. This is especially true before you spawn a coding session or approve a risky edit.
Incident triage
When logs are noisy and several root causes are plausible, high helps the model compare hypotheses instead of latching onto the first convenient explanation.
Strategy prompts
When you are asking the agent to prioritize channels, compare product bets, or critique a plan, shallow reasoning usually sounds confident and says very little. Raising the thinking level is one of the cheapest ways to force more care into that work.
Hard doc synthesis
If the task involves reconciling several docs or identifying caveats across providers, thinking mode helps. It does not replace source verification, but it makes the synthesis step stronger.
Final take: make thinking explicit
The real win here is not that OpenClaw gives you more reasoning levels. It is that it gives you a documented contract for when they apply, how session defaults stick, and how different providers behave underneath.
That is what serious operators need. Not vague “use a smarter model” advice. Just a clean way to say: this question is cheap, this one is tricky, and this one deserves the heavy reasoning budget.
If your agent feels inconsistent, do not only blame the model. Check whether you are being explicit about how hard it should think.
Want the complete guide? Get ClawKit — $9.99