OpenClaw Usage Tracking: Watch Tokens and Costs Before They Surprise You
AI agent cost problems rarely arrive as one obvious disaster. They show up as “why is this session so heavy?”, “why did today burn through quota?”, or “why did a cheap automation suddenly become expensive after we added media and browser work?”
OpenClaw gives you a few different usage surfaces, and they are intentionally not all the same thing. Some show tokens for the current session. Some show local estimated cost. Some query provider usage windows. If you mix those up, you will either panic over the wrong number or miss the number that actually matters.
This is the operator map I use: tokens tell you what the model processed, estimated costs tell you what OpenClaw can price locally, provider windows tell you what the upstream account reports, and context tools tell you why the prompt got large in the first place.
Start with the boring fact: OpenClaw tracks tokens
The token-use docs are blunt: OpenClaw tracks tokens, not characters. Tokens are model-specific, though OpenAI-style models often average around four English characters per token. That rough conversion is useful for intuition, but it is not the accounting source. The model provider reports token usage, and OpenClaw normalizes what it can.
Everything the model receives can count toward the context window: the system prompt, conversation history, tool calls, tool results, attachments, compaction summaries, pruning artifacts, and provider-side wrappers that you do not see as normal text. That is why a “small” reply can still be backed by a large prompt.
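If you want the rough-conversion intuition as something you can poke at, here is a throwaway sketch. It is intuition only, not the provider's tokenizer, and the numbers are illustrative:

```python
# Rough intuition only: OpenAI-style models average about four English
# characters per token. Real accounting comes from provider-reported counts.
def rough_token_estimate(text: str) -> int:
    return max(1, len(text) // 4)

prompt = "Summarize the last deploy log and flag anything unusual."
print(rough_token_estimate(prompt))  # ~14 tokens for this one-line prompt
```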
If you have not read the adjacent context pieces yet, pair this with the system prompt guide and the compaction guide. Usage tracking is much easier once you understand what is being sent to the model.
The four chat commands worth memorizing
For day-to-day work, I would keep these in muscle memory:
/status
/usage tokens
/usage full
/usage cost

/status gives the quick status card. The docs say it shows the session model, context usage, last response input and output tokens, and estimated cost when the current model uses API-key auth. Provider usage appears for the current model provider when available.
/usage tokens appends a per-response usage footer with token numbers only. /usage full adds the fuller footer and can include estimated cost when API-key pricing is available. OAuth and subscription-style flows hide dollar cost, so they show tokens only. That is not a bug; OpenClaw is avoiding fake precision when it cannot price the provider path locally.
/usage cost is different again. It shows a local cost summary aggregated from OpenClaw session logs. That makes it useful after the fact, especially when you want to understand what a session has been doing over time instead of only looking at the latest response.
Estimated cost is not provider quota
This distinction matters enough to say plainly: estimated cost and provider usage windows are not the same number.
Estimated cost comes from your model pricing config. The documented config shape is:
models.providers.<provider>.models[].cost

Those prices are expressed as USD per one million tokens for input, output, cacheRead, and cacheWrite. If pricing is missing, OpenClaw shows tokens only. If the auth path is OAuth or a subscription-style CLI flow, OpenClaw does not invent a dollar amount for the footer.
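To make the pricing math concrete, here is a minimal sketch of the estimate that shape enables. The keys mirror the documented cost fields; the rates and token counts are invented for illustration, and this is not OpenClaw's actual code:

```python
# Sketch of the local estimate, not OpenClaw's code. Keys mirror the
# documented cost fields (USD per one million tokens); every rate and
# token count below is invented for illustration.
PRICING = {"input": 3.00, "output": 15.00, "cacheRead": 0.30, "cacheWrite": 3.75}

def estimate_usd(usage: dict, pricing: dict) -> float:
    return sum(tokens / 1_000_000 * pricing[kind]
               for kind, tokens in usage.items() if kind in pricing)

# One hypothetical reply: 12k input tokens, 800 output, 40k served from cache.
print(estimate_usd({"input": 12_000, "output": 800, "cacheRead": 40_000}, PRICING))
# 0.036 + 0.012 + 0.012 = 0.06 USD
```

The point is the shape: each token category is priced independently, which is why cache reads and writes get their own rates.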
Provider usage windows come from provider usage or quota endpoints. The usage-tracking docs describe that surface as provider-reported windows, not estimated costs. In the current public docs, the human-readable output is normalized to an "X% left" style display even when upstream providers report consumed quota, remaining quota, or raw counts.
So if /status says a response cost estimate is available, that is local pricing math for the latest reply. If openclaw status --usage says a provider window has some percentage left, that is a quota snapshot from the provider side. Both are useful. They answer different questions.
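As a mental model for that normalization, here is a sketch of the idea; it assumes nothing about OpenClaw's internals, and the field combinations are illustrative:

```python
# Illustrative only: collapse what providers report (consumed quota,
# remaining quota, raw counts) into one "percent left" number.
def percent_left(used=None, remaining=None, limit=None):
    if limit:
        if remaining is not None:
            return 100.0 * remaining / limit
        if used is not None:
            return 100.0 * (limit - used) / limit
    return None  # not enough data; better to hide than to fake a number

print(percent_left(used=750, limit=1_000))       # 25.0 (% left)
print(percent_left(remaining=250, limit=1_000))  # 25.0 (% left)
```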
If your agent is starting to feel expensive, do not wait for the bill to explain it. Get ClawKit and set up the operating checks that keep token growth, provider usage, and automation costs visible.
Use the CLI when you need the provider view
The chat commands are great while you are inside a session. The CLI is better when you are auditing the whole setup.
openclaw status --usage
openclaw channels list

The docs say openclaw status --usage prints a full per-provider breakdown, and openclaw channels list can show the same usage snapshot alongside provider config. These are quota-window views, not per-response cost footers.
Usage is hidden when OpenClaw cannot resolve usable auth for a provider's usage endpoint. The docs list several credential sources: auth profiles, environment variables, config, and skill entries that export keys to a skill process. That means a missing usage display does not necessarily mean a provider has no usage; it may mean OpenClaw does not have the right auth path for that provider's usage endpoint.
Provider usage windows are documented for providers such as Anthropic, GitHub Copilot, Gemini CLI, OpenAI Codex, MiniMax, Xiaomi MiMo, and z.ai, with provider-specific credential behavior. I would not design an ops dashboard around assumptions here. Run the status command and use what your configured providers actually expose.
Know what can spend money
The API usage docs are useful because they do not only talk about chat replies. They list the OpenClaw features that can invoke keys or paid provider APIs. The obvious one is core model responses: every reply or tool-loop model call uses the current model provider.
But the less obvious surfaces matter in real operations:
- Media understanding: inbound audio, images, and video can be summarized or transcribed before the reply runs.
- Image and video generation: shared generation capabilities can spend provider keys when configured.
- Semantic memory search: remote embedding providers can bill when memorySearch.provider uses hosted embeddings; local keeps that path local.
- Web search: web_search may use paid search APIs depending on provider and keys.
- Web fetch: web_fetch can call Firecrawl when configured; without Firecrawl it falls back to direct fetch plus the bundled readability path.
- Compaction: summarizing session history can invoke the current model.
- Talk mode and skills: speech providers and third-party skills can have their own API keys and costs.
This is the part operators miss. You might tune the model but forget that a workflow now fetches the web, transcribes voice notes, searches memory with remote embeddings, and compacts long sessions. Each piece can be reasonable on its own while the combined automation becomes noisy.
Audit context before blaming the model
When a session gets expensive, do not only ask “which model is this?” Ask “what is being sent to the model?”
/context list
/context detail
/compact

The context docs say /context list shows what is injected and rough sizes, while /context detail gives deeper breakdowns by file, tool schema, skill entry, and system prompt. That is where you find the quiet cost sources: a bloated TOOLS.md, repeated tool results, a long session that needs compaction, or a skills list that has grown beyond what the agent needs.
/compact summarizes older history to free context room. It is not a magic cost eraser, because compaction itself can use a model call, but it is the right tool when a thread has become too large to keep carrying raw history.
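A back-of-envelope way to think about that tradeoff, with entirely made-up sizes and rates:

```python
# Hypothetical numbers: when does compacting pay for itself?
history_tokens = 60_000                  # raw history carried into every request
summary_tokens = 2_000                   # what compaction might shrink it to
input_usd_per_token = 3.00 / 1_000_000   # invented input rate

compact_cost = history_tokens * input_usd_per_token          # one summary pass
saved_per_turn = (history_tokens - summary_tokens) * input_usd_per_token
print(f"compact ~ ${compact_cost:.3f}, saves ~ ${saved_per_turn:.3f} per turn")
# compact ~ $0.180, saves ~ $0.174 per turn: pays for itself in about two turns
```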
Prompt caching changes the cost shape
The token docs also call out prompt caching. Provider prompt caching only applies within the provider's cache TTL window. OpenClaw can optionally run cache-TTL pruning after the cache TTL expires, then reset the cache window so later requests can reuse a freshly cached context instead of re-caching the full history.
Heartbeat can keep a cache warm across idle gaps. The docs give the practical example: if the model cache TTL is one hour, a heartbeat interval just under that, such as 55m, can avoid re-caching the full prompt and reduce cache write costs.
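To see why the keep-warm pattern can pay, compare one warm cache read against one full re-cache. Every number here is invented; check your provider's actual cache pricing:

```python
# Hypothetical numbers: keep-warm heartbeat vs re-caching after TTL expiry.
context_tokens = 50_000
cache_write_usd = 3.75 / 1_000_000   # invented cache-write rate per token
cache_read_usd = 0.30 / 1_000_000    # invented cache-read rate per token

recache = context_tokens * cache_write_usd   # cold start after the TTL lapses
warm_ping = context_tokens * cache_read_usd  # heartbeat hits the warm cache
print(f"re-cache ~ ${recache:.3f} vs warm ping ~ ${warm_ping:.3f}")
# re-cache ~ $0.188 vs warm ping ~ $0.015 (ignores the heartbeat reply itself)
```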
I would treat this as an advanced optimization, not step one. First make the workspace files lean, trim giant tool output, and inspect context. Then tune cache behavior for sessions that are valuable enough to keep warm.
A simple operating routine
Here is the usage routine I would use for a serious OpenClaw operator:
- During a live session: run /status when behavior feels heavy or surprising.
- When you want per-reply visibility: turn on /usage tokens or /usage full for that session.
- When cost is unclear: use /usage cost to inspect local session-log totals.
- When quota matters: run openclaw status --usage for provider windows.
- When the prompt is too large: inspect /context detail, then prune workflows or compact the session.
- When adding a new automation: check whether it invokes model calls, media understanding, web search, web fetch, embeddings, speech, generation, or third-party skills.
The operator takeaway is simple: usage tracking is not one dashboard. It is a set of surfaces that answer different questions. Use tokens to understand model work, estimated cost to price configured API-key traffic, provider windows to watch quota, and context diagnostics to find why the prompt got heavy.
That habit is how an AI agent stays useful after the demo stage. You do not need to fear tokens. You need to make them visible before they surprise you.
Want the complete guide? Get ClawKit — $9.99