OpenClaw Status Checks: Know Which Channel or Session Is Really Broken
Most agent outages start with the wrong question. Someone says, “Slack is broken,” but the Gateway is fine and the channel is simply ignoring the room. Someone says, “the session disappeared,” but the provider is connected and no new conversation row has been materialized yet. Someone restarts everything when the real issue is an expired token, a stale allowlist, or a mention policy doing exactly what it was configured to do.
OpenClaw gives operators several health surfaces for this reason. openclaw status summarizes local channel and session state. openclaw health asks the running Gateway for a health snapshot. openclaw doctor checks configuration, state, plugins, model readiness, and repairable problems. Channel probes tell you whether a provider transport is actually usable. The useful habit is not “run one magic command.” It is to separate Gateway health, channel health, session state, and reply policy before touching anything destructive.
Start with the shortest safe ladder
The channel troubleshooting docs give a practical command ladder. Run it before you diagnose from vibes:
openclaw status
openclaw gateway status
openclaw logs --follow
openclaw doctor
openclaw channels status --probe
That ladder is intentionally conservative. openclaw status is the fast read-only overview. openclaw gateway status tells you whether the service boundary is running. openclaw logs --follow shows the live events around inbound messages, reconnects, and sends. openclaw doctor checks broader system health. openclaw channels status --probe asks the configured channels to prove more than “there is a config row.”
The docs describe a healthy baseline as a runtime that reports running, a connectivity or RPC probe that reports ok, and a channel probe that shows the transport as connected and ready, working, or audit-ok, depending on the provider and OpenClaw version. If one of those layers is red, fix that layer. Do not jump straight from “no reply in Slack” to rebuilding the whole agent.
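If you run the ladder often, a small script keeps every incident starting from the same evidence. Here is a minimal sketch, assuming bash and the openclaw CLI on your PATH; run_ladder and OPENCLAW_BIN are hypothetical names for this example. Two deliberate changes from the ladder above: openclaw logs --follow is left out because it streams until interrupted (run it in a second terminal), and the plain doctor rung is swapped for doctor --lint, which the docs describe as read-only. The sketch also assumes a failing rung exits non-zero; the docs confirm that for openclaw health, not for every subcommand here, so read the printed output either way.

```bash
# Hypothetical helper: run the non-streaming rungs of the status ladder in
# order and report which ones exited non-zero. OPENCLAW_BIN lets you point
# at a wrapper or stub; it defaults to the real openclaw CLI.
run_ladder() {
  local bin="${OPENCLAW_BIN:-openclaw}"
  local failed=0
  local rung
  # logs --follow is omitted on purpose: it streams until interrupted.
  for rung in "status" "gateway status" "doctor --lint" "channels status --probe"; do
    echo "== openclaw $rung"
    # $rung is unquoted on purpose so "gateway status" becomes two arguments.
    # shellcheck disable=SC2086
    if ! "$bin" $rung; then
      echo "!! openclaw $rung exited non-zero"
      failed=$((failed + 1))
    fi
  done
  echo "ladder finished: $failed failing rung(s)"
  # Return non-zero when any rung failed so callers can branch on it.
  [ "$failed" -eq 0 ]
}
```

Paste the full output next to your log tail; the “!!” lines tell you which layer to open first.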
Use status for the operator overview
openclaw status is where I start because it answers the broad question: what does this installation think is alive right now?
openclaw status
openclaw status --all
openclaw status --deep
openclaw status --usage
The status docs define it as diagnostics for channels and sessions. The regular command stays on a fast read-only path. --all expands the local diagnosis and includes a secrets overview plus a diagnosis section for SecretRef problems when available. --deep runs live probes through the running Gateway for supported channels. --usage prints normalized provider usage windows as remaining quota.
That distinction matters. A fast status snapshot is good for triage. A deep status probe is better when you need live channel evidence. Usage is useful when the agent “works” but model calls are failing or throttled. If you are debugging a customer-facing automation, I would rather see one status --all plus one targeted channel probe than ten screenshots of a chat app.
Status also includes more than channel names. The docs say it can show per-agent session stores when multiple agents are configured, Gateway and node host service install and runtime status when available, the update channel and git SHA for source checkouts, and SecretRef diagnostics that report, rather than crash on, a supported secret that is unavailable in that command path. That makes it reasonably safe to paste into an internal debugging thread, though you should still review it for anything private first.
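For the “one status --all plus one targeted probe” habit, a sketch like this collects both into one file and masks strings shaped like Slack tokens before you read it over. The xoxa-/xoxb-/xoxp-/xoxr-/xapp- prefixes are Slack's own token formats, not something the OpenClaw docs say status will print; redact_tokens, capture_status_bundle, and OPENCLAW_BIN are hypothetical. Masking is a convenience, not a replacement for reviewing the file.

```bash
# Hypothetical helper: mask strings shaped like Slack tokens (xoxa-, xoxb-,
# xoxp-, xoxr-, xapp-). Reads stdin and writes the masked text to stdout.
redact_tokens() {
  sed -E 's/(xox[abpr]|xapp)-[A-Za-z0-9-]+/\1-[REDACTED]/g'
}

# Hypothetical helper: one status --all plus one targeted channel probe,
# masked and written to a single file ($1, or a timestamped default) that
# you review before sharing. Prints the file name when done.
capture_status_bundle() {
  local bin="${OPENCLAW_BIN:-openclaw}"
  local out="${1:-openclaw-status-$(date +%Y%m%d-%H%M%S).txt}"
  {
    echo "## openclaw status --all"
    "$bin" status --all 2>&1
    echo "## openclaw channels status --probe"
    "$bin" channels status --probe 2>&1
  } | redact_tokens > "$out"
  echo "$out"
}
```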
Do not confuse sessions with sockets
This is the trap that wastes the most time. Session rows are conversation state. They are not proof that a provider socket is live. The Gateway health docs call this out directly: for Discord and other chat providers, session rows read stored conversation state. A provider can reconnect and show healthy channel status before any new session row appears.
So if a session seems missing, ask two questions separately. First: is the channel transport connected and capable of receiving or sending? Use openclaw health, openclaw status --deep, or openclaw channels status --probe. Second: has a new inbound message actually reached the Gateway and passed the channel’s policy gates? Use logs, pairing checks, allowlists, and mention policy. A missing session row by itself is not a channel outage.
For deeper session behavior, pair this with the OpenClaw session management guide. The operational takeaway here is simpler: diagnose the transport before you blame the transcript store.
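The two questions can stay separate in a script too. Here is a sketch, assuming bash, a channel probe that exits non-zero when it fails (confirm that for your version), and the default log path and web-inbound event name from the health docs filter quoted in the logs section of this post. Those web-* names may not cover every provider. transport_ok, inbound_seen, diagnose_missing_session, and OPENCLAW_BIN are hypothetical.

```bash
# Question 1: is the channel transport connected? Asks through the channel
# probe; assumes a failed probe exits non-zero.
transport_ok() {
  "${OPENCLAW_BIN:-openclaw}" channels status --probe >/dev/null 2>&1
}

# Question 2: did an inbound message reach the Gateway? Searches log files
# ($1, or the glob from the health docs example) for web-inbound events.
inbound_seen() {
  local logs="${1:-/tmp/openclaw/openclaw-*.log}"
  # $logs is unquoted so the glob expands; -s hides "no such file" errors.
  # shellcheck disable=SC2086
  grep -qs 'web-inbound' $logs
}

# Answer the two questions in order and name the next place to look.
diagnose_missing_session() {
  if ! transport_ok; then
    echo "transport: fix the channel connection first"
  elif ! inbound_seen "${1:-}"; then
    echo "no inbound event: the message never reached the Gateway"
  else
    echo "inbound seen: check pairing, allowlists, and mention policy"
  fi
}
```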
Use health when the Gateway must answer for itself
openclaw health asks the running Gateway for a health snapshot. The docs describe it as WS-only; the CLI does not open direct provider sockets itself. That is exactly what you want when the question is “what does the live Gateway know?”
openclaw health
openclaw health --verbose
openclaw health --json
openclaw health --json --timeout 20000
openclaw health --json gives machine-readable output. --timeout <ms> changes the default ten-second probe timeout. Current docs also document --verbose, which forces a live probe and prints gateway connection details. The health snapshot can include an ok boolean, timestamp, probe duration, per-channel status, agent availability, and a session-store summary. It exits non-zero if the Gateway is unreachable or if the probe fails or times out.
That exit behavior makes it useful in scripts. If a scheduled job depends on the Gateway, a health check can fail closed instead of pretending the downstream automation is fine. If the health snapshot says the Gateway is unreachable, your next move is the Gateway boundary, not a Slack scope audit.
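A fail-closed wrapper for a scheduled job can lean on that documented exit behavior. A minimal sketch, assuming bash; run_if_gateway_healthy and OPENCLAW_BIN are hypothetical, and the 20-second timeout simply mirrors the example above.

```bash
# Hypothetical fail-closed wrapper: run the given command only if the
# Gateway health probe succeeds. Relies on the documented behavior that
# openclaw health exits non-zero when the Gateway is unreachable or the
# probe fails or times out.
run_if_gateway_healthy() {
  local bin="${OPENCLAW_BIN:-openclaw}"
  if ! "$bin" health --json --timeout 20000 >/dev/null 2>&1; then
    echo "gateway health check failed; not running: $*" >&2
    return 1
  fi
  "$@"
}
```

For example, run_if_gateway_healthy ./post-daily-digest.sh (a hypothetical job) runs the job only when the probe exits zero, and otherwise returns 1 so your scheduler records a failure instead of a silent success.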
Know what doctor is allowed to change
openclaw doctor is the bigger health surface. It checks gateway and channel health, configuration, local state, plugin readiness, model routing, memory readiness, and repairable setup problems.
openclaw doctor
openclaw doctor --lint
openclaw doctor --lint --json
openclaw doctor --deep
openclaw doctor --fix --non-interactive
Choose the mode deliberately. The regular command is for human-oriented checks and guided prompts. doctor --lint is read-only and better suited to automation or review gates; with --json, it emits structured findings. doctor --deep scans extra service state. doctor --fix or --repair can apply supported repairs.
The repair path is not harmless busywork. The docs say --fix writes a backup to ~/.openclaw/openclaw.json.bak and drops unknown config keys, listing removals. Interactive prompts only run when stdin is a TTY and --non-interactive is not set, so headless cron runs skip prompt-only fixes. State integrity checks can detect orphan transcript files, but archiving them requires interactive confirmation. In other words: use --lint when you want evidence, and use repair only when you are ready for a controlled mutation.
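One way to keep the evidence path and the mutation path apart in a review gate, assuming bash: doctor_gate, ALLOW_DOCTOR_FIX, the doctor-lint.json file name, and OPENCLAW_BIN are all hypothetical. The sketch treats a non-zero exit from doctor --lint as “findings present”, which is an assumption to check against your OpenClaw version, and it does not parse the JSON because this post does not describe its shape.

```bash
# Hypothetical review gate: always collect read-only doctor findings, and
# only run the mutating repair path when ALLOW_DOCTOR_FIX=1 is set on purpose.
doctor_gate() {
  local bin="${OPENCLAW_BIN:-openclaw}"
  local findings="${1:-doctor-lint.json}"
  # Read-only: --lint reports problems without changing config or state.
  if "$bin" doctor --lint --json > "$findings"; then
    echo "lint clean (findings saved to $findings)"
    return 0
  fi
  echo "lint reported problems (see $findings)"
  if [ "${ALLOW_DOCTOR_FIX:-0}" = "1" ]; then
    # Mutating: per the docs, --fix writes ~/.openclaw/openclaw.json.bak and
    # can drop unknown config keys; headless runs skip prompt-only fixes.
    "$bin" doctor --fix --non-interactive
  else
    echo "repair not attempted; set ALLOW_DOCTOR_FIX=1 to allow doctor --fix"
    return 1
  fi
}
```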
Separate auth failures from policy failures
A connected channel can still refuse to answer. The troubleshooting docs list the usual signatures by provider. For Slack, socket mode can be connected while responses fail because app token, bot token, or scopes are wrong; DMs can be blocked by pairing; channel messages can be ignored by group policy or channel allowlists. For Telegram, a bot can be online while group visibility is blocked by mention requirements or bot privacy mode. For Discord, the bot can be online while guild replies are blocked by allowlists, channel rules, or missing message content intent.
The fastest fix depends on the failure class. If the provider is logged out or WhatsApp shows status codes in the 409-515 range, the health docs point to relinking. If inbound messages never appear, check the sender allowlist, group allowlist, and mention policy before editing model prompts. If send failures show network errors, inspect provider API routing and logs instead of changing agent instructions.
openclaw channels status --probe
openclaw channels logout
openclaw channels login --verbose
Relinking is a real action, so do it only after the probes and logs point there. For WhatsApp, the docs recommend logging out and back in when those status codes or loggedOut appear. For other providers, use the specific troubleshooting page linked from the channel troubleshooting index.
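The relink decision can be written down so it only fires on the signatures the health docs name for WhatsApp: a loggedOut state or a status code in the 409-515 range (treated here as inclusive). A sketch assuming bash; next_step_for_whatsapp is hypothetical, it only prints a recommendation, and it never runs logout or login itself.

```bash
# Hypothetical triage helper for WhatsApp. $1 is a status code or the word
# loggedOut. Prints a recommendation only; it never relinks by itself.
next_step_for_whatsapp() {
  local relink="relink: openclaw channels logout, then openclaw channels login --verbose"
  case "${1:-}" in
    loggedOut)
      echo "$relink"
      return 0
      ;;
    '' | *[!0-9]*)
      echo "unknown signal: keep reading probes and logs"
      return 0
      ;;
  esac
  # Documented relink range, treated as inclusive.
  if [ "$1" -ge 409 ] && [ "$1" -le 515 ]; then
    echo "$relink"
  else
    echo "not a relink signature: check allowlists, mention policy, and send errors"
  fi
}
```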
Use logs to prove the message path
When the UI says “nothing happened,” logs can tell you whether the message never arrived, arrived and was dropped by policy, arrived and created work, or produced a response that failed on send. The health docs suggest tailing OpenClaw logs and filtering for web heartbeat, reconnect, auto-reply, and inbound events.
tail -f /tmp/openclaw/openclaw-*.log | grep -E 'web-heartbeat|web-reconnect|web-auto-reply|web-inbound'
I treat logs as the bridge between channel health and session health. A channel probe can say the transport is connected. A session store can show old conversations. Logs show whether this specific message made it through the live path. If logs show inbound events but no reply, look at mention gating, allowlists, group policy, tool/action requirements, model/runtime issues, and send errors. If logs show no inbound events, do not blame the agent. The message never reached the part of the system that could answer.
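To turn a log tail into a quick tally, you can count each of the four event names the health docs filter on. A sketch assuming bash and the default log location above; summarize_events is hypothetical, and the web-* names may not match every provider's log lines.

```bash
# Hypothetical tally of the four event names the health docs filter on.
# $1 is a log file or glob; the default mirrors the docs' example path.
summarize_events() {
  local logs="${1:-/tmp/openclaw/openclaw-*.log}"
  local event count
  for event in web-heartbeat web-reconnect web-auto-reply web-inbound; do
    # $logs is unquoted so the glob expands; cat joins every match so grep -c
    # returns one total. "|| true" keeps a zero count from failing the line.
    # shellcheck disable=SC2086
    count=$(cat $logs 2>/dev/null | grep -c -- "$event" || true)
    echo "$event $count"
  done
}
```

Read the tally the way this section suggests: web-inbound at zero means the message never reached the Gateway, while inbound events without matching web-auto-reply lines point at gating, policy, or send errors. That second reading assumes web-auto-reply marks reply activity, which you should confirm in your own logs.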
Tune the monitor, but do not hide outages
OpenClaw has channel health monitor settings for built-in monitors that expose them today, including Discord, Google Chat, iMessage, Microsoft Teams, Signal, Slack, Telegram, and WhatsApp. The documented Gateway settings are gateway.channelHealthCheckMinutes, defaulting to five; gateway.channelStaleEventThresholdMinutes, defaulting to thirty; and gateway.channelMaxRestartsPerHour, defaulting to ten.
You can disable health-monitor restarts globally by setting gateway.channelHealthCheckMinutes to zero. You can also disable restarts per channel with channels.<provider>.healthMonitor.enabled, or per account with channels.<provider>.accounts.<accountId>.healthMonitor.enabled. The account-level override wins over the channel-level setting.
That is useful when a provider is flapping and you need to stop restart churn while you inspect it. But it is not a fix. If you turn monitors down or off, write down why, verify the replacement check, and turn normal monitoring back on when the provider is stable.
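The precedence rules above are easy to get backwards during an incident, so here they are as a small function. A sketch assuming bash; monitor_restarts_enabled is hypothetical, it takes the three documented settings as arguments (an empty string means “not set”), it does not read OpenClaw's config file, and treating “neither level set” as enabled is an assumption.

```bash
# Hypothetical resolver: do health-monitor restarts apply to one account?
#   $1 gateway.channelHealthCheckMinutes (empty means the default of 5; 0 disables)
#   $2 channels.<provider>.healthMonitor.enabled (true, false, or empty)
#   $3 channels.<provider>.accounts.<accountId>.healthMonitor.enabled (same)
monitor_restarts_enabled() {
  local minutes="${1:-5}" channel="${2:-}" account="${3:-}"
  if [ "$minutes" -eq 0 ]; then
    echo false          # global switch: zero disables restarts everywhere
    return
  fi
  if [ -n "$account" ]; then
    echo "$account"     # account-level override wins over channel-level
  elif [ -n "$channel" ]; then
    echo "$channel"
  else
    echo true           # assumption: enabled when neither level sets it
  fi
}
```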
The decision tree I use
- Run openclaw status. If the Gateway or runtime is clearly down, stay at the service layer.
- Run openclaw health --json or openclaw status --deep when you need live Gateway/channel evidence.
- Run openclaw channels status --probe for provider-specific proof.
- Check logs for the actual inbound message before blaming the session store.
- Check pairing, allowlists, group policy, and mention requirements before changing prompts.
- Run openclaw doctor --lint --json for read-only structured findings, then choose whether repair is appropriate.
- Only relink, restart, or repair after the failing layer is identified.
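The same tree can be encoded as a function that takes yes/no answers from the checks you already ran and names the layer to fix first. A sketch assuming bash; failing_layer and its argument order are hypothetical, and it runs no commands itself.

```bash
# Hypothetical encoding of the decision tree. Each argument is "yes" or
# "no" from a check you already ran; the first "no" names the layer.
#   $1 Gateway/runtime up (status, health)
#   $2 channel probe healthy (channels status --probe)
#   $3 inbound message visible in logs
#   $4 message passes pairing, allowlists, group and mention policy
failing_layer() {
  if [ "${1:-no}" != yes ]; then
    echo "gateway: stay at the service layer"
  elif [ "${2:-no}" != yes ]; then
    echo "channel: transport or auth"
  elif [ "${3:-no}" != yes ]; then
    echo "delivery: message never reached the Gateway"
  elif [ "${4:-no}" != yes ]; then
    echo "policy: pairing, allowlists, group or mention rules"
  else
    echo "agent: check model/runtime and send errors, then doctor --lint --json"
  fi
}
```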
Status checks are not about collecting comforting green badges. They are about refusing to mix up four different problems: the Gateway is unavailable, the channel is disconnected, the message is blocked by policy, or the session state is not what you expected. Once you separate those, the fix is usually small.
For related operating habits, read the health checks guide and the retry policy guide. The best operators are boring here: probe first, repair second, report only what was verified.