OpenClaw Command Queue: Stop Agent Runs From Colliding
The first time an AI agent feels “alive,” people start sending it messages the way they would message a teammate: one thought, then another correction, then a screenshot, then “actually wait.” That is normal human behavior. It is also exactly where unattended agents can get messy if every inbound message starts a fresh run immediately.
OpenClaw’s command queue exists for that boring but critical middle layer. It keeps inbound auto-reply runs from colliding, while still letting separate sessions run in parallel when your setup allows it. That matters for Slack, Telegram, WhatsApp, Discord, Signal, iMessage, webchat, and any other inbound surface that uses the gateway reply pipeline.
The short version: OpenClaw serializes work per session, then applies a broader global concurrency cap. One conversation should not have two active agent runs writing to the same session transcript at the same time. Different conversations can still move independently, subject to your configured limits.
This post is the practical operator view. If you want the adjacent pieces afterward, read Channel Routing for how messages reach sessions, Session Tools for cross-session handoffs, and Cron vs Heartbeat for scheduled work.
The problem the queue solves
An agent run is not a tiny stateless webhook. The OpenClaw agent loop does real work: it takes the inbound message, resolves the session, assembles context, calls the model, executes tools, streams assistant and tool events, persists history, and sends the final reply. That whole loop needs a consistent view of the session.
If two messages from the same conversation start two independent runs at the same time, they can compete for shared resources: session files, logs, CLI stdin, tool state, provider limits, and user-visible reply order. Even when nothing corrupts, the result can feel wrong. The agent may answer the older message after the newer one, miss a correction, or duplicate work.
OpenClaw’s docs describe the queue as a lane-aware FIFO queue for inbound auto-reply runs. The important operator promise is simple: only one active run touches a given session at a time. That is the difference between “my agent is busy but sane” and “my agent is racing itself.”
How a message moves through OpenClaw
The messages docs lay out the high-level flow like this:

Inbound message
  -> routing/bindings -> session key
  -> queue (if a run is active)
  -> agent run (streaming + tools)
  -> outbound replies (channel limits + chunking)

That shape matters. Queueing happens after routing has produced a session key. Direct chats, groups, channels, cron jobs, webhooks, and node runs do not all share the same key shape. Direct chats default to the agent's main session for continuity. Groups and rooms are isolated. Cron jobs create fresh sessions unless configured otherwise. Webhooks are isolated unless explicitly set by the hook.
So when you ask “why did this wait?” the first question is not “which app was it from?” The first question is “which session key did it map to?” Two Slack messages in the same thread can be serialized because they hit the same session. A Slack thread and a Telegram DM can be separate sessions and may run in parallel if the global cap allows it.
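To make that concrete, here is a minimal sketch of how routing might reduce an inbound message to a session key. The key shapes below are illustrative assumptions, not OpenClaw's actual formats; the point is that the queue is scoped by this value, not by the app name.

// Sketch only: hypothetical session-key derivation.
// OpenClaw's real key shapes differ; what matters is that
// queueing is keyed off this value.
type Inbound = {
  channel: "slack" | "telegram" | "discord";
  kind: "direct" | "group" | "thread";
  conversationId: string; // channel-specific chat/thread id
};

function sessionKey(msg: Inbound): string {
  // Direct chats share the agent's main session for continuity.
  if (msg.kind === "direct") return "main";
  // Groups and threads get isolated per-conversation sessions.
  return `${msg.channel}:${msg.kind}:${msg.conversationId}`;
}

// Two Slack messages in one thread map to the same key (serialized);
// a Slack thread and a Telegram DM map to different keys (parallel-ok).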
The two queues to keep in your head
There are two layers worth remembering:
- Per-session lane: runEmbeddedPiAgent enqueues by session key, using a lane like session:<key>. This is what guarantees one active run per session.
- Global lane: each session run is then queued into a broader lane, main by default, so overall parallelism is capped by agents.defaults.maxConcurrent.
That split is the clean mental model. Per-session serialization protects transcript consistency. Global concurrency protects the machine, gateway, and upstream providers from too much work at once.
There can also be additional lanes, such as cron or subagent lanes, so background jobs can run without necessarily blocking normal inbound replies. Do not treat that as permission to crank concurrency blindly. If your agent calls expensive tools, hits paid model APIs, or touches fragile external systems, conservative parallelism is usually smarter.
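If it helps to see that mental model as code, here is a minimal sketch of per-key serialization behind a global concurrency cap. This is not OpenClaw's implementation; the global lane is modeled here as a counting semaphore, and FIFO fairness across lanes is approximate. It only shows the shape of the guarantee: one run per lane, at most maxConcurrent runs overall.

// Sketch of lane-aware queueing: FIFO per session key, plus a global cap.
// Illustrative only; maxConcurrent mirrors the config name, not the code.
class LaneQueue {
  private lanes = new Map<string, Promise<void>>();
  private active = 0;
  private waiting: Array<() => void> = [];

  constructor(private maxConcurrent: number) {}

  run(laneKey: string, task: () => Promise<void>): Promise<void> {
    const prev = this.lanes.get(laneKey) ?? Promise.resolve();
    // Chain onto the lane tail: one active run per session key.
    const next = prev.then(() => this.withGlobalSlot(task));
    this.lanes.set(laneKey, next.catch(() => {}));
    return next;
  }

  private async withGlobalSlot(task: () => Promise<void>): Promise<void> {
    if (this.active >= this.maxConcurrent) {
      // Wait for a finished run to hand us its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    this.active++;
    try {
      await task();
    } finally {
      this.active--;
      this.waiting.shift()?.(); // wake one waiter, if any
    }
  }
}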
The default behavior: collect followups
When a run is already active, OpenClaw needs to decide what to do with new inbound messages. The documented default is collect across surfaces.
collect means queued messages are coalesced into a single followup turn after the current run ends. If queued messages target different channels or threads, OpenClaw drains them individually to preserve routing. That is a useful default because real users often split one thought across several messages. The agent should usually respond once to the combined intent, not five times to five fragments.
Here is the kind of message burst collect handles well:
Rahul: deploy this
Rahul: wait, preview first
Rahul: also check the checkout CTA
Rahul: don't publish prod until I approve

With collection, the followup turn can see the correction and the extra constraint together. That is much better than having the agent charge ahead on "deploy this" while the next three messages fight to catch up.
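Under the hood, collect behaves like a coalescing step. Here is a hedged sketch of what coalescing queued messages into a single followup prompt could look like, including the documented detail that messages for different channels or threads drain individually to preserve routing. The grouping key and prompt format are my assumptions.

// Illustrative coalescing for collect mode; not OpenClaw's actual code.
type Queued = { channel: string; threadId: string; text: string };

function coalesce(queue: Queued[]): Array<{ route: string; prompt: string }> {
  // Group by routing target so replies land in the right place.
  const groups = new Map<string, string[]>();
  for (const msg of queue) {
    const route = `${msg.channel}:${msg.threadId}`;
    const bucket = groups.get(route);
    if (bucket) bucket.push(msg.text);
    else groups.set(route, [msg.text]);
  }
  // One followup turn per route: same-route fragments become one prompt.
  return [...groups.entries()].map(([route, texts]) => ({
    route,
    prompt: texts.join("\n"),
  }));
}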
The queue modes and when I would use them
OpenClaw exposes several queue modes. They are not interchangeable, so pick for the behavior you actually want:
- collect: the default. Batch queued messages into one followup turn. Best for normal chat, support, Slack, Discord, and operator workflows where users send corrections quickly.
- followup: enqueue the new message for the next agent turn after the current run ends. Use when each inbound message should remain distinct.
- steer: inject immediately into the current run and cancel pending tool calls after the next tool boundary. If the run is not streaming, it falls back to followup.
- steer-backlog or steer+backlog: steer now and also preserve the message for a later followup. The docs warn this can look like duplicate responses on streaming surfaces.
- interrupt: legacy mode that aborts the active run for that session and runs the newest message.
- queue: legacy alias for steer.
My default recommendation is boring: keep collect unless you have a clear reason not to. It matches how people actually message. Use steer for surfaces where mid-run course correction matters more than waiting cleanly. Be careful with steer-backlog; it is powerful, but duplicate-looking responses are not a great user experience.
Want the full operator setup instead of learning queue behavior from production weirdness? Get ClawKit here.
The config that actually matters
The queue settings live under messages.queue. A normal conservative shape looks like this:
{
messages: {
queue: {
mode: "collect",
debounceMs: 1000,
cap: 20,
drop: "summarize",
byChannel: {
discord: "collect",
telegram: "collect"
}
}
}
}

The queue options apply to followup, collect, steer-backlog, and to steer when it falls back to a followup:
- debounceMs: wait for a quiet window before starting a followup turn. This is what stops "continue, continue" from becoming a pile of tiny runs.
- cap: maximum queued messages per session.
- drop: overflow behavior. The documented values are old, new, and summarize.
The documented defaults are debounceMs: 1000, cap: 20, and drop: summarize. With summarize, OpenClaw keeps a short bullet list of dropped messages and injects it as a synthetic followup prompt. That is a pragmatic default: it avoids unbounded queue growth while still giving the agent some awareness of what overflowed.
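As a sketch of what the three drop modes imply, here is one way the overflow decision could look. The docs describe the bullet-list shape of summarize; the exact eviction order in this sketch is my assumption.

// Illustrative cap/drop handling; eviction order is an assumption.
type DropMode = "old" | "new" | "summarize";

function enqueue(
  queue: string[],
  summary: string[], // bullets for messages dropped under "summarize"
  msg: string,
  cap: number,
  drop: DropMode,
): void {
  if (queue.length < cap) {
    queue.push(msg);
    return;
  }
  if (drop === "old") {
    queue.shift(); // discard oldest, keep the newest
    queue.push(msg);
  } else if (drop === "new") {
    // discard the incoming message entirely
  } else {
    const dropped = queue.shift()!; // evict oldest but remember it
    summary.push(`- ${dropped}`); // later injected as a synthetic followup
    queue.push(msg);
  }
}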
Do not confuse inbound debounce with queue debounce
There is another debounce setting under messages.inbound. It is related, but it is not the same thing.
messages.inbound batches rapid consecutive text-only messages from the same sender into a single agent turn before the run starts. It is scoped per channel and conversation. Media and attachments flush immediately. Control commands bypass debouncing.
{
messages: {
inbound: {
debounceMs: 2000,
byChannel: {
whatsapp: 5000,
slack: 1500,
discord: 1500
}
}
}
}

The way I think about it: inbound debounce handles "the user is still typing the first thought." Queue debounce handles "the agent is already running, and followup messages are arriving." Both reduce noisy turns, but they sit at different points in the pipeline.
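If it helps to see the difference as code, here is a minimal debounce-batcher sketch in the inbound style: rapid text messages in the same conversation merge into one turn, while media flushes immediately. This illustrates the described behavior; it is not OpenClaw's source.

// Sketch of inbound debouncing per conversation; illustrative only.
type Pending = { texts: string[]; timer: ReturnType<typeof setTimeout> };

class InboundBatcher {
  private pending = new Map<string, Pending>();

  constructor(
    private debounceMs: number,
    private startRun: (convo: string, combined: string) => void,
  ) {}

  onMessage(convo: string, text: string, hasMedia: boolean): void {
    const entry = this.pending.get(convo);
    if (hasMedia) {
      // Media and attachments flush immediately: no batching.
      if (entry) this.flush(convo);
      this.startRun(convo, text);
      return;
    }
    if (entry) {
      entry.texts.push(text);
      clearTimeout(entry.timer); // extend the quiet window
      entry.timer = setTimeout(() => this.flush(convo), this.debounceMs);
    } else {
      const timer = setTimeout(() => this.flush(convo), this.debounceMs);
      this.pending.set(convo, { texts: [text], timer });
    }
  }

  private flush(convo: string): void {
    const entry = this.pending.get(convo);
    if (!entry) return;
    this.pending.delete(convo);
    clearTimeout(entry.timer);
    this.startRun(convo, entry.texts.join("\n")); // one combined turn
  }
}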
Per-session overrides are useful for live ops
You do not always want to change global config just because one conversation needs a different posture. OpenClaw supports standalone chat commands for per-session queue overrides:
/queue collect
/queue collect debounce:2s cap:25 drop:summarize
/queue default
/queue reset

That is handy in a real operating thread. If a Slack thread is getting noisy, switch it to collect with a slightly larger debounce. If a time-sensitive channel needs immediate steering, use steer for that session instead of changing the entire agent.
Keep the command standalone. Do not bury it inside a paragraph of normal instructions. The docs describe these as standalone commands, and treating them that way keeps command parsing boring.
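For intuition about the argument shape, here is a tiny parser sketch for the key:value overrides shown above. The accepted keys come straight from the command example; the duration parsing ("2s" to milliseconds) is my assumption.

// Hypothetical parser for /queue overrides; not OpenClaw's parser.
function parseQueueArgs(args: string[]): Record<string, string | number> {
  const out: Record<string, string | number> = {};
  for (const arg of args) {
    const [key, value] = arg.split(":", 2);
    if (key === "debounce") {
      // Assumes "2s" -> 2000ms; bare numbers are treated as ms.
      out.debounceMs = value.endsWith("s")
        ? Number(value.slice(0, -1)) * 1000
        : Number(value);
    } else if (key === "cap") out.cap = Number(value);
    else if (key === "drop") out.drop = value; // old | new | summarize
  }
  return out;
}

// parseQueueArgs(["debounce:2s", "cap:25", "drop:summarize"])
// -> { debounceMs: 2000, cap: 25, drop: "summarize" }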
Queueing is not retrying
OpenClaw also has a retry policy, but it solves a different problem. The retry docs are about outbound provider requests such as message sends, media uploads, reactions, polls, and stickers. Retries apply per HTTP request, not per multi-step flow, so completed steps are not replayed as part of a composite operation.
That distinction matters. Queueing preserves ordering and prevents session races before an agent run begins. Retrying handles transient send/provider failures for the current request. If Telegram rate-limits a message send, retry policy may help. If five Slack messages hit one active session, queue policy decides how they wait or steer.
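To keep the two ideas separate in code, a per-request retry wrapper might look like the sketch below: it retries one outbound send on transient failure and knows nothing about queues or sessions. The attempt count and backoff numbers are placeholders, not OpenClaw's defaults.

// Per-request retry with exponential backoff; a sketch, separate from queueing.
async function sendWithRetry<T extends { status: number }>(
  send: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await send();
      // Treat rate limits and server errors as retryable.
      if (res.status === 429 || res.status >= 500) {
        throw new Error(`HTTP ${res.status}`);
      }
      return res; // success: no queue involved, no steps replayed
    } catch (err) {
      lastError = err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}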
How to tell the queue is involved
The docs keep the troubleshooting guidance intentionally small: enable verbose logs and look for the short notices emitted when a queued run waited more than about two seconds before starting. Those queue timing lines are the main signal that a run sat in a lane.
Also pay attention to typing indicators. The queue docs say typing indicators still fire immediately on enqueue when the channel supports them, so the user can see that the agent noticed the message even while the run waits its turn. That is a subtle UX detail, but it matters. Waiting feels less broken when the channel shows the agent is present.
When diagnosing a stuck-feeling setup, I would check in this order:
- Identify the session. Is this a direct chat, group, thread, cron run, webhook, or node run?
- Check queue mode. Is the session using collect, followup, steer, or a backlog mode?
- Check caps. Is agents.defaults.maxConcurrent too low for your current traffic, or intentionally low for safety?
- Check whether messages are being batched before the run. That is messages.inbound, not messages.queue.
- Check provider/channel retries only if sending failed. Do not debug a queue wait as a retry problem.
The operator takeaway
A good agent should not answer every fragment instantly. It should protect session history, preserve routing, understand corrections, and avoid racing itself. OpenClaw’s queue is the layer that makes that possible.
If you remember one thing, make it this: session lanes protect correctness; global concurrency protects capacity; queue modes shape user experience. Tune those separately and your agent will feel much more like a reliable operator instead of a webhook with a model attached.
Want the complete guide? Get ClawKit — $9.99