OpenClaw Queue Discipline: Stop Parallel Agent Runs From Stepping on Each Other

Hex · May 18, 2026 · 8 min read

Parallel agent work is useful right up until two runs touch the same conversation, same files, same browser, or same external system at the same time. Then it stops feeling powerful and starts feeling haunted.

OpenClaw's queue discipline exists for that exact operator problem. The public docs describe a lane-aware FIFO queue that serializes inbound auto-reply runs by session key, then applies a wider global concurrency cap. In plain English: one conversation should not have two active agent runs writing over each other, even if the box can run several sessions in parallel.

This post is not another abstract concurrency essay. It is the practical checklist I would use before trusting a production OpenClaw agent with Slack, Telegram, cron jobs, webchat, and background subagents at the same time.

The failure mode is not just duplicate replies

Duplicate replies are the obvious symptom. The quieter failures are worse. A second run can answer an old instruction after a correction arrived. A tool can repeat a non-idempotent action because a retry was confused with a new flow. A long-running session can hold a lane while followup instructions pile up. A noisy user can send five fragments and accidentally make the agent look indecisive.

The queue docs name the shared resources directly: session files, logs, CLI stdin, expensive model calls, and upstream rate limits. Those are not theoretical. If an agent run is assembling context, executing tools, persisting transcripts, and sending replies, it needs a stable view of the session.

The operator goal is simple: allow safe parallelism across independent sessions, but never let two active runs mutate the same session at once.

Think in lanes, not threads

The most useful mental model in the docs is the two-layer lane system.

Per-session lane: OpenClaw enqueues by session key, using a lane shaped like session:<key>. That is the guarantee that only one active run touches a session at a time.
Global lane: after that, the session run enters a broader lane, main by default, so overall parallelism is capped by agents.defaults.maxConcurrent.

The default concurrency details matter less than the contract. Per-session lanes protect correctness. Global lanes protect capacity. Additional lanes can exist for cron, nested, and subagent-style background work, but those lanes do not erase the need to reason about ownership.

If two messages map to the same session key, they should be coordinated. If they map to different session keys, they can move in parallel when the global cap allows it. That is why session routing is part of queue discipline, not a separate concern.
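Here is a minimal sketch of that two-layer contract in Python. It is not OpenClaw's implementation: `LaneScheduler`, `run`, and `demo` are names I made up, and the asyncio lock and semaphore are only stand-ins for the per-session lane and the global cap.

```python
import asyncio
from collections import defaultdict


class LaneScheduler:
    """Illustrative two-layer scheduler: a lock per session key plus a global cap."""

    def __init__(self, max_concurrent: int) -> None:
        # One lock per lane name, created lazily the first time a key is seen.
        self._session_locks = defaultdict(asyncio.Lock)
        # Stand-in for the global lane's concurrency cap.
        self._global = asyncio.Semaphore(max_concurrent)

    async def run(self, session_key: str, job):
        # Per-session lane: at most one active run per session key.
        async with self._session_locks[f"session:{session_key}"]:
            # Global lane: at most max_concurrent runs across all sessions.
            async with self._global:
                return await job()


async def demo() -> list:
    scheduler = LaneScheduler(max_concurrent=2)
    events = []

    def make_job(name: str):
        async def job():
            events.append(f"start {name}")
            await asyncio.sleep(0.01)  # pretend to do agent work
            events.append(f"end {name}")
        return job

    await asyncio.gather(
        scheduler.run("alice", make_job("alice-1")),
        scheduler.run("alice", make_job("alice-2")),
        scheduler.run("bob", make_job("bob-1")),
    )
    return events
```

In `demo`, the two runs for the `alice` key never overlap, while the run for `bob` starts before the first `alice` run has finished. That is the whole contract: correctness per key, capacity across keys.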

Session keys decide what is allowed to run together

OpenClaw's session docs say direct messages share one main session by default, groups and rooms are isolated, cron jobs get fresh sessions, webhooks are isolated unless explicitly set, and node runs get their own session shape. Direct-message isolation can be changed with session.dmScope, and multi-user inboxes should strongly consider per-sender isolation to avoid context leakage.

That means the same traffic can have very different concurrency behavior depending on how it is routed. A Slack thread, a Telegram DM, and a webchat conversation are not automatically one queue just because the same agent handles them. They become one queue only if they resolve to the same session key.

openclaw status
openclaw sessions --json
openclaw sessions cleanup --dry-run

Use those commands when you are debugging a stuck or surprising conversation. openclaw status shows session store location and recent activity. openclaw sessions --json dumps entries so you can see what keys exist. openclaw sessions cleanup --dry-run helps inspect maintenance impact without mutating state.

If your agent seems to be “randomly” waiting, first prove whether two messages are sharing a session. Do not tune queue modes before you know the lane.
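To make "prove the lane first" concrete, here is a toy routing model. Everything in it is hypothetical: the key strings, the `Inbound` shape, and the `per_sender_dms` flag are my own stand-ins, not OpenClaw's real key format or the actual values of `session.dmScope`. The only point is that messages wait on each other exactly when they resolve to the same key.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Inbound:
    """Hypothetical inbound message, reduced to what routing needs."""
    channel: str   # e.g. "slack", "telegram", "webchat"
    kind: str      # "dm" or "group"
    sender: str
    room: str = ""


def resolve_session_key(msg: Inbound, per_sender_dms: bool = False) -> str:
    """Invented routing rule; real key shapes come from the session docs."""
    if msg.kind == "group":
        # Groups and rooms are isolated from each other.
        return f"group:{msg.channel}:{msg.room}"
    if per_sender_dms:
        # Per-sender isolation, as you would want for a multi-user inbox.
        return f"dm:{msg.channel}:{msg.sender}"
    # Default described in the post: direct messages share one main session.
    return "main"


def group_by_lane(messages, per_sender_dms: bool = False) -> dict:
    """Bucket messages by resolved key; each bucket is one serialized lane."""
    lanes = defaultdict(list)
    for msg in messages:
        lanes[resolve_session_key(msg, per_sender_dms)].append(msg)
    return dict(lanes)
```

Feed it a Telegram DM, a webchat DM, and a Slack room message. With the default, the two DMs land in the same bucket and will queue behind each other. With per-sender isolation, all three move independently.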

Current default queue behavior favors steering

The current public queue docs show default inbound queue behavior as mode: "steer", debounceMs: 500, cap: 20, and drop: "summarize". Steering means a prompt that arrives mid-run is injected into the active runtime when the run can accept it, rather than starting a second session run.

That default is a good fit for live operator conversations. If the human sends “wait, preview first” while the agent is still running, steering gives the active run a chance to incorporate that correction after the current tool boundary. If steering is unavailable, OpenClaw waits until the active run ends before starting the prompt.

{
  messages: {
    queue: {
      mode: "steer",
      debounceMs: 500,
      cap: 20,
      drop: "summarize",
      byChannel: { discord: "collect" },
    },
  },
}

The important part is not that every channel should use exactly this config. The important part is that you choose queue behavior deliberately. For some surfaces, collect is better because users often split one thought across several messages and you want one followup response. For others, steer is better because mid-run correction matters more than batching.

Pick queue modes by user experience

The docs list the main modes clearly:

steer: inject into the active runtime when possible. If the run cannot be steered, wait for the current run to finish before starting the prompt.
followup: do not steer; enqueue each message for a later turn after the current run ends.
collect: do not steer; coalesce queued messages into a single followup after the quiet window, while preserving routing when messages target different channels or threads.
interrupt: abort the active run for that session and run the newest message.
steer-backlog or steer+backlog: supported in the runtime schema and command surface; steer now while preserving backlog semantics. Use it only when you understand the duplicate-looking response risk.

My default operator rule is this: use steer where course correction is important, collect where humans send fragments, followup where every message is meant to become a distinct turn, and interrupt only when aborting active work is the desired behavior.
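The mode list above boils down to one decision: what happens to a message that arrives while a run is active. This sketch encodes that decision as I read it from the descriptions. `Action` and `decide` are hypothetical names, idle sessions are simplified to "start a run", and `steer-backlog` is left out because its exact semantics are not spelled out here.

```python
from enum import Enum


class Action(Enum):
    START_RUN = "start a new run now"
    INJECT = "inject into the active run"
    ENQUEUE = "enqueue as its own later turn"
    COALESCE = "add to the batch for one followup"
    ABORT_AND_RUN = "abort the active run, then run this message"


def decide(mode: str, run_active: bool, can_steer: bool) -> Action:
    """Map a queue mode and the session state to what happens to a new message."""
    if not run_active:
        # Simplification: an idle session just starts a run.
        return Action.START_RUN
    if mode == "steer":
        # Steer when possible, otherwise wait for the current run to finish.
        return Action.INJECT if can_steer else Action.ENQUEUE
    if mode == "followup":
        return Action.ENQUEUE
    if mode == "collect":
        return Action.COALESCE
    if mode == "interrupt":
        return Action.ABORT_AND_RUN
    raise ValueError(f"unknown queue mode: {mode!r}")
```

Writing it as a table like this makes the tradeoff visible: only `interrupt` destroys work in progress, and only `steer` can change what the active run does.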

/queue steer
/queue collect debounce:0.5s cap:25 drop:summarize
/queue followup
/queue default
/queue reset

Per-session queue commands are useful during live work because they avoid turning one noisy thread into a global policy change. Send them as standalone commands. Keep queue changes boring and explicit.

Debounce and caps are part of trust

debounceMs is the quiet window before queued delivery. In steer mode, the docs call out that it also sets the Codex steering quiet window before sending a batched turn/steer. Bare numbers are milliseconds, and the slash command accepts units such as ms, s, m, h, and d.
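The unit rule is easy to get wrong in a config review, so here is a small sketch of it. `parse_duration_ms` is a hypothetical helper, not an OpenClaw function. It assumes exactly what the paragraph above says: bare numbers are milliseconds, and the suffixes are ms, s, m, h, and d.

```python
import re

# Milliseconds per unit; a bare number falls back to "ms".
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(ms|s|m|h|d)?$")


def parse_duration_ms(text: str) -> int:
    """Convert strings like '500', '0.5s', or '2m' to whole milliseconds."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return round(float(value) * _UNIT_MS[unit or "ms"])
```

Under these assumptions, `debounce:0.5s` and `debounceMs: 500` describe the same quiet window.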

cap limits queued messages per session. drop controls overflow. The documented policies are summarize, old, and new. summarize drops the oldest queued entries as needed while keeping compact summaries that can be injected as a synthetic followup prompt.

That is not just resource hygiene. It is user trust. Without a cap, a noisy channel can become unbounded backlog. Without a sensible drop policy, the agent can lose important intent silently. Without a debounce window, you turn normal human typing into too many small agent turns.
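A bounded queue with an overflow policy is small enough to sketch. This is my own illustration, not OpenClaw's code. I am assuming `old` means "evict the oldest queued entry" and `new` means "discard the incoming message", and truncating text to 40 characters is only a stand-in for a real summary.

```python
from collections import deque


class BoundedQueue:
    """Illustrative per-session backlog with a cap and an overflow policy."""

    def __init__(self, cap: int, drop: str = "summarize") -> None:
        if cap < 1:
            raise ValueError("cap must be at least 1")
        if drop not in {"summarize", "old", "new"}:
            raise ValueError(f"unknown drop policy: {drop!r}")
        self.cap = cap
        self.drop = drop
        self.items = deque()
        self.summaries = []  # compact notes about entries dropped under "summarize"

    def push(self, text: str) -> bool:
        """Queue a message; return False if the message itself was discarded."""
        if len(self.items) < self.cap:
            self.items.append(text)
            return True
        if self.drop == "new":
            # Assumed meaning: keep the existing backlog, discard the newcomer.
            return False
        evicted = self.items.popleft()  # both "old" and "summarize" evict the oldest
        if self.drop == "summarize":
            self.summaries.append(evicted[:40])  # stand-in for a real summary
        self.items.append(text)
        return True
```

The difference that matters for trust is the `summaries` list. Under `old` and `new`, overflow vanishes. Under `summarize`, the agent still gets a trace of what it did not read in full.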

Queueing is not retrying

The retry docs solve a different layer. Retries apply per HTTP request, not per multi-step flow. That preserves ordering and avoids duplicating non-idempotent operations. Telegram retries transient failures such as 429, timeouts, connection resets, closed sockets, and temporary unavailability, while Markdown parse errors fall back to plain text instead of retrying. Discord retries rate limits, request timeouts, 5xx responses, and transient transport failures.

{
  channels: {
    telegram: {
      retry: {
        attempts: 3,
        minDelayMs: 400,
        maxDelayMs: 30000,
        jitter: 0.1,
      },
    },
    discord: {
      retry: {
        attempts: 3,
        minDelayMs: 500,
        maxDelayMs: 30000,
        jitter: 0.1,
      },
    },
  },
}

Use that distinction in incident reviews. If a message send hit a provider 429, inspect retry policy. If two user prompts collided in the same conversation, inspect queue and session routing. If a tool action happened twice, ask whether it was a second run, a retry of a request, or an agent loop.

Those are three different failure classes. Treating them as one vague “agent repeated itself” bucket makes the fix worse.
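For the retry class specifically, this is roughly what a per-request policy looks like. It is a sketch under assumptions: I am treating `attempts` as total tries, and the doubling backoff is my guess at a reasonable shape, not something the retry docs state. `TransientError`, `backoff_delays_ms`, and `send_with_retry` are hypothetical names.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a 429, timeout, or connection reset."""


def backoff_delays_ms(attempts, min_delay_ms, max_delay_ms, jitter, rng=random.random):
    """Delays between tries: doubling from the minimum, capped, with +/- jitter."""
    delays = []
    for retry in range(attempts - 1):
        base = min(max_delay_ms, min_delay_ms * 2 ** retry)
        spread = base * jitter
        jittered = base - spread + 2 * spread * rng()
        delays.append(min(max_delay_ms, jittered))
    return delays


def send_with_retry(send, attempts=3, min_delay_ms=400, max_delay_ms=30_000,
                    jitter=0.1, sleep=time.sleep):
    """Retry one request; the caller's multi-step flow is never replayed."""
    delays = backoff_delays_ms(attempts, min_delay_ms, max_delay_ms, jitter)
    for attempt in range(attempts):
        try:
            return send()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of tries: surface the failure to the caller
            sleep(delays[attempt] / 1000)
```

Notice what is wrapped: a single `send` call. Nothing above it in the flow runs again, which is why a retry on its own should not produce a second tool action.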

Loop detection catches a different kind of damage

OpenClaw also documents tool-loop detection for repeated tool-call patterns. The rolling-history loop detector is disabled by default, but can be enabled under tools.loopDetection. Its detectors include generic repeats, known polling with no progress, and ping-pong patterns. The docs also describe a post-compaction guard that can abort when the same tool, arguments, and result repeat immediately after a compaction retry.

{
  tools: {
    loopDetection: {
      enabled: true,
      historySize: 30,
      warningThreshold: 10,
      criticalThreshold: 20,
      globalCircuitBreakerThreshold: 30,
      detectors: {
        genericRepeat: true,
        knownPollNoProgress: true,
        pingPong: true,
      },
    },
  },
}

This is not a replacement for queue discipline. Queueing prevents two active runs from stepping on the same session. Loop detection helps when one run gets stuck repeating itself. Both protect cost and trust, but they fire at different layers.

I would enable loop detection cautiously, starting with smaller or more experimental agents. Keep the thresholds conservative and raise them only if legitimate repeated calls get blocked. Do not treat loop detection as a license to let agents poll forever. The better pattern is still event-driven work, bounded waits, and visible blockers.
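To show what a rolling-history detector is doing, here is a minimal sketch of the generic-repeat idea. It is not OpenClaw's detector. `RepeatDetector` is a hypothetical class, and I am assuming the thresholds count identical tool-plus-arguments calls inside the history window.

```python
import json
from collections import deque


class RepeatDetector:
    """Illustrative generic-repeat check over a rolling window of tool calls."""

    def __init__(self, history_size=30, warning_threshold=10, critical_threshold=20):
        # deque(maxlen=...) silently forgets the oldest call once the window is full.
        self.history = deque(maxlen=history_size)
        self.warning_threshold = warning_threshold
        self.critical_threshold = critical_threshold

    def record(self, tool: str, args: dict) -> str:
        """Record one call and return 'ok', 'warning', or 'critical'."""
        # Sorting keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} the same call.
        signature = (tool, json.dumps(args, sort_keys=True))
        self.history.append(signature)
        repeats = sum(1 for past in self.history if past == signature)
        if repeats >= self.critical_threshold:
            return "critical"
        if repeats >= self.warning_threshold:
            return "warning"
        return "ok"
```

A detector like this only sees one run's call history. It has no idea whether a second run is active on the same session, which is why it cannot stand in for the queue.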

A typing indicator does not mean the run started

The queue docs note that typing indicators can fire immediately on enqueue when the channel supports them. That is good UX, but it can confuse operators. A typing indicator means the system acknowledged the inbound message. It does not necessarily mean the agent run is actively executing right now.

If a user says “it showed typing but replied late,” that may be healthy queue behavior. The right proof is verbose queue timing, session activity, and whether the final reply preserved the latest instruction. Do not debug typing presence as if it were an execution trace.

A practical queue discipline checklist

Identify the session key. Prove whether the messages share a session before changing queue mode.
Choose the mode for the surface. Use steer for corrections, collect for message fragments, followup for distinct turns, and interrupt only when aborting is desired.
Set caps and overflow. Keep cap finite and use drop: "summarize" unless you have a reason to discard old or new messages silently.
Keep retries scoped. Retry provider sends, not whole multi-step workflows.
Use loop detection for repeated tool patterns. It catches no-progress cycles, not normal queue backlog.
Verify with live signals. Use sessions, verbose logs, status, and actual replies. Do not declare queue fixes based on vibes.

The mature posture is boring: serialize per session, cap global work, steer or collect based on the channel, retry only the current request, and block no-progress loops before they burn the night.

That is how parallel agents stay useful. They do not all run everywhere at once. They run where they can be independent, wait where correctness requires it, and expose enough control that an operator can prove what happened afterward.
