AI Agent Code Review: Using OpenClaw for Automated PR Reviews

Hex · March 21, 2026 · 9 min read

Most teams treat code review as a bottleneck. PRs sit open for hours waiting for a human reviewer. Context switches pile up. Junior devs get feedback late, sometimes after they've already moved on. Senior devs spend time catching obvious mistakes that automation could handle in seconds.

OpenClaw changes this. You can set up an AI agent that reviews every pull request automatically — leaving inline comments, catching logic errors, flagging security issues — while your human reviewers focus on the architectural decisions that actually need their attention.

This isn't a gimmick. When properly configured, an AI code reviewer working through OpenClaw's ACP runtime handles the mechanical work of review at a level that catches things humans miss.

How OpenClaw Handles Code Review

OpenClaw uses its ACP (Agent Client Protocol) system to connect to external coding harnesses like Claude Code, Codex, OpenCode, and Gemini CLI. These aren't simple wrappers — they're full coding agents that can read files, run tests, understand diffs, and produce structured output.

For code review, the flow looks like this:

A cron job or webhook fires when a new PR opens
OpenClaw's main agent picks it up and spawns an ACP sub-session pointed at the codebase
The coding harness (e.g. Codex or Claude Code) reads the diff, traces affected functions, runs any relevant tests
It generates review feedback — in your Slack, Discord, or directly as GitHub comments via the gh CLI
The main agent logs the review and notifies the PR author

The key insight: the coding harness runs against the actual codebase, not just the diff. It can trace a changed function back to its callers, check test coverage gaps, and understand architectural context. This is substantively different from static analysis tools.

Setting Up ACP for Code Review

First, make sure the ACP plugin is installed and enabled:

openclaw plugins install acpx
openclaw config set plugins.entries.acpx.enabled true
openclaw gateway restart

Then verify your ACP configuration is ready. The minimum config you need in ~/.openclaw/openclaw.json:

{
  "acp": {
    "enabled": true,
    "backend": "acpx",
    "defaultAgent": "codex",
    "allowedAgents": ["codex", "claude", "gemini"],
    "maxConcurrentSessions": 4,
    "runtime": {
      "ttlMinutes": 60
    }
  }
}

Run /acp doctor in your agent chat to confirm the backend is healthy before wiring up automation.

Permissions: The Critical Setting

ACP sessions run non-interactively. There's no TTY for approval prompts. If your permissions aren't configured correctly, review sessions will fail silently or crash mid-run.

For code review, you need read access to the codebase and the ability to run shell commands (tests, gh CLI, etc.). Set this in plugin config:

openclaw config set plugins.entries.acpx.config.permissionMode approve-all
openclaw config set plugins.entries.acpx.config.nonInteractivePermissions deny
openclaw gateway restart

Using nonInteractivePermissions=deny means if something unexpected tries to prompt for approval, the operation is silently skipped rather than crashing the entire session. approve-all handles the expected reads and test runs without interruption.

Note: Only use approve-all in repos you control. Don't run this against untrusted codebases with arbitrary code execution in CI hooks.

The Review Cron: Triggering on New PRs

The cleanest way to hook this up is with OpenClaw's cron system combined with the GitHub CLI. Here's the pattern I use:

Add a cron job that checks for open PRs without an AI review label:

# In your agent chat or via the cron tool:
# Schedule: every 15 minutes
# Payload: systemEvent or agentTurn depending on your setup

The agent task prompt for each review run:

Check for open PRs in [owner/repo] that haven't been reviewed yet:
1. Run: gh pr list --repo [owner/repo] --state open --json number,title,author,labels
2. Filter out PRs already labeled "ai-reviewed"
3. For each unreviewed PR:
   - Spawn an ACP session (runtime: "acp", agentId: "codex") 
   - Task: review the diff at PR #[number], check for bugs, security issues, 
     obvious mistakes, test gaps. Output structured feedback.
   - Post the feedback as a PR comment via: gh pr comment [number] --body "[feedback]"
   - Add label: gh pr edit [number] --add-label "ai-reviewed"

The main agent orchestrates — it doesn't do the review itself. It spawns a Codex or Claude Code ACP session pointed at the right working directory, with the right task. The sub-session does the deep work.

Thread-Bound Sessions for Interactive Review

Beyond automated batch reviews, you can set up thread-bound ACP sessions for interactive review. This is useful when a developer wants to discuss a specific PR with the AI directly.

In Discord or Telegram, the flow is:

Developer says "review PR #42 in detail"
Main agent spawns a thread-bound ACP session: /acp spawn codex --mode persistent --thread auto
The session binds to the thread
Developer can ask follow-up questions in the thread: "Why did you flag that function?", "What's the safest fix?"
Codex responds with full codebase context

To enable this on Discord:

{
  "channels": {
    "discord": {
      "threadBindings": {
        "enabled": true,
        "spawnAcpSessions": true
      }
    }
  }
}

Now any developer on your team can spin up an AI code reviewer in a Discord thread, have a real conversation about the PR, and close the session when done.

Choosing Your Review Harness

OpenClaw supports several ACP harnesses. They're not equivalent for code review:

Codex — Fast, good at structural issues and test gaps. Best for high-volume automated review on smaller PRs.
Claude Code (via claude harness) — Better at complex reasoning, security analysis, and explaining why something is a problem. Worth the extra cost for larger or higher-stakes PRs.
Gemini CLI — Good for repos with large context requirements; handles long files well. Use via gemini harness.
OpenCode — Emerging option, worth testing if you're already on OpenRouter.

You can set per-session model overrides with /acp model anthropic/claude-opus-4-5 if you want to escalate specific PRs to a more capable model.

What the Agent Actually Reviews

A well-prompted code review ACP session will look at:

Logic errors — off-by-one bugs, null dereferences, incorrect condition logic
Security issues — SQL injection patterns, unvalidated input, hardcoded secrets, unsafe deserialization
Test coverage — functions changed but not tested, edge cases not covered by the diff
Breaking changes — API surface changes that affect callers not in the diff
Code style — deviations from patterns already in the codebase (the AI can read your existing code to infer style)

What it won't catch well: subjective architectural debates, product-level decisions, performance issues that require profiling data, and anything requiring business context it hasn't been given.

Be explicit in your prompts about what you want prioritized. "Focus on security and correctness; skip style nits" produces better reviews than a generic "review this PR".

The Prompt That Actually Works

After a lot of iteration, here's the review prompt structure that produces consistently useful output:

You are reviewing PR #{number} in {repo}.
Working directory: {absolute_path_to_repo}

Steps:
1. Run: git fetch origin && git checkout {pr_branch}
2. Run: git diff main...HEAD
3. Read any changed files in full (not just the diff)
4. Identify: 
   - Bugs or logic errors
   - Security vulnerabilities
   - Missing tests for changed behavior
   - Functions that callers depend on that may break
5. Output review as:
   **Summary:** [1-2 sentence overview]
   **Issues:** [list, severity: critical/major/minor]
   **Suggestions:** [optional improvements]
   **Verdict:** Approve / Request Changes / Needs Discussion

Do not nitpick style. Focus on correctness and safety.

Monitoring and Logging

Every ACP session generates logs in the OpenClaw session store. You can check session history with /acp sessions and review completed runs with sessions_history.

For ops visibility, I log every review to a daily memory file:

# In the post-review agent task:
# Append to memory/YYYY-MM-DD.md:
# - [HH:MM] 🔍 Code review: PR #{number} "{title}" → {verdict}

Over time, this gives you a readable record of what the AI caught, which PRs it flagged, and how often humans agreed with its verdicts. That feedback loop is how you tune the prompts.

Keeping It from Getting Noisy

The biggest failure mode for automated code review is alert fatigue. If the AI comments on every PR with a wall of minor style suggestions, developers learn to ignore it.

A few guardrails that help:

Set a minimum severity threshold — only surface critical and major issues automatically, let the developer ask for minor feedback if they want it
Use the "ai-reviewed" label aggressively — once reviewed, don't re-review the same PR unless new commits land
Route reviews to a dedicated channel or thread rather than the main dev chat
Give the AI a concise output format — a five-line review that gets read beats a fifty-line review that gets skimmed

A Real-World Example

Here's what this looks like in practice on my own setup:

I have a cron job that fires every 30 minutes. It asks my main OpenClaw agent to check for open PRs in the active repos. When it finds one, it spawns a Codex ACP session with the repo as the working directory. Codex runs the diff, reads the changed files, and produces a structured review. The main agent posts that review as a GitHub comment, adds the label, and pings the PR author in Slack.

The whole cycle takes about 90 seconds per PR. My team now uses AI review as a first pass — by the time a human reviews, the obvious stuff is already caught and addressed. Human review time dropped noticeably because reviewers aren't explaining the same basic mistakes over and over.

The AI misses things. It doesn't understand every architectural decision. But it never skips a PR because it's tired, distracted, or context-switching between five other things.

Getting Started

If you want to try this today:

Install and enable the acpx plugin
Configure permissions for non-interactive mode
Test with a manual sessions_spawn call targeting a specific PR
Once it's working, wire it to a cron job
Tune the prompt based on the first week's output

The setup takes an afternoon. The payoff is a code reviewer that works every day, never needs to be reminded, and gets incrementally better as you tune its prompts.

Related: if you're building out a full autonomous development workflow, check out the guides on sub-agent delegation and OpenClaw cron jobs — those patterns combine directly with what's described here.

Want the complete guide? Get The OpenClaw Playbook — $9.99