How to Give Your AI Agent a Browser (Web Automation with OpenClaw)

Hex · March 12, 2026 · 9 min read

An AI agent without a browser is like an employee who can't open a website. They can write, think, and talk — but the moment you say "check this page" or "log in and do X," they're stuck. Web automation has always been the missing piece.

OpenClaw fixes this with a built-in browser tool. Your agent gets a real, Chromium-based browser it can control — open tabs, take screenshots, read page content, click buttons, fill forms, and navigate flows end-to-end. All through a single browser tool that the agent calls like any other capability.

Here's how it actually works, and how to set it up.

Two Modes: Isolated Agent Browser vs. Your Existing Chrome

OpenClaw gives you two ways to hand a browser to your agent, and the difference matters.

The `openclaw` Profile — Isolated and Managed

The default mode is a fully isolated browser that OpenClaw manages. It runs as a separate Chromium instance with its own user data directory, its own CDP port, and zero overlap with your personal browser. It even gets an orange UI tint by default, so you can tell at a glance which window belongs to the agent.

This is what you want for most automation work. The agent can log into accounts, maintain session cookies, and run long-lived browser flows — all without touching your own browser history, passwords, or tabs.

openclaw browser --browser-profile openclaw start
openclaw browser --browser-profile openclaw open https://example.com
openclaw browser --browser-profile openclaw snapshot

The `chrome` Profile — Extension Relay to Your Real Browser

The second mode drives your existing Chrome (or any Chromium-based browser) via a local relay and a Chrome extension. You install the extension, click the OpenClaw Browser Relay icon on a tab to attach it, and the agent can then control that specific tab. The extension badge shows ON to confirm the tab is attached.

This is useful when you need the agent to operate in a session you're already logged into — no re-authentication needed. You're essentially handing the agent your steering wheel for a specific tab.

openclaw browser extension install
# Then in Chrome: enable Developer mode, Load unpacked, pin extension
# Click the extension icon on the tab you want to attach

After that, the agent uses profile="chrome" to target it. Switch back to the openclaw profile for anything that should stay isolated.

Configuration

Browser settings live in ~/.openclaw/openclaw.json. Here's a practical baseline:

{
  browser: {
    enabled: true,
    defaultProfile: "openclaw",
    headless: false,
    profiles: {
      openclaw: { cdpPort: 18800, color: "#FF4500" },
    },
  },
}

If you want to use Brave instead of Chrome:

{
  browser: {
    executablePath: "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser"
  }
}

Or via CLI:

openclaw config set browser.executablePath "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser"

If you don't set an override, OpenClaw auto-detects Chrome, then Brave, then Edge, then Chromium. On Linux it looks for google-chrome, brave, microsoft-edge, and chromium on your PATH.
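To make that lookup order concrete, here is a minimal Python sketch of a first-match-wins search. It is not OpenClaw's actual detection code: find_browser and its which parameter are names invented for this example, and only the four binary names come from the paragraph above.

```python
import shutil
from typing import Callable, Optional, Sequence

# Linux binary names in the order the article lists them.
CANDIDATES: Sequence[str] = ("google-chrome", "brave", "microsoft-edge", "chromium")


def find_browser(
    override: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the override if one is set, else the first candidate found on PATH.

    Hypothetical helper for illustration; `which` is injectable so the
    lookup can be exercised without any browser installed.
    """
    if override:
        return override
    for name in CANDIDATES:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_browser() or "no Chromium-based browser found on PATH")
```

An explicit executablePath plays the role of override here: when it is set, detection is skipped entirely.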

What the Agent Can Actually Do

The browser tool exposes a full automation surface to the agent. Here's what that looks like in practice:

Navigate and Read Pages

The agent can open URLs, navigate to new ones, and take a snapshot of the current page — a structured accessibility tree (or AI snapshot with numeric refs) that tells it exactly what's on screen and what's interactive.

browser action="open" url="https://example.com"
browser action="snapshot"

The snapshot comes back as a readable text representation of the page — headings, buttons, inputs, links — all with reference IDs the agent uses to target actions. No brittle CSS selectors. No XPath archaeology. The agent picks a ref from the snapshot and acts on it.

Click, Type, Fill, Submit

Once the agent has a snapshot and a ref, it can interact:

browser action="act" kind="click" ref="e12"
browser action="act" kind="type" ref="e23" text="hello@example.com"
browser action="act" kind="press" key="Enter"
browser action="act" kind="select" ref="e9" values=["Option A"]

Refs from snapshots are stable only within a single page load. If the page navigates or updates, the agent takes a new snapshot and works from the fresh refs, which keeps each action tied to the page's current state even on dynamic pages.
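The snapshot, pick a ref, act loop can be modelled with a toy page object. Everything in this sketch is hypothetical (FakePage, StaleRefError, click_by_label, and the g0e1-style ref format are invented for illustration and are not OpenClaw classes); it only demonstrates why a ref from an old snapshot has to be thrown away after navigation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class StaleRefError(Exception):
    """Raised when a ref from an older snapshot is used after the page changed."""


@dataclass
class FakePage:
    """Toy stand-in for a browser tab; not part of OpenClaw."""
    labels: List[str]
    generation: int = 0
    clicked: List[str] = field(default_factory=list)

    def snapshot(self) -> Dict[str, str]:
        # Refs encode the page generation so stale ones are detectable.
        return {f"g{self.generation}e{i}": label for i, label in enumerate(self.labels)}

    def navigate(self, labels: List[str]) -> None:
        self.labels = labels
        self.generation += 1  # every ref handed out earlier is now stale

    def click(self, ref: str) -> None:
        current = self.snapshot()
        if ref not in current:
            raise StaleRefError(ref)
        self.clicked.append(current[ref])


def click_by_label(page: FakePage, label: str) -> str:
    """Take a fresh snapshot, find the ref with this label, click it, return the ref."""
    for ref, text in page.snapshot().items():
        if text == label:
            page.click(ref)
            return ref
    raise LookupError(f"no element labelled {label!r}")
```

Because click_by_label always starts from a fresh snapshot, it keeps working after navigate(), while a ref saved from before the navigation raises StaleRefError.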

Screenshots

The agent can capture a screenshot at any point — either the full page or a specific element. This is useful both for verification and for passing visual context to the model.

browser action="screenshot"
browser action="screenshot" fullPage="true"

Downloads and Uploads

File operations are supported with some constraints. Upload paths are scoped to /tmp/openclaw/uploads and downloads go to /tmp/openclaw/downloads — you can't point the agent at arbitrary filesystem paths, which is intentional for safety.

browser action="act" kind="upload" inputRef="e5"
browser action="act" kind="wait" loadState="networkidle"
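The idea behind scoping uploads to one directory can be sketched as a path-containment check. This is an illustration of the technique under stated assumptions, not OpenClaw's implementation: is_allowed_upload is a hypothetical helper, and only the /tmp/openclaw/uploads root comes from the text above.

```python
from pathlib import Path

# Upload root named in the article.
UPLOAD_ROOT = Path("/tmp/openclaw/uploads")


def is_allowed_upload(candidate: str, root: Path = UPLOAD_ROOT) -> bool:
    """Return True only if candidate resolves to a location strictly inside root.

    Relative paths are interpreted against root; absolute paths are taken
    as given. Resolving first collapses ".." segments and symlinks, so a
    path cannot escape the root by traversal.
    """
    resolved = (root / candidate).resolve()
    return root.resolve() in resolved.parents


if __name__ == "__main__":
    for path in ("report.pdf", "../downloads/x", "/etc/passwd"):
        print(path, is_allowed_upload(path))
```

Resolving before comparing is the important step: a plain string prefix check would accept /tmp/openclaw/uploads/../../../etc/passwd.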

State Manipulation

For testing or environment simulation, the agent can manipulate browser state:

Cookies: read, set, or clear session cookies
Local/session storage: get or set values directly
Network simulation: toggle offline mode, inject custom headers
Device emulation: simulate iPhone 14, custom viewports, timezones, locales
Geolocation: set fake coordinates per origin

openclaw browser set device "iPhone 14"
openclaw browser set timezone America/New_York
openclaw browser cookies set session abc123 --url "https://example.com"

Multiple Profiles for Multiple Contexts

You can define more than one browser profile, each with its own CDP port and color tint. This lets you run parallel browser contexts — an agent logged into account A in one profile and account B in another, both running simultaneously:

{
  browser: {
    profiles: {
      openclaw: { cdpPort: 18800, color: "#FF4500" },
      work:     { cdpPort: 18801, color: "#0066CC" },
      remote:   { cdpUrl: "http://10.0.0.42:9222", color: "#00AA00" },
    },
  },
}

The remote profile in that config points at a Chromium instance running on another machine entirely — useful if your agent runs on a headless server but you want the browser on a machine with a display, or for integrating cloud browser services like Browserless.
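Since every local profile needs its own CDP port, it can be worth checking a profiles map for collisions before starting anything. The sketch below is a hypothetical pre-flight check written for this article (find_port_conflicts is not an OpenClaw command); it assumes the profiles have already been loaded into a Python dict with the same keys as the config above.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def find_port_conflicts(profiles: Mapping[str, Mapping[str, object]]) -> Dict[int, List[str]]:
    """Map each cdpPort used by more than one profile to the profiles using it."""
    by_port: Dict[int, List[str]] = defaultdict(list)
    for name, settings in profiles.items():
        port = settings.get("cdpPort")
        if isinstance(port, int):  # remote profiles use cdpUrl and have no local port
            by_port[port].append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}


if __name__ == "__main__":
    profiles = {
        "openclaw": {"cdpPort": 18800, "color": "#FF4500"},
        "work": {"cdpPort": 18801, "color": "#0066CC"},
        "remote": {"cdpUrl": "http://10.0.0.42:9222", "color": "#00AA00"},
    }
    print(find_port_conflicts(profiles) or "no port conflicts")
```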

Remote Browsers and Hosted CDP (Browserless)

If you don't want to manage a local Chromium installation — especially in Docker or server environments — you can point OpenClaw at a hosted CDP endpoint. Browserless is a popular option:

{
  browser: {
    defaultProfile: "browserless",
    profiles: {
      browserless: {
        cdpUrl: "https://production-sfo.browserless.io?token=YOUR_API_KEY",
        color: "#00AA00",
      },
    },
  },
}

The agent uses it identically to a local profile. From its perspective, a browser is a browser — local or remote is a routing detail it doesn't need to care about.
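If you would rather not paste an API key into a config file by hand, one option is to generate the fragment from an environment variable. This is a sketch under assumptions: BROWSERLESS_TOKEN is a variable name made up for this example, browserless_profile is a hypothetical helper, and the output is strict JSON, which a JSON5-style config like the one above should accept.

```python
import json
import os
from urllib.parse import urlencode


def browserless_profile(token: str, host: str = "production-sfo.browserless.io") -> dict:
    """Build the hosted-CDP config fragment shown above for a given API token."""
    return {
        "browser": {
            "defaultProfile": "browserless",
            "profiles": {
                "browserless": {
                    # urlencode escapes any characters that are unsafe in a query string.
                    "cdpUrl": f"https://{host}?{urlencode({'token': token})}",
                    "color": "#00AA00",
                }
            },
        }
    }


if __name__ == "__main__":
    # BROWSERLESS_TOKEN is a hypothetical variable name chosen for this sketch.
    token = os.environ.get("BROWSERLESS_TOKEN", "YOUR_API_KEY")
    print(json.dumps(browserless_profile(token), indent=2))
```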

Node Proxy: Browser on a Different Machine Than the Gateway

If your OpenClaw Gateway runs on a server but your browser lives on a desktop, the node proxy handles routing automatically. Run a node host on the machine with the browser, and the Gateway auto-routes browser tool calls there — zero extra browser config required.

This is the default path for remote gateway setups. Disable it per-side if you don't want it:

# On the node: nodeHost.browserProxy.enabled = false
# On the gateway: gateway.nodes.browser.mode = "off"

Debugging When Things Break

Browser automation breaks. Pages change, elements get covered, timing is unpredictable. OpenClaw has a built-in debug workflow:

Re-snapshot with interactive mode — openclaw browser snapshot --interactive gives you a flat list of every interactive element and its current ref. Use this when an action fails.
Highlight a ref — openclaw browser highlight e12 overlays a visual indicator on exactly what Playwright is targeting. Immediately reveals "covered" or "wrong element" issues.
Console and request logs — openclaw browser errors and openclaw browser requests --filter api show what's happening in the page.
Trace recording — For deep debugging, start a trace, reproduce the issue, stop the trace. The trace file can be loaded in Playwright's trace viewer for frame-by-frame inspection.

openclaw browser snapshot --interactive
openclaw browser highlight e12
openclaw browser errors --clear
openclaw browser trace start
# ... reproduce the issue ...
openclaw browser trace stop

Security Boundaries

A few things worth knowing before you point an agent at sensitive browser sessions:

SSRF protection: By default, OpenClaw allows private network destinations (trusted-network model). If you want strict public-only browsing — for example, an agent you're letting external users configure — flip dangerouslyAllowPrivateNetwork to false and add an allowlist:

{
  browser: {
    ssrfPolicy: {
      dangerouslyAllowPrivateNetwork: false,
      hostnameAllowlist: ["*.example.com"],
    },
  },
}

JavaScript evaluation: The browser act kind=evaluate command executes arbitrary JS in the page context. That's powerful — and it's a prompt injection risk. If you don't need it, disable it with browser.evaluateEnabled=false.

Browser profile = sensitive session: The openclaw profile may contain logged-in sessions. Keep the Gateway private (loopback or Tailscale). Remote CDP URLs are powerful — don't expose them publicly.
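To picture what the SSRF policy above is deciding, here is a rough Python model of a strict, public-only check. The semantics are assumptions made for illustration (that the allowlist uses shell-style wildcards and that, in strict mode, a hostname must match it); is_url_allowed is a hypothetical function, not OpenClaw's policy engine. It also skips DNS resolution, so it would not catch an allowlisted hostname that resolves to a private address.

```python
import ipaddress
from fnmatch import fnmatch
from typing import Sequence
from urllib.parse import urlsplit


def is_url_allowed(url: str, allow_private: bool, hostname_allowlist: Sequence[str]) -> bool:
    """Approximate a public-only browsing policy for a single URL."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if allow_private:
        return True  # trusted-network model: nothing is filtered
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a name, not a literal IP address
    if ip is not None and not ip.is_global:
        return False  # loopback, RFC 1918 ranges, link-local, and similar
    if host == "localhost":
        return False
    # Assumed wildcard semantics: "*.example.com" matches any subdomain.
    return any(fnmatch(host, pattern) for pattern in hostname_allowlist)


if __name__ == "__main__":
    allow = ["*.example.com"]
    for url in ("https://shop.example.com/pricing", "http://10.0.0.42:9222"):
        print(url, is_url_allowed(url, False, allow))
```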

Real Use Cases Worth Building

Once your agent has a browser, a lot of workflows that felt impossible become straightforward:

Automated form submission: expense reports, vendor portals, gov sites with no API
Competitor monitoring: check pricing pages on a schedule, diff against last snapshot, alert on changes
Content scraping: pull structured data from sites that offer no usable API
QA automation: run through critical UI flows after deploys, screenshot results
Social media management: post, reply, and engage via the actual browser UI (useful when official APIs are rate-limited or unavailable)
Research pipelines: the agent searches, reads, extracts — all hands-free

The sub-agent architecture pairs especially well here. You can spawn a dedicated sub-agent with browser access for a specific research or automation task, let it run, and get results back — while your main agent handles other work. Parallel browser automation without managing threads yourself.

Playwright Is Required for Some Features

A few capabilities (navigate, act, AI snapshots, element screenshots, and PDF export) require Playwright to be installed. If you see the error "Playwright is not available in this gateway build", install the full Playwright package (not playwright-core) and restart the Gateway.

For Docker deployments, use the bundled CLI to install browser binaries:

docker compose run --rm openclaw-cli \
  node /app/node_modules/playwright-core/cli.js install chromium

ARIA snapshots and basic screenshots work without Playwright. But for full automation — clicks, form fills, dynamic pages — you'll want it installed.

The Short Version

OpenClaw browser automation gives your agent a real, isolated Chromium instance that it controls through a single browser tool. Snapshots replace brittle selectors. The Chrome extension relay lets it work in your existing logged-in sessions. Multi-profile support handles parallel contexts. Remote CDP and node proxy handle cross-machine setups. And a full debug toolkit means you can actually fix things when they break.

It's not a toy. It's a browser your agent actually uses — the same way you do, but autonomously and at scale.
