How to Use OpenClaw Node Images
Handle images in OpenClaw messages, outbound media sends, inbound media variables, sandbox rewriting, and WhatsApp media limits.
OpenClaw image handling sits at the boundary between channels, gateway media files, and model understanding. The docs describe two directions. Outbound sends can attach local files or HTTP URLs with optional captions. Inbound media can be downloaded, stored in temp files, exposed to commands, copied into a sandbox, or summarized by media understanding. Treat images as first-class context, but also as files with size, format, privacy, and channel limits.
Send images deliberately
The documented CLI surface is openclaw message send --media <path-or-url> --message <caption>. Caption text can be empty for media-only sends. A dry run can show the resolved payload, and JSON output can include channel, target, message id, media URL, and caption. On WhatsApp Web, OpenClaw loads the local file or URL into a buffer, detects media kind, and builds the appropriate send payload instead of assuming every attachment is the same.
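As a minimal sketch of the send surface described above, the helper below assembles the argument list for a media send. It uses only the two flags named in this guide, --media and --message. Target selection, dry-run, and JSON output flags are not named here, so they are left out rather than guessed. The function name build_send_command is hypothetical.

```python
import shlex
from typing import List


def build_send_command(media: str, caption: str = "") -> List[str]:
    """Return argv for a media send; the caption flag is omitted when empty.

    Only --media and --message come from the guide. Any other flags
    (target, dry run, JSON output) are intentionally not modeled here.
    """
    if not media:
        raise ValueError("media must be a local path or an HTTP(S) URL")
    argv = ["openclaw", "message", "send", "--media", media]
    if caption:
        argv += ["--message", caption]
    return argv


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(build_send_command("./chart.png", "Weekly traffic")))
```

Building the command as a list rather than a single string keeps captions with spaces or quotes intact if the list is later passed to a process launcher.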
Know WhatsApp media behavior
The WhatsApp channel has concrete handling rules in the docs. Images are resized and recompressed to JPEG with a maximum side of 2048 px, with the output size targeting channels.whatsapp.mediaMaxMb, which defaults to 50 MB. Audio, voice, and video pass through up to a 16 MB cap, while documents can go up to 100 MB with the filename preserved. GIF-style playback is sent as MP4 with gifPlayback. MIME detection prefers magic bytes, then headers, then file extension.
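The rules above can be approximated in a few lines for pre-flight checks. This is a sketch, not OpenClaw's implementation: the function names, the tiny magic-byte table, and the choice of 1 MB = 1024 * 1024 bytes are all assumptions. Only the 2048 px side limit, the 16 MB and 100 MB caps, and the detection order come from the text.

```python
import mimetypes
from typing import Optional, Tuple

MB = 1024 * 1024  # assumption: binary megabytes
IMAGE_MAX_SIDE_PX = 2048
# Pass-through caps from the paragraph above, in MB.
PASS_THROUGH_CAPS_MB = {"audio": 16, "voice": 16, "video": 16, "document": 100}

# Illustrative subset of magic-byte signatures, not an exhaustive table.
MAGIC = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF8", "image/gif"),
]


def fit_within(width: int, height: int, max_side: int = IMAGE_MAX_SIDE_PX) -> Tuple[int, int]:
    """Scale dimensions down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def within_cap(kind: str, size_bytes: int) -> bool:
    """Check a non-image attachment against its pass-through cap."""
    return size_bytes <= PASS_THROUGH_CAPS_MB[kind] * MB


def detect_mime(data: bytes, header_type: Optional[str] = None,
                filename: Optional[str] = None) -> Optional[str]:
    """Prefer magic bytes, then the header value, then the file extension."""
    for prefix, mime in MAGIC:
        if data.startswith(prefix):
            return mime
    if header_type:
        # Drop parameters such as "; charset=utf-8".
        return header_type.split(";")[0].strip().lower()
    if filename:
        return mimetypes.guess_type(filename)[0]
    return None
```

Checking dimensions and size before a send makes oversize failures show up in testing rather than in a live conversation.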
Use inbound media variables
When inbound web messages include media, OpenClaw can download it to a temp file and expose {{MediaUrl}} and {{MediaPath}} to command templates. If a per-session Docker sandbox is enabled, inbound media is copied into the sandbox workspace and paths are rewritten to a relative location such as media/inbound/<filename>. That keeps tools from reaching random host paths while still allowing the agent to work with the file.
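To make the rewriting step concrete, here is a small sketch of the idea: copy the downloaded file under media/inbound/ in a workspace directory, then substitute the two placeholders in a command template. The function names and the workspace layout beyond media/inbound/<filename> are assumptions for illustration and do not describe OpenClaw's internals.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_sandbox(host_path: Path, workspace: Path) -> str:
    """Copy an inbound file into the workspace; return its workspace-relative path."""
    # Use only the file name so a crafted host path cannot escape the workspace.
    relative = Path("media") / "inbound" / host_path.name
    destination = workspace / relative
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(host_path, destination)
    return relative.as_posix()


def render_template(template: str, media_url: str, media_path: str) -> str:
    """Substitute the {{MediaUrl}} and {{MediaPath}} placeholders."""
    return (template
            .replace("{{MediaUrl}}", media_url)
            .replace("{{MediaPath}}", media_path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        host_file = Path(tmp) / "host" / "photo.jpg"
        host_file.parent.mkdir()
        host_file.write_bytes(b"fake image bytes")
        workspace = Path(tmp) / "workspace"
        rel = copy_into_sandbox(host_file, workspace)
        # The URL below is a placeholder, not a real endpoint.
        print(render_template("describe {{MediaPath}}", "https://example.com/photo.jpg", rel))
```

Returning a relative path means a command template works the same regardless of where the workspace is mounted on the host.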
Combine images with understanding
Media understanding can insert [Image], [Audio], and [Video] blocks into the body before a reply. For images, that means the agent can receive a short description when the selected understanding path runs. The docs add one caveat: if the active primary image model already supports native vision, OpenClaw can skip the summary block and pass the original image instead. Do not promise a summary in every case; promise only that the pipeline uses the best available path, either a summary block or the original image.
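The branch between a summary block and the original image can be sketched as a single decision. Everything here is illustrative: the function name, the exact layout of the [Image] block, and the simplification that a vision-capable model always receives the original are assumptions, since the docs only say OpenClaw can skip the summary in that case.

```python
from typing import Optional, Tuple


def build_body(text: str, image_description: Optional[str],
               model_has_native_vision: bool) -> Tuple[str, bool]:
    """Return (body, attach_original_image).

    Assumption: a vision-capable model always gets the original image and
    no summary block. The block layout below is hypothetical.
    """
    if model_has_native_vision:
        return text, True
    if image_description:
        return f"[Image]\n{image_description}\n\n{text}", False
    # No vision support and no description available: send the text alone.
    return text, False
```

Downstream prompts should therefore handle both shapes: a body with an [Image] block, and a plain body accompanied by the image itself.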
Operator guardrails
Set limits, test oversize behavior, and decide whether multi-media replies should be allowed. By default, understanding processes only the first matching image, audio, or video attachment unless configured otherwise. That is a sane default for cost and latency. The OpenClaw Playbook recommends documenting which channels may receive outbound media, where inbound media is stored, and whether sandbox rewriting is active. Images are useful context, but uncontrolled file handling turns useful context into an operational mess.
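The first-match default can be pictured with a small selector. This sketch reads "first matching" as the first attachment of each kind, which is an interpretation rather than something this guide confirms. The Attachment type and the max_per_kind parameter are hypothetical stand-ins for whatever configuration OpenClaw actually exposes.

```python
from dataclasses import dataclass
from typing import Iterable, List

UNDERSTOOD_KINDS = ("image", "audio", "video")


@dataclass(frozen=True)
class Attachment:
    kind: str  # "image", "audio", "video", or any other kind
    name: str


def select_for_understanding(attachments: Iterable[Attachment],
                             max_per_kind: int = 1) -> List[Attachment]:
    """Keep at most max_per_kind attachments of each understood kind, in order."""
    taken = {kind: 0 for kind in UNDERSTOOD_KINDS}
    selected = []
    for attachment in attachments:
        if attachment.kind in taken and taken[attachment.kind] < max_per_kind:
            taken[attachment.kind] += 1
            selected.append(attachment)
    return selected
```

With the default of one per kind, a message carrying five photos costs one image pass instead of five, which is the cost and latency tradeoff described above.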
Decide whether images are evidence or output
Images can enter an OpenClaw workflow as evidence, and they can leave as output. Those two directions need different rules. Inbound evidence should be scoped, summarized only when useful, and kept close to the session that needs it. Outbound media should respect the recipient channel's limits and the user's expectation of what will be sent. A generated chart, a screenshot, and a personal photo deserve different handling. Write down which automations may attach media and whether captions are required. That small policy keeps a helpful image pipeline from turning into accidental file broadcasting.
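One way to make that written policy checkable is to encode it as data and validate each outbound send against it. The sketch below is an operator-side helper, not an OpenClaw feature. The MediaPolicy type, its fields, and the channel names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class MediaPolicy:
    allowed_channels: FrozenSet[str]  # channels that may receive outbound media
    caption_required: bool = False


def check_outbound(policy: MediaPolicy, channel: str, caption: str) -> List[str]:
    """Return a list of policy problems; an empty list means the send is allowed."""
    problems = []
    if channel not in policy.allowed_channels:
        problems.append(f"channel '{channel}' may not receive outbound media")
    if policy.caption_required and not caption.strip():
        problems.append("a caption is required for outbound media")
    return problems


if __name__ == "__main__":
    policy = MediaPolicy(allowed_channels=frozenset({"whatsapp"}), caption_required=True)
    print(check_outbound(policy, "whatsapp", "Weekly chart"))
    print(check_outbound(policy, "email", ""))
```

Returning every problem at once, instead of stopping at the first, gives an operator the full reason a send was blocked.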
Final verification
Before calling your image setup finished, perform one direct test, one failure test, and one rollback check. The direct test proves the happy path works. The failure test proves the documented guardrail is real, not just assumed. The rollback check tells the next operator how to undo the change without improvising. Save those notes beside the channel, node, or gateway config you changed. OpenClaw gets powerful when agents can act, but it stays trustworthy when every new surface has a small, repeatable verification habit attached to it.
Frequently Asked Questions
How do I send media from the OpenClaw CLI?
Use openclaw message send --media <path-or-url> with an optional --message caption.
What variables expose inbound media to commands?
Inbound media can expose {{MediaUrl}} and {{MediaPath}}, with sandbox paths rewritten when per-session Docker sandboxing is enabled.
Does OpenClaw always summarize inbound images?
No. If the active primary image model supports vision natively, OpenClaw can skip the [Image] summary block and pass the original image to the model.