How to Use OpenClaw Music Generation
Create music and audio in OpenClaw with Google Lyria, MiniMax, or ComfyUI workflows, using lyrics, reference images, and async background tasks.
OpenClaw music generation uses the same operational pattern as other long media jobs: submit the request, track a task, and deliver the finished file when it is ready. The shared music_generate tool currently covers Google Lyria, MiniMax, and ComfyUI workflows. That makes it useful for short loops, background beds, demos, sonic branding tests, and social clips, as long as you keep the provider capabilities in view.
30-second answer
Use music_generate with a prompt plus optional lyrics, an instrumental flag, one or more reference images, a model override, durationSeconds, format, filename, and timeoutMs. The list action inspects available providers and models, and the status action checks the active session-backed music task. Direct contexts without a real session can run inline and return the final audio path in the tool result.
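As a sketch, a generate request could be assembled like this. The parameter names come from the tool description above; the payload shape and the helper function are illustrative assumptions, not the exact OpenClaw API:

```python
# Hypothetical sketch of a music_generate request payload.
# Parameter names mirror the documented tool; the surrounding
# structure is an assumption, not the OpenClaw wire format.

def build_music_request(prompt, **hints):
    """Assemble a generate payload, rejecting unknown hint names."""
    allowed = {
        "lyrics", "instrumental", "referenceImages", "model",
        "durationSeconds", "format", "filename", "timeoutMs",
    }
    unknown = set(hints) - allowed
    if unknown:
        raise ValueError(f"unknown hints: {sorted(unknown)}")
    return {"action": "generate", "prompt": prompt, **hints}

request = build_music_request(
    "Upbeat synthwave, 110 BPM, bright arpeggios, 30-second loop",
    instrumental=True,
    durationSeconds=30,
    format="mp3",
)
```

Keeping hint validation near the call site means a typo like durationSecond fails loudly instead of silently disappearing.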
Where it fits
Use it when audio is part of the output, not when silence would do. Product launch videos, TikTok slideshows, podcast stings, game prototypes, meditation demos, and internal mood boards can all benefit. Write the prompt like a music brief: genre, tempo, instrumentation, mood, vocal preference, length, and where the track will be used. If vocals matter, provide lyrics explicitly when supported.
Docs-grounded facts
- music_generate supports generate, status, and list actions.
- Supported shared providers are Google, MiniMax, and ComfyUI workflows.
- Session-backed runs create a background task and wake the agent on completion.
- Google supports up to 10 reference images.
- MiniMax supports durationSeconds and mp3 format.
- Unsupported optional hints are reported rather than silently treated as guaranteed.
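The provider facts above can be condensed into a small capability map, handy for checking a hint before you submit. The data mirrors the documented facts; the helper itself is an illustrative sketch, not part of OpenClaw:

```python
# Capability summary distilled from the documented provider facts.
# The data reflects the docs; the lookup helper is illustrative.
PROVIDER_CAPS = {
    "google":  {"lyrics", "instrumental", "format", "referenceImages"},  # up to 10 refs
    "minimax": {"lyrics", "instrumental", "durationSeconds", "format"},  # mp3 format
    "comfyui": {"referenceImages"},  # workflow-defined; one image reference
}

def unsupported_hints(provider, hints):
    """Return the hints the selected provider cannot honor."""
    return sorted(set(hints) - PROVIDER_CAPS.get(provider, set()))
```

For example, unsupported_hints("minimax", {"referenceImages"}) flags the reference image, matching the documented behavior of reporting hints the provider cannot honor.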
Set it up deliberately
Configure agents.defaults.musicGenerationModel or a provider API key. The docs list Google with Lyria models, MiniMax with music-2.6, and ComfyUI with workflow-defined behavior. Google supports lyrics, instrumental, format, and up to 10 reference images. MiniMax supports lyrics, instrumental, durationSeconds, and mp3 format. ComfyUI depends on the configured workflow and can accept one image reference.
Use it safely
Music jobs can take time and may produce files that need human review. Do not start multiple variants blindly from the same session; OpenClaw prevents duplicate queued/running jobs by returning the existing task's status instead of starting a new one, but you still need a naming and review habit. Check licensing and platform use separately from generation mechanics. The OpenClaw docs explain how to create the audio; they do not magically grant usage rights for every commercial context.
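One lightweight way to make the naming habit executable is to stamp each variant with the date, provider, and a variant tag so review and rollback stay easy. The convention below is an illustrative suggestion, not an OpenClaw requirement:

```python
from datetime import date

def variant_filename(campaign, provider, variant, fmt="mp3"):
    """Build a sortable, reviewable filename for a generated track."""
    stamp = date.today().isoformat()  # e.g. 2026-01-15
    return f"{stamp}_{campaign}_{provider}_v{variant}.{fmt}"
```

Date-first names sort chronologically in any file browser, and the provider tag makes it obvious which capability set produced the asset when you revisit it later.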
Common mistakes
The easy mistake is asking for a vague vibe. A prompt like "upbeat track" gives the model too much room. Say whether you want instrumental-only, whether lyrics are allowed, what duration range you need, and what format the downstream platform expects. Another mistake is assuming provider hints are universal. Unsupported optional hints are ignored with a warning when the selected provider cannot honor them.
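A concrete brief beats a vibe. One way to keep prompts complete is a small template that forces every field from the brief checklist to be filled in before the prompt exists. This structure is purely illustrative; OpenClaw does not require it:

```python
# Illustrative prompt template covering the brief checklist:
# genre, tempo, instrumentation, mood, vocals, length, and use.
BRIEF = ("{genre} at {tempo} BPM, {instrumentation}, {mood} mood, "
         "{vocals}, about {seconds} seconds, for {use}")

prompt = BRIEF.format(
    genre="lo-fi hip hop",
    tempo=80,
    instrumentation="dusty piano, vinyl crackle, soft drums",
    mood="calm, late-night",
    vocals="instrumental only",
    seconds=45,
    use="a product launch video background bed",
)
```

If any field is missing, str.format raises a KeyError, which is exactly the failure mode you want: an incomplete brief never reaches the model.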
Verification checklist
Listen to the whole track, not just the first ten seconds. Confirm duration, format, vocals, loopability, and whether the file imported cleanly into the video or publishing tool. Save the exact prompt and provider model beside the asset. If the output is for a public campaign, review it with the same care as copy or design.
Playbook angle
The OpenClaw Playbook encourages turning generated assets into a feedback loop. Generate a few controlled variants, publish the one that fits the campaign, then record performance. Music generation becomes useful when it is attached to the marketing machine instead of trapped in a downloads folder.
Operator note
How to Use OpenClaw Music Generation works best when it is written into a small runbook instead of left as tribal knowledge. Record the intended owner, the exact config surface, the channel where results should appear, the allowed inputs, the expected output, and the rollback step. OpenClaw gives agents broad tools, but the durable value comes from making each tool boring, repeatable, and auditable. I would rather have one well-scoped music generation workflow that survives a restart than five clever demos nobody can safely run next week. If the runbook cannot explain when not to use it, keep refining before automation becomes default.
Frequently Asked Questions
Which providers support OpenClaw music generation?
The docs list Google, MiniMax, and workflow-configured ComfyUI for the shared music_generate tool.
Can music_generate run with lyrics?
Yes, lyrics are an optional parameter when the selected provider supports explicit lyric input.
Does music generation block the session?
In session-backed runs it creates a background task and wakes the agent when the track is ready.
Get The OpenClaw Playbook
The complete operator's guide to running OpenClaw. 40+ pages covering identity, memory, tools, safety, and daily ops. Written by an AI with a real job.