How I Stopped Managing My AI Agents and Let One of Them Do It
What happened when I gave a PM role to an AI agent with Linear, Docker sandboxes, and merge permissions.
I used to manage my AI coding agents myself. Define the task, dispatch the agent, review the PR, test it, merge it. Repeat.
It worked. But I was still the bottleneck. Every ticket needed me to kick it off. Every PR needed my review. I was a project manager pretending to be hands-off.
So I tried something different: I gave one agent the PM job.
Meet Forge
Forge is an AI agent that runs on OpenClaw. Its job isn't to write code — it's to manage the agents that do.
Forge's Linear board right now: five completed projects, two in backlog. Each one went through the same lifecycle:
Refinement → Dev Agent (Docker) → Code Review → QA Agent → Internal Review → My Review → Ship
Every piece of work gets a Linear ticket. Forge dispatches Claude Code agents inside isolated Docker containers — one agent per ticket, one PR per ticket. A separate QA agent tests each PR. Before presenting a milestone to me, Forge does its own internal review — opens the live product and tries to break things.
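As a rough illustration of that lifecycle, here is a minimal Python sketch of a ticket moving through the stages strictly in order. The stage names come from the lifecycle line above. The `Ticket` class, the `advance` function, and the rule that a failed check sends a ticket back to the dev agent are assumptions made for illustration, not Forge's actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    REFINEMENT = "Refinement"
    DEV_AGENT = "Dev Agent (Docker)"
    CODE_REVIEW = "Code Review"
    QA_AGENT = "QA Agent"
    INTERNAL_REVIEW = "Internal Review"
    MY_REVIEW = "My Review"
    SHIP = "Ship"


ORDER = list(Stage)  # Enum iteration preserves definition order


@dataclass
class Ticket:
    key: str  # hypothetical ticket identifier
    stage: Stage = Stage.REFINEMENT
    history: list = field(default_factory=list)


def advance(ticket: Ticket, passed: bool = True) -> Stage:
    """Move a ticket exactly one step; no stage can be skipped.

    Assumption: a failed check after the dev stage sends the ticket
    back to the dev agent; a failure before that keeps it in place.
    """
    if ticket.stage is Stage.SHIP:
        raise ValueError(f"{ticket.key} has already shipped")
    ticket.history.append(ticket.stage)
    idx = ORDER.index(ticket.stage)
    if passed:
        ticket.stage = ORDER[idx + 1]
    elif idx > ORDER.index(Stage.DEV_AGENT):
        ticket.stage = Stage.DEV_AGENT
    return ticket.stage
```

The point of encoding the order as data is that there is no code path for skipping QA on an "easy" ticket; the only way to reach `SHIP` is through every stage before it.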
What I Actually Do Now
Define the milestone scope. Make architecture calls on the hard stuff. Walk away.
I come back for milestone review. That's it.
I don't push. I don't check if tickets are moving. I don't context-switch between "coding" and "managing." Forge runs the process and comes back when it's ready.
What Surprised Me
Process discipline beats my judgment at the tactical level. Forge follows the lifecycle every single time. When I managed agents myself, I'd skip QA on "easy" tickets, batch PRs together, forget to test live before moving on. Forge doesn't have those impulses. It just follows the steps. That's not intelligence — it's consistency. And consistency ships more reliably than brilliance.
The PM layer was the hardest part to build. Dev agents are close to commodity: give them clear specs and isolation, and they produce working code. But knowing when to refine vs. execute, when to push back on scope, and when something needs review vs. is ready to present took the most iteration. Forge's operating instructions went through more revisions than any codebase.
Agents fail at infrastructure, not intelligence. The agents rarely write bad code. They fail because OAuth tokens expire, Docker daemons hang, or the config points to the wrong repo. One day Forge spent $13 on wasted agent runs because of two bugs: a stale API key and a Bearer prefix that Linear doesn't accept. The fix took 2 minutes. The debugging took hours. The boring stuff is what breaks.
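The Bearer bug is the kind of thing a cheap pre-flight check can catch before any agent run is paid for. The sketch below assumes the credential is a Linear personal API key, which, as I understand Linear's API docs, goes into the `Authorization` header as-is, while OAuth access tokens take the `Bearer ` prefix. The function name `linear_auth_header` and the `is_oauth` flag are hypothetical and not part of any Linear SDK.

```python
def linear_auth_header(token: str, is_oauth: bool = False) -> dict:
    """Build the Authorization header for a Linear API request.

    Assumption: personal API keys are sent bare, OAuth access tokens
    are sent with a "Bearer " prefix.
    """
    token = token.strip()
    if not token:
        raise ValueError("empty Linear credential; check the environment")
    # Drop a prefix that was pasted in by mistake, so it is never
    # doubled and never sent alongside a personal key.
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()
    value = f"Bearer {token}" if is_oauth else token
    return {"Authorization": value, "Content-Type": "application/json"}
```

Sending one inexpensive authenticated request with this header before dispatching anything would also surface a stale key early, instead of after several failed agent runs.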
What Shipped
In the last month, through this workflow:
- OhMyDoc (97%) — AI resume formatter. 4 milestones, SEO polish, XML editor, ATS badges.
- Voice Assistant PWA (100%) — Dual-mode UI, local wake word, Cantonese support.
- Transaction Sync (100%) — Self-hosted finance categorizer.
Each milestone went through the full lifecycle. Forge caught its own bugs — wrong repo dispatches, service worker cache issues, model file overrides in two places. It logged the mistakes and adjusted.
The Honest Part
This isn't magic. It's a lot of upfront work defining what "good process" looks like, written as plain-text instructions that an LLM can follow. Forge's operating doc is longer than most of the codebases it manages.
And I still catch things Forge doesn't. Edge cases in the UX. Priorities that only make sense with business context. The feeling that something "isn't right" even when tests pass.
But the default mode shifted. I used to code and occasionally manage. Now I review products and occasionally weigh in on architecture. On my own side projects.
That's weird. And I'm still getting used to it.
Setup
If you're curious about the stack:
- Forge runs on OpenClaw (agent runtime)
- Linear for issue tracking (free tier)
- DevLab dispatches Claude Code agents in isolated Docker containers
- One ticket = one container = one PR. No batching.
The key design choice: Forge never writes code itself. It's purely a coordinator. Spec, dispatch, review, decide. Same reason a human PM shouldn't write the code they're managing.
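To make the one ticket, one container, one PR rule concrete, here is a minimal sketch that builds a `docker run` command for a single ticket. The image name `devlab-agent`, the `TICKET_ID`, `REPO_URL`, and `BRANCH` environment variables, and the naming scheme are all hypothetical; this post does not show how DevLab actually invokes Docker. Only standard `docker run` flags (`--rm`, `--name`, `-e`) are used.

```python
import re


def docker_args(ticket_id: str, repo_url: str, image: str = "devlab-agent") -> list:
    """Return the argv for one throwaway container dedicated to one ticket."""
    # Docker container names may only contain [a-zA-Z0-9_.-]
    slug = re.sub(r"[^a-zA-Z0-9_.-]", "-", ticket_id).lower()
    if not slug.strip("-"):
        raise ValueError("ticket id has no usable characters")
    return [
        "docker", "run",
        "--rm",                         # discard the container when it exits
        "--name", f"agent-{slug}",      # Docker rejects a name already in use,
                                        # so a ticket cannot run twice at once
        "-e", f"TICKET_ID={ticket_id}",
        "-e", f"REPO_URL={repo_url}",   # explicit, so a wrong repo shows up in logs
        "-e", f"BRANCH=agent/{slug}",   # one branch, hence one PR, per ticket
        image,
    ]
```

The list can be handed to `subprocess.run(args, check=True)`. Building it as plain data first means the coordinator can log exactly which repo and branch each agent was given before anything starts.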
Part of an ongoing experiment building with AI agents. More at getjustgo.com.