I Migrated a 5-Agent System to a New Laptop. Here's What Broke.
Subtitle: A post-mortem on moving a multi-agent Claude Code + OpenCode setup — and what 14 cron jobs revealed about what's actually load-bearing.
I've been running a small "Agent OS" — a thin orchestration layer that runs multiple coding agents on cron, each in its own workspace, each with its own identity and memory. Not a framework. Just shell scripts and YAML.
Last week I moved the whole thing to a new laptop. Same code. Different machine. I expected "git clone and go."
That's not what happened. The code ran. The system didn't. And the gap between those two things turned out to be the interesting part of the story.
What the System Actually Is
Before the post-mortem, the 30-second tour:
- 5 agents, each a git-tracked workspace: an executive assistant, a marketing agent, a nightly system reviewer, and two test workspaces.
- ~1,300 lines of shell and Python total across the orchestrator, the channel gateway, and the dashboard. That's the entire "platform."
- 14 cron jobs defined as YAML — each one a prompt, a workspace, and a schedule. A runner script handles session management, logging, git auto-commit, and optional delivery to Slack or Telegram.
- Two backends: claude -p and opencode run. Same orchestrator. Different adapters.
The design premise: the cheapest possible layer between cron and an LLM, with the agents themselves holding all the intelligence. No coordinator-on-top. Agents are peers; they read each other's workspaces through the filesystem.
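As a rough illustration of how thin that layer can be, here is a sketch of a job record and the two backend adapters. It is not the actual orchestrator: the Job field names, the cron expression, and the prompt text are placeholders invented for the example. Only the two commands (claude -p and opencode run) and the job and workspace names come from the setup described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One cron job: a prompt, a workspace, and a schedule (field names are hypothetical)."""
    name: str
    workspace: str  # directory name under the workspaces root, not an absolute path
    schedule: str   # five-field cron expression
    prompt: str
    backend: str    # "claude-code" or "opencode"


def build_command(job: Job) -> list[str]:
    """Translate a job into the argv for its backend; everything else stays shared."""
    adapters = {
        "claude-code": ["claude", "-p", job.prompt],
        "opencode": ["opencode", "run", job.prompt],
    }
    if job.backend not in adapters:
        raise ValueError(f"unknown backend {job.backend!r} for job {job.name!r}")
    return adapters[job.backend]


# Placeholder job: the prompt is invented for illustration.
digest = Job(
    name="chillo-allin-digest",
    workspace="workspace-chillo",
    schedule="0 8 * * *",
    prompt="Summarize today's episode.",
    backend="claude-code",
)
print(build_command(digest))
```

The adapter is just a lookup from backend name to argv, so adding a third backend would be one more dictionary entry and the runner's logging, commit, and delivery steps would not change.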
On the old machine, this ran for weeks without me touching it. I assumed the hard part was behind me.
What Broke
I tested all 14 jobs one at a time, in order, via the real execution chain. Not "does the script parse" — "does the job produce what it's supposed to produce and deliver it where it's supposed to go." Here's what I hit:
1. yt-dlp wasn't on PATH. One job pulls a podcast transcript and summarizes it. The script assumed yt-dlp was globally installed. On the old machine it was — I'd forgotten. On the new machine it lived inside a venv at scratch/.venv/bin/yt-dlp. The code was correct. The environment it ran in wasn't portable.
2. Email syntax for a CLI tool had drifted. The same job emails the digest to me when it's done. The flags I was using (--html --body-file) had changed to --body-html "$(cat file.html)" in a newer version of the tool. Old machine had the old version pinned. New machine pulled latest. A silent CLI change in a dependency I wasn't even tracking.
3. A config file was missing. One agent's job referenced digest-configs.md that lived in a workspace I'd deprecated months ago. The job had been reading a file from outside its own tree, and I hadn't noticed because it kept working.
4. Every job YAML had stale paths. The old system lived under ~/.openclaw/workspace/. The new one lives under ~/my-starter/workspaces/. I'd never normalized the job files to use workspace-relative paths — they all had the old absolute prefix hardcoded. 14 separate path edits.
5. Git auto-commit silently failed. The orchestrator auto-commits after each agent run; that's the visibility trail. On a fresh machine, git user.name and user.email aren't set. Every job would run, produce output, then fail at the commit step with exit 128. The agent worked. The audit trail didn't.
6. An API quota I didn't know I was near. The analytics job calls the X API. On the new machine I found out my Free tier monthly read credits had been exhausted since April 7. The job handled it gracefully (that part was actually correct), but the "everything's fine" signal from the old machine had been partly a lie. The quota had been gone for days. I just wasn't looking.
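Several of these failures (a missing binary, an unset git identity, a config file that isn't where the job expects it) can be caught before any job runs. The following is a minimal preflight sketch, not part of the original system. The function names are hypothetical, and the tool and file lists passed in at the bottom are only examples taken from the failures above.

```python
import shutil
import subprocess
from pathlib import Path


def missing_binaries(names: list[str]) -> list[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_git_identity(repo: Path) -> list[str]:
    """Return the git identity keys that resolve to nothing for this repo."""
    keys = ["user.name", "user.email"]
    if shutil.which("git") is None:
        return keys  # no git at all, so no identity either
    missing = []
    for key in keys:
        result = subprocess.run(
            ["git", "-C", str(repo), "config", "--get", key],
            capture_output=True,
            text=True,
        )
        # `git config --get` exits non-zero when the key is not set.
        if result.returncode != 0 or not result.stdout.strip():
            missing.append(key)
    return missing


def missing_files(workspace: Path, relative_paths: list[str]) -> list[str]:
    """Return the workspace-relative files a job expects but that do not exist."""
    return [p for p in relative_paths if not (workspace / p).is_file()]


def preflight(workspace: Path, tools: list[str], files: list[str]) -> list[str]:
    """Collect every unmet assumption as a human-readable problem string."""
    problems = [f"not on PATH: {t}" for t in missing_binaries(tools)]
    problems += [f"git config unset: {k}" for k in missing_git_identity(workspace)]
    problems += [f"missing file: {f}" for f in missing_files(workspace, files)]
    return problems


if __name__ == "__main__":
    # Example inputs only; a real runner would read these from each job definition.
    for problem in preflight(Path("."), ["git", "yt-dlp"], ["digest-configs.md"]):
        print("PREFLIGHT:", problem)
```

A check like this would not catch the changed email flags or the exhausted quota. Those need a version pin and a visible status signal respectively.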
The Pattern Behind the Pattern
None of these were bugs in the agent logic. All of them were assumptions the old machine had quietly been satisfying.
- Assumption: "my environment is my code." It isn't. Binaries, version-pinned dependencies, env vars, and out-of-tree config files were all part of the system, and none of them lived in the repo.
- Assumption: "if it committed for months, it'll commit on the next machine." Not if git identity is a per-machine setting that your orchestrator depends on.
- Assumption: "the jobs are passing, so the jobs are healthy." Graceful error handling is great — until it papers over a quota exhaustion you needed to see.
I think this is the general shape of migration pain for anything agentic. The agent is a thin shell around a model; the system is everything the shell touches. When you move the shell without moving the system around it, you learn very quickly which parts were load-bearing.
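One of those hidden assumptions, paths that point outside the tree the system now lives in, can be found mechanically. The sketch below scans the text of a job file for absolute or home-relative paths and flags any that fall outside a given root. The regex is a rough heuristic, the YAML keys are invented, and the specific subpaths in the sample are illustrative. Only the two prefixes (~/.openclaw/workspace/ and ~/my-starter/workspaces/) come from the migration described above.

```python
import re
from pathlib import PurePosixPath

# Heuristic: a path token starts with "/" or "~/" and is not the middle of
# a longer path or a URL (hence the lookbehind).
PATH_TOKEN = re.compile(r"(?<![\w./~:-])~?/[\w./-]+")


def out_of_tree_paths(job_text: str, root: str) -> list[str]:
    """Return absolute or home-relative paths in job_text that are not under root."""
    root_path = PurePosixPath(root)
    flagged = []
    for token in PATH_TOKEN.findall(job_text):
        token = token.rstrip(".")  # drop sentence-ending punctuation
        if not PurePosixPath(token).is_relative_to(root_path):
            flagged.append(token)
    return flagged


# Illustrative job file: keys and subpaths are made up for the example.
job_yaml = """\
name: chillo-allin-digest
workspace: ~/.openclaw/workspace/workspace-chillo
prompt: Read ~/my-starter/workspaces/workspace-chillo/notes.md and summarize it
"""

print(out_of_tree_paths(job_yaml, "~/my-starter/workspaces"))
```

Run against every job file, a scan like this turns "14 separate path edits" into a list to work through. Storing workspace-relative paths in the job files would remove the need for it entirely.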
The Demo
The concrete artifact here is a dashboard I wrote to answer one question: what's actually running right now?
$ a-system/dashboard.sh
AGENTS
chillo workspace-chillo claude-code last ran 4m ago
sparc workspace-sparc opencode last ran 2h ago
sentinel workspace-sentinel opencode last ran 11h ago
CRON JOBS (14 total, 11 enabled)
✓ chillo-allin-digest daily 08:00 last: 3h ago, ok
✓ chillo-analytics-reviewer 4x/day last: 12m ago, ok
✗ chillo-x-digest hourly disabled (API quota)
...
CHANNELS
slack connected
telegram connected
Nothing fancy. 243 lines of shell. But during the migration, this was the thing that told me, in one screen, which of the 14 jobs were actually working: not which were configured, but which were passing end-to-end.
If I were starting over, I'd build the dashboard first. Before the agents. Before the cron. Before anything. Because the question you actually need answered, repeatedly, is not "did I wire it up" but "is it still wired up."
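The distinction between "configured" and "still wired up" can be stated in a few lines. This is a sketch of the idea in Python, not the 243-line shell dashboard. The JobStatus fields and the freshness thresholds are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobStatus:
    """What a dashboard needs to know about one job (field names are hypothetical)."""
    name: str
    enabled: bool
    max_age_s: int                  # longest acceptable gap between runs
    last_run_age_s: Optional[int]   # None if the job never ran on this machine
    last_exit_code: Optional[int]


def health(job: JobStatus) -> str:
    """Classify a job by what it actually did, not by whether it is configured."""
    if not job.enabled:
        return "disabled"
    if job.last_run_age_s is None:
        return "never ran"
    if job.last_exit_code != 0:
        return f"failed (exit {job.last_exit_code})"
    if job.last_run_age_s > job.max_age_s:
        return "stale"
    return "ok"


def render(job: JobStatus) -> str:
    """One dashboard line: a check mark only for jobs that are genuinely ok."""
    status = health(job)
    mark = "✓" if status == "ok" else "✗"
    return f"{mark} {job.name}  {status}"


# Thresholds below are illustrative guesses, not values from the real system.
jobs = [
    JobStatus("chillo-analytics-reviewer", True, 6 * 3600, 12 * 60, 0),
    JobStatus("chillo-allin-digest", True, 24 * 3600, 3 * 3600, 128),
    JobStatus("chillo-x-digest", False, 3600, None, None),
]
for job in jobs:
    print(render(job))
```

The "stale" case is the important one here. A job that is enabled and last exited cleanly can still be broken if cron never fires it on the new machine.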
What I'm Taking From This
Three things, if you're building something similar:
- Treat "it runs on my machine" as an unverified claim, not a ground truth. Even for scripts you wrote yourself. Especially for scripts that depend on tools you installed ad hoc.
- Auto-commit, auto-log, auto-deliver — but also auto-check. Graceful error handling is half the job. The other half is surfacing the error somewhere you'll actually see it. A job that fails quietly for three days is worse than one that crashes loudly on day one.
- Write the dashboard before you need it. When something breaks during a migration at 11pm, you don't want to be grepping log files. You want one command that tells you what's broken.
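The "auto-check" point can be made concrete with a small rule: a gracefully handled problem still counts against the job, and enough of them in a row triggers an alert. The sketch below assumes the runner records one outcome label per run ("ok", "degraded", or "failed"). Those labels and the threshold of three are hypothetical, not something the current system is known to do.

```python
def consecutive_non_ok(history: list[str]) -> int:
    """Count how many of the most recent runs, in a row, were not 'ok'."""
    count = 0
    for outcome in reversed(history):
        if outcome == "ok":
            break
        count += 1
    return count


def should_alert(history: list[str], threshold: int = 3) -> bool:
    """Alert once a job has been quietly unhealthy for `threshold` runs in a row."""
    return consecutive_non_ok(history) >= threshold


# Oldest run first. "degraded" stands for a handled error such as an exhausted quota.
quota_job = ["ok", "ok", "degraded", "degraded", "degraded"]
flaky_job = ["ok", "failed", "ok", "degraded"]

print(should_alert(quota_job))
print(should_alert(flaky_job))
```

With a rule like this, the quota exhaustion would have shown up after the third degraded run instead of waiting for a migration to expose it.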
The system runs now. All 14 jobs tested, 11 enabled, 3 deliberately off. But the thing I'm keeping from this week isn't the fix list — it's the reminder that every agent system is only as portable as the assumptions it doesn't know it's making.
The orchestrator, gateway, and dashboard code are in a repo I'm cleaning up — around 1,300 lines of shell and Python, no framework, no magic. Will share when it's ready.