Last week, we got GPT-5.3 Codex, Gemini 3, and Claude Opus 4.6 to work together in the same coding session. Not through some glue script or orchestration layer — as actual teammates, passing messages to each other, claiming tasks from a shared list, and arguing about architecture through the same message bus.
This is agent teams: a lead AI spawns teammate agents, each with its own context window, and they coordinate through message passing. Claude Code shipped the concept in early February 2026. We built our own implementation in OpenCode — same idea, different architecture, and one thing Claude Code can't do: mix models from different providers in the same team.
Here's how we built it, what broke along the way, and where the two systems ended up differently.
How agents talk to each other
The first big decision was messaging. How do agents send messages, and how do recipients find out they have new ones?
Claude Code writes JSON to inbox files on disk — one file per agent at ~/.claude/<teamName>/inboxes/<agentName>.json. The leader polls that file on an interval to check for new messages. This makes sense for Claude Code because it supports three different spawn backends: in-process, tmux split-pane, and iTerm2 split-pane. When a teammate is a separate OS process in a tmux pane, a file on disk is the only shared surface you have.
OpenCode runs all teammates in the same process, so we don't need files for cross-process IPC. But we still wanted a clean audit trail. The solution is two layers: an inbox (source of truth) and session injection (delivery mechanism).
Every message first gets appended to the recipient's inbox — a per-agent JSONL file at team_inbox/<projectId>/<teamName>/<agentName>.jsonl. Each line is a JSON object with an id, from, text, timestamp, and a read flag. Then the message gets injected into the recipient's session as a synthetic user message, so the LLM actually sees it. Finally, autoWake restarts the recipient's prompt loop if they're idle.
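A single appended inbox line looks something like this (values are illustrative):
{"id":"msg_0042","from":"betting-analyst","text":"Line moved to -2.5, recheck your model","timestamp":1767225600000,"read":false}
The send path that produces these lines: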
// messaging.ts — simplified send flow
async function send(input: { teamName: string; from: string; to: string; text: string }) {
  // Recipient's session ID (targetSessionID) is resolved from the team registry; lookup elided
  // 1. Append to the inbox (source of truth)
  await Inbox.write(input.teamName, input.to, {
    id: messageId(),
    from: input.from,
    text: input.text,
    timestamp: Date.now(),
    read: false,
  })
  // 2. Inject into the recipient's session (delivery mechanism)
  await injectMessage(targetSessionID, input.from, input.text)
  // 3. Wake the recipient if their prompt loop is idle
  autoWake(targetSessionID, input.from)
}
No polling. When a teammate sends a message, the recipient processes it on the next loop iteration. The inbox doubles as an audit log — Inbox.all(teamName, agentName) gives you every message without digging through session history. When messages are marked read, markRead batches them by sender and fires delivery receipts back as regular team messages, the same pattern as actor model replies and XMPP read receipts.
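For concreteness, here is a minimal sketch of that receipt flow (Inbox.flagRead is an illustrative name, not the real API; send is the function from the snippet above):
// Sketch: flip read flags, then send one receipt per sender rather than one per message
async function markRead(teamName: string, agentName: string, messages: { id: string; from: string }[]) {
  // 1. Mark each message read in the recipient's inbox
  await Inbox.flagRead(teamName, agentName, messages.map((m) => m.id))
  // 2. Group by sender
  const bySender = new Map<string, number>()
  for (const msg of messages) bySender.set(msg.from, (bySender.get(msg.from) ?? 0) + 1)
  // 3. Deliver receipts as ordinary team messages
  for (const [sender, count] of bySender) {
    await send({ teamName, from: agentName, to: sender, text: `[receipt] ${agentName} read ${count} message(s)` })
  }
}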
The write paths differ more than you'd expect. Claude Code stores each inbox as a JSON array, so every new message means read the whole file, deserialize, push one entry, serialize, write it all back — O(N) per message. OpenCode uses JSONL, so writes are a single appendFile — O(1). The only operation that rewrites the file is markRead, and that fires once per prompt loop completion, not per message.
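In miniature, the two write paths look like this (a sketch, not the actual code in either project):
import * as fs from "node:fs/promises"
type Msg = { id: string; from: string; text: string; timestamp: number; read: boolean }
// JSON-array inbox: every send is read-modify-write, O(N) in inbox size
async function sendJsonArray(inboxPath: string, message: Msg) {
  const all: Msg[] = JSON.parse(await fs.readFile(inboxPath, "utf8"))
  all.push(message)
  await fs.writeFile(inboxPath, JSON.stringify(all))
}
// JSONL inbox: every send is a single append, O(1) regardless of inbox size
async function sendJsonl(inboxPath: string, message: Msg) {
  await fs.appendFile(inboxPath, JSON.stringify(message) + "\n")
}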
This puts OpenCode in the "best of both worlds" quadrant:
| | Polling | Event-driven / auto-wake |
|---|---|---|
| Inbox files | Claude Code | OpenCode |
| Session injection only | (nobody does this) | (our original design) |
The spawn problem we got wrong twice
Spawning teammates sounds simple. It wasn't.
Our first attempt was non-blocking: fire off the teammate's prompt loop and return immediately. This matched what we saw in Claude Code — the lead spawns both researchers in parallel, shows a status table, and keeps talking to the user.
The problem was that the lead's prompt loop would exit after spawning. The LLM had called team_spawn, gotten a success response, and had nothing else to say. So it stopped. Now you have teammates running with no lead to report to.
So we tried making spawn blocking — team_spawn awaits the teammate's full prompt loop completion before returning. This was worse. The lead can't coordinate multiple teammates in parallel if it's stuck waiting for the first one to finish.
The fix was neither blocking nor non-blocking. It was auto-wake. The spawn stays fire-and-forget, but when a teammate sends a message to an idle lead, the system restarts the lead's prompt loop automatically.
// Fire-and-forget with Promise.resolve().then() to guard against synchronous throws
Promise.resolve()
  .then(async () => {
    await transitionExecutionStatus(teamName, name, "running")
    return SessionPrompt.loop({ sessionID: session.id })
  })
  .then(async (result) => {
    // Teammate finished on its own: tell the lead why it stopped
    await notifyLead(teamName, name, session.id, result.reason)
  })
  .catch(async (err) => {
    // The loop threw: mark the member errored so recovery and cleanup can handle it
    await transitionMemberStatus(teamName, name, "error")
  })
return { sessionID: session.id, label } // team_spawn returns immediately
This went through three commits (c9702638d → 9c57a4485 → 177272136) before we got it right. The insight wasn't about blocking semantics — it was that the messaging layer needed to be able to restart idle sessions.
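The wake itself is small. A sketch, assuming a SessionPrompt API shaped like the one in the spawn snippet (SessionPrompt.isRunning is an illustrative name):
// Restart an idle recipient's prompt loop when a message lands in their session
function autoWake(targetSessionID: string, from: string) {
  // If the loop is already running, the injected message is picked up on the next iteration
  if (SessionPrompt.isRunning(targetSessionID)) return
  // Otherwise restart the loop, fire-and-forget, same pattern as spawn
  Promise.resolve()
    .then(() => SessionPrompt.loop({ sessionID: targetSessionID }))
    .catch((err) => console.error("auto-wake failed", { targetSessionID, from, err }))
}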
Why teammates talk to each other, not just the lead
Claude Code routes communication primarily through the leader. Teammates can message each other, but the main pattern is teammate → leader → teammate.
We opened this up to full peer-to-peer messaging. Any teammate can team_message any other teammate by name. The system prompt tells them:
"You can message any teammate by name — not just the lead."
In practice, this made a big difference. We ran a four-agent Super Bowl prediction team where a betting analyst proactively broadcast findings to all teammates, and an injury scout cross-referenced that data without the lead having to relay it. The lead focused on orchestration instead of being a message router.
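At the tool level, that cross-talk is just a team_message call with a peer's name in the to field (the argument shape shown here is illustrative, not the exact schema):
// injury-scout messaging betting-analyst directly, without routing through the lead
await team_message({
  teamName: "superbowl-prediction",
  from: "injury-scout",
  to: "betting-analyst",
  text: "Starting corner is doubtful. Does that move your spread analysis?",
})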
Keeping sub-agents out of the team channel
When a teammate spawns a sub-agent (via the task tool for codebase exploration, research, etc.), that sub-agent must not have access to team messaging. Sub-agents are disposable workers that produce high-volume output — grep results, file reads, intermediate reasoning. Letting them broadcast to the team would flood the coordination channel.
We enforce this at two levels — permission deny rules and tool visibility hiding:
const TEAM_TOOLS = [
  "team_create", "team_spawn", "team_message", "team_broadcast",
  "team_tasks", "team_claim", "team_approve_plan",
  "team_shutdown", "team_cleanup",
] as const
// Deny rules, spread into the sub-agent session's permission config:
...TEAM_TOOLS.map(t => ({
  permission: t, pattern: "*", action: "deny",
})),
// And hide the tools entirely in the sub-agent's tool config:
tools: {
  ...Object.fromEntries(TEAM_TOOLS.map(t => [t, false])),
},
Instead, the teammate itself relays any relevant findings back to the team. This dual enforcement was added after a security audit (commit 2ad270dc4) found that sub-agents could accidentally access team_message through inherited parent permissions. Claude Code enforces the same boundary.
Two state machines, not one
We track each teammate's lifecycle through two independent state machines. The first is coarse — five states for the overall lifecycle:
const MEMBER_TRANSITIONS: Record<MemberStatus, MemberStatus[]> = {
  ready: ["busy", "shutdown_requested", "shutdown", "error"],
  busy: ["ready", "shutdown_requested", "error"],
  shutdown_requested: ["shutdown", "ready", "error"],
  shutdown: [], // terminal
  error: ["ready", "shutdown_requested", "shutdown"],
}
The second is fine-grained — ten execution states tracking exactly where the prompt loop is at any given moment.
Why two? The UI needs to show what each teammate is doing at any moment (the execution status), but recovery and cleanup logic needs a simpler model to reason about (the member status). Collapsing these into one state machine would have made either the UI too coarse or the recovery logic too complex.
Transitions are validated against the allowed-transitions map. Two escape hatches exist: guard: true (skip if already shutdown — prevents race conditions during cleanup) and force: true (bypass validation entirely — used in recovery when the state machine may be inconsistent after a crash).
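A sketch of what that validation looks like (the Team.* helpers are illustrative names, not the real API):
type MemberStatus = "ready" | "busy" | "shutdown_requested" | "shutdown" | "error"
async function transitionMemberStatus(
  teamName: string,
  memberName: string,
  next: MemberStatus,
  opts: { guard?: boolean; force?: boolean } = {},
) {
  const current = await Team.getMemberStatus(teamName, memberName) // illustrative lookup
  // guard: silently skip if the member is already shut down (avoids races during cleanup)
  if (opts.guard && current === "shutdown") return
  // force: bypass validation entirely (recovery, where state may be inconsistent)
  if (!opts.force && !MEMBER_TRANSITIONS[current].includes(next)) {
    throw new Error(`invalid transition ${current} -> ${next} for ${memberName}`)
  }
  await Team.setMemberStatus(teamName, memberName, next)
}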
What happens when the server crashes
When the server restarts while teammates are running, you have stale state. Teammates marked as "busy" aren't actually running anymore. The recovery sequence matters, and the ordering is specific:
First, register a permission restoration handler. This must be ready before recovery because recovery could trigger cleanup, which might need to restore delegate-mode permissions on the lead session.
Second, scan all teams for busy members and force-transition them to ready. Inject a notification into the lead:
[System]: Server was restarted. The following teammates in team "X"
were interrupted and need to be resumed: worker-1, worker-2.
Use team_message or team_broadcast to tell them to continue their work.
Third, subscribe to auto-cleanup events after recovery finishes. If you subscribe before, the status transitions that recovery itself triggers would cause spurious cleanup.
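Put together, the bootstrap looks roughly like this (a sketch; the helper names are illustrative):
async function recoverTeams() {
  // 1. Permission restoration must exist before recovery runs, because recovery
  //    can trigger cleanup that restores delegate-mode permissions on the lead
  registerPermissionRestoreHandler()
  // 2. Force stale "busy" members back to ready and notify each team's lead
  for (const team of await Team.list()) {
    const interrupted = team.members.filter((m) => m.status === "busy")
    for (const m of interrupted) {
      await transitionMemberStatus(team.name, m.name, "ready", { force: true })
    }
    if (interrupted.length > 0) {
      await injectMessage(team.leadSessionID, "System",
        `Server was restarted. Interrupted teammates: ${interrupted.map((m) => m.name).join(", ")}`)
    }
  }
  // 3. Only now subscribe to auto-cleanup, so the transitions recovery just made
  //    don't trigger spurious cleanup
  subscribeToAutoCleanup()
}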
The key decision: no automatic restart. Interrupted teammates get marked as ready but their prompt loops don't restart. The user has to re-engage them. This prevents runaway agents after a crash. You lose convenience, but you don't wake up to find four agents have been burning API credits all night on a stale task.
Cancellation uses a retry loop — three attempts, 120ms apart. If the prompt loop hasn't stopped after three tries, force-transition as a safety net:
for (let attempt = 0; attempt < 3; attempt++) {
  SessionPrompt.cancel(member.sessionID)
  await transitionExecutionStatus(teamName, memberName, "cancelling")
  await Bun.sleep(120)
  // `current` is the member record re-read from team state (lookup elided)
  if (TERMINAL_EXECUTION_STATES.has(current?.execution_status)) break
}
// Still not terminal after three attempts? Force-transition as the safety net (elided)
What we tested
We ran three progressively complex scenarios:
NFL Research. Two Gemini agents researching team history. This is where we discovered the spawn/auto-wake problem. It also revealed a Gemini-specific issue: the model generated ~50 near-identical "task complete" messages in a loop, unable to stop. No unit test catches that.
Super Bowl Prediction. Four Claude Opus agents — stats analyst, betting analyst, matchup analyst, injury scout — working in parallel with peer-to-peer coordination. This validated the full-mesh topology and proved atomic task claiming worked under concurrent access.
Architecture Drama. GPT-5.3 Codex, Gemini 2.5 Pro, and Claude Sonnet 4 coordinating through the same message bus. Three providers, one team. Auto-wake triggered on every message. Sub-agent isolation held. Nothing broke.
What's still missing
Delivery receipts are best-effort. If the process crashes after markRead() but before the receipt is injected into the sender's session, the sender never learns the recipient read their message. The read state itself survives — it's the notification that's lost. This is the same trade-off XMPP and Matrix make. Claude Code doesn't send delivery receipts at all — markMessagesAsRead flips a local flag with no sender notification.
No backpressure. A fast sender can flood a slow receiver. There's a 10KB per-message limit but no bounded queue.
Single-process only. All locks are in-memory, so you can't run multiple server instances against the same storage. Claude Code's file-based locking works across processes — that's one advantage of their approach.
No cross-team communication. Teams are isolated. No inter-team messaging primitive.
Recovery is manual. After a crash, teammates are ready but idle. The human re-engages them. This is intentional, but it means unattended teams can't self-heal.
How it compares
Everything above, condensed:
| Dimension | Claude Code | OpenCode |
|---|---|---|
| Message storage | JSON array (O(N) read-modify-write per message) | JSONL append-only (O(1) writes) + session injection |
| Message notification | Polling | Event-driven auto-wake |
| Spawn model | Fire-and-forget (3 backends) | Fire-and-forget (in-process only) |
| Communication | Leader-centric | Full mesh (peer-to-peer) |
| Tool model | 8+ dedicated tools | 9 dedicated tools |
| State tracking | Implicit | Two-level state machine (member + execution) |
| Task management | Built-in | Built-in with dependencies + atomic claiming |
| Sub-agent isolation | Explicit | Explicit (deny list + visibility hiding) |
| Recovery | Not publicly documented | Ordered bootstrap with manual restart |
| Multi-model | Single provider | Multi-provider per team |
| Message tracking | Read/unread flag (local only, no sender notification) | Read/unread + delivery receipts to sender (reply messages) |
| Locking | File locks | In-memory RW lock (writer priority) |
| Plan approval | Present | First-class with tagged permission pattern |
| Delegate mode | Present | Lead restricted to coordination-only tools |
The systems are more similar than different. Both use fire-and-forget spawning, file-based inbox persistence, and explicit sub-agent isolation. The real divergences — event-driven messaging, append-only JSONL writes, peer-to-peer communication, multi-model support, two-level state machines — come from OpenCode's constraint of running everything in a single process and its goal of supporting multiple providers.
OpenCode is open source. The agent teams implementation spans three PRs on the dev branch: #12730 (core), #12731 (tools & routes), and #12732 (TUI).