Multi-Agent Orchestration
The orchestrator spawns claude -p subprocesses, decides which agents should run via shouldAgentRun(), executes them in parallel with Promise.allSettled(), and tracks health, budgets, and actions via the agent registry.
The orchestration problem
Running one AI agent is straightforward. You spawn a claude -p process, give it a prompt, and wait for the result. Running 15 is an engineering challenge. Suddenly you need to answer a set of questions that never came up when you were running one:
- Which agents should run right now? Not all agents have work to do at any given moment. Your reply-handler agent has nothing to do if there are no new replies. Your competitor-watch agent only needs to run every few hours. Running agents that have no work wastes time and context window.
- How do you run them in parallel without interference? If two agents are writing to the same file or talking to the same API, things break. You need isolation.
- How do you track budgets so a runaway agent doesn't exhaust resources? An agent stuck in a loop could run forever. Even on an unlimited subscription, you want guardrails.
- How do you know which agents are healthy vs failing? When you have 15 agents running on a schedule, you need visibility. Which ones succeeded? Which ones errored out? Which ones are degraded?
- How does each agent get the right context? Each agent has a different job. Dumping the entire world state into every prompt is wasteful and confusing. Each agent needs context tailored to its role.
This is the orchestration problem. The solution is an orchestrator — a program that manages the lifecycle of all your agents. It decides who runs, spawns them, feeds them context, tracks their health, and collects their results. Think of it as the conductor of an orchestra. The individual musicians (agents) are talented, but without someone coordinating them, you get noise instead of music.
The subprocess model
The architecture is simple and deliberate: each agent is a separate claude -p process. Not threads, not workers, not function calls within a single program — full OS-level processes with their own stdin, stdout, and stderr.
Why subprocesses? Four reasons:
- Isolation — One agent crashing does not take down the others. If your outreach agent hits an unrecoverable error, the content-writer and competitor-watch agents keep running. Each process has its own memory space, its own file handles, its own exit code.
- Parallelism — These are real parallel processes. Your operating system schedules them across CPU cores. This is not cooperative multitasking where one agent has to yield before another can run. They all execute simultaneously.
- Simplicity — Each agent is just a CLI invocation with flags. There is no agent framework, no plugin system, no custom runtime. If you can run claude -p "do something" in your terminal, you can spawn it from code.
- Cost — On a Claude Max subscription, every invocation costs $0 per token regardless of how many agents you run. You are not paying per API call. You are paying a flat monthly fee. This changes the economics completely — you can run 15 agents every hour without worrying about a surprise bill.
Here is the spawning code from the orchestrator:
// Spawning a single agent — orchestrator.ts
export async function runAgentViaCLI(agentId: AgentId): Promise<AgentRunResult> {
  if (!canAgentAct(agentId)) {
    return { success: false, error: "Budget exhausted or disabled" };
  }

  // Build context for this specific agent
  const awareness = buildAwarenessIndex();
  const ctx = buildAgentContext(agentId, awareness);
  const systemPrompt = getAgentPrompt(agentId, awareness, ctx.memory,
    ctx.skills, ctx.talkspace, ctx.performanceDigest);

  // Write prompt to temp file (too long for CLI args)
  const promptPath = `/tmp/agent-prompt-${agentId}.txt`;
  writeFileSync(promptPath, systemPrompt);

  // userPrompt is the agent's short task instruction, assembled elsewhere
  const claudeArgs = [
    "-p", userPrompt,
    "--append-system-prompt", readFileSync(promptPath, "utf-8"),
    "--mcp-config", MCP_CONFIG_PATH,
    "--max-turns", "25",
    "--dangerously-skip-permissions",
    "--output-format", "stream-json",
    "--model", "sonnet",
  ];

  const proc = spawn("claude", claudeArgs, {
    env: { ...cleanEnv, AGENT_ID: agentId },
    cwd: PROJECT_ROOT,
    stdio: ["pipe", "pipe", "pipe"],
  });

  // ... handle stdout, stderr, exit code
}
Every line here is a decision. Let's walk through the important ones:
- --max-turns 25 — This is your circuit breaker. Without a turn limit, an agent that gets confused could loop indefinitely — reading files, thinking, reading more files, thinking more. Twenty-five turns is enough for any real task. If an agent cannot finish in 25 turns, something is wrong and you want it to stop.
- --dangerously-skip-permissions — In interactive mode, Claude Code asks for permission before editing files or running commands. In an autonomous system, there is no human to click "approve." This flag tells Claude Code to act without asking. The name is intentionally scary — you should only use it when you have other guardrails in place (like the budget system we will cover below).
- --output-format stream-json — Instead of plain text output, the agent streams structured JSON. Each line is a JSON object describing what the agent is doing — tool calls, reasoning, results. This lets the orchestrator parse the output programmatically and track what happened during the run.
- AGENT_ID env var — This gets passed to the MCP server. The MCP server uses it to allowlist which tools each agent can access. Your content-writer agent does not need access to the email-sending tool. Your outreach agent does not need access to the file-editing tools. The env var is how the MCP server knows which agent is calling.
- --append-system-prompt — The system prompt for each agent is long. It includes the agent's role, its current context, recent memory, available skills, and messages from other agents. That does not fit in a -p argument. Instead, we pass it as an appended system prompt that gets injected alongside Claude's default instructions.
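The stream-json output is what makes the run observable. Here is a minimal sketch of a summarizer that walks the stream line by line. The event shapes assumed here — a top-level type field, tool_use blocks inside assistant messages, a final result event — are illustrative, not a schema guarantee:

```typescript
interface StreamSummary {
  toolCalls: number;
  resultText: string | null;
}

// Fold a newline-delimited JSON stream into a small summary the
// orchestrator can log and store in the registry.
export function summarizeStream(raw: string): StreamSummary {
  const summary: StreamSummary = { toolCalls: 0, resultText: null };
  for (const line of raw.split("\n")) {
    if (!line.trim()) continue;
    let event: any;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // skip partial or non-JSON lines
    }
    if (event.type === "assistant") {
      // Count tool_use blocks inside assistant messages
      for (const block of event.message?.content ?? []) {
        if (block.type === "tool_use") summary.toolCalls++;
      }
    } else if (event.type === "result") {
      summary.resultText = event.result ?? null;
    }
  }
  return summary;
}
```

The same pass can be extended to capture per-tool counts or error events; the point is that structured output turns the agent's run into data rather than a wall of text.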
Smart scheduling with shouldAgentRun()
Not every agent has work every cycle. Your reply-handler has nothing to do if nobody replied to your emails. Your competitor-watch agent already ran an hour ago and does not need to run again. Your LinkedIn prospector should not be active at 2 AM.
Running agents that have no work is wasteful. It consumes time, it uses up context window on nothing, and it adds noise to your logs. The solution is a scheduling function that checks whether each agent actually has a reason to run.
// Smart scheduling — orchestrator.ts
export function shouldAgentRun(agentId: string): { run: boolean; reason: string } {
  switch (agentId) {
    case "outreach": {
      const records = listOutreach();
      const visitors = listVisitors(); // scored website visitors, loaded elsewhere
      const outreachEmails = new Set(records.map((r) => r.email));
      const pending = records.filter((r) =>
        ["new", "email-1-sent", "email-2-sent"].includes(r.status)
      );
      const promotable = visitors.filter((v) =>
        v.score >= 30 && v.email && !outreachEmails.has(v.email)
      );
      if (pending.length === 0 && promotable.length === 0) {
        return { run: false, reason: "No pending leads and no promotable visitors" };
      }
      return { run: true, reason: `${pending.length} leads pending` };
    }
    case "reply-handler": {
      const records = listOutreach();
      const unreplied = records.filter((r) =>
        r.status === "replied" && !r.replyResponseSent
      );
      if (unreplied.length === 0) return { run: false, reason: "No unhandled replies" };
      return { run: true, reason: `${unreplied.length} replies to handle` };
    }
    case "competitor-watch": {
      const agent = getRegistry().agents[agentId];
      const fourHoursAgo = Date.now() - 4 * 60 * 60 * 1000;
      if (agent && agent.lastRunAt > fourHoursAgo)
        return { run: false, reason: "Ran less than 4 hours ago" };
      return { run: true, reason: "Due for competitor check" };
    }
    case "linkedin-prospector": {
      const hour = new Date().getHours();
      if (hour < 8 || hour >= 22)
        return { run: false, reason: "Outside business hours" };
      return { run: true, reason: "Due for LinkedIn prospecting" };
    }
    // ... other agents
    default:
      return { run: true, reason: "No schedule rule; runs every cycle" };
  }
}
There are three scheduling patterns here, and most multi-agent systems end up using some combination of all three:
Data-driven scheduling
The outreach agent and reply-handler agent only run when there is data to process. The outreach agent checks: are there pending leads in the pipeline, or visitors with high enough scores to promote into leads? If no, skip. The reply-handler checks: are there replies that have not been responded to? If no, skip.
This is the most efficient pattern. The agent literally does not run unless there is work waiting for it. No wasted cycles, no empty runs cluttering your logs.
Time-based scheduling
The competitor-watch agent runs on a cadence — every 4 hours. It does not check for new data because its job is to go find new data. The marketing-strategist runs every 2 hours to review performance and adjust strategy.
This works for agents whose job is proactive rather than reactive. They are not responding to events — they are scanning the world for information.
Hybrid scheduling
The LinkedIn prospector combines both: it only runs during business hours (8 AM to 10 PM) and on a 3-hour cadence within that window. There is no point in prospecting on LinkedIn at 3 AM — nobody is online, and the activity would look suspicious. The time window keeps it realistic, and the cadence keeps it from running every single cycle during business hours.
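The hybrid gate can be factored into a reusable helper. This is a sketch, not code from the orchestrator — the function name is made up, and the 8 AM to 10 PM window and 3-hour cadence are taken from the prospector described above:

```typescript
interface HybridGateOptions {
  startHour: number;  // inclusive start of the allowed window
  endHour: number;    // exclusive end of the allowed window
  cadenceMs: number;  // minimum gap between runs
}

// Combine a time-of-day window with a minimum cadence: the agent runs
// only inside the window, and only if enough time has passed since the
// last run.
export function hybridGate(
  lastRunAt: number,
  now: Date,
  opts: HybridGateOptions = { startHour: 8, endHour: 22, cadenceMs: 3 * 60 * 60 * 1000 },
): { run: boolean; reason: string } {
  const hour = now.getHours();
  if (hour < opts.startHour || hour >= opts.endHour) {
    return { run: false, reason: "Outside business hours" };
  }
  if (now.getTime() - lastRunAt < opts.cadenceMs) {
    return { run: false, reason: "Ran within the cadence window" };
  }
  return { run: true, reason: "In window and due" };
}
```

Returning the same { run, reason } shape as shouldAgentRun() keeps every skip self-explaining in the logs.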
The shouldAgentRun() function returns both a boolean and a reason string. The reason gets logged, so when you look at your orchestrator logs, you see exactly why each agent was skipped or triggered. This is crucial for debugging. When something is not working, you need to know whether the agent never ran (scheduling issue) or ran and failed (agent issue).
The agent registry
The agent registry is the source of truth for the state of every agent in your system. It is a JSON file that the orchestrator reads at the start of each cycle and writes back after each run. Here are the types:
// Agent registry types — agent-registry.ts
export interface AgentStatus {
  id: AgentId;
  name: string;
  description: string;
  enabled: boolean;
  lastRunAt: number;
  lastRunDuration?: number;
  dailyBudget: number;
  dailyUsed: number;
  totalActions: number;
  totalErrors: number;
  recentActions: AgentAction[];
  health: "healthy" | "degraded" | "error" | "idle";
}

export interface AgentRegistry {
  agents: Record<AgentId, AgentStatus>;
  globalDailyBudget: number;
  globalDailyUsed: number;
  lastResetDate: string; // YYYY-MM-DD — resets counters daily
}
Let's break down what this tracks and why each field matters:
- enabled — A kill switch per agent. If an agent is misbehaving, you set enabled: false and it will not run on the next cycle. No code change required, no redeployment. Just flip the flag in the JSON file.
- dailyBudget / dailyUsed — This is loop protection, not cost control. On a Max subscription, you are not paying per token. But an agent stuck in a loop could run 500 times in a day, each time consuming 25 turns of context and tool calls. The daily budget caps how many times an agent can run. Think of it as a fuse — it blows before things get out of hand.
- health — Four states: healthy (last run succeeded), degraded (last run had warnings or partial failures), error (last run failed), idle (agent has not run recently). Updated automatically after each run. This gives you a dashboard-level view of your system.
- recentActions — The last 50 actions this agent took, with timestamps and results. Actions include things like "sent email to john@example.com," "wrote blog post draft," or "updated competitor analysis." This is your audit trail.
- lastResetDate — The daily counters (dailyUsed, globalDailyUsed) reset at midnight. The orchestrator checks this date at the start of each cycle and resets if it is a new day.
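The rollover check is small enough to sketch in full. This is a plausible reconstruction, not the orchestrator's actual code — the types are trimmed to the fields the reset touches, and midnight here means UTC midnight:

```typescript
// Trimmed shapes: just the fields the reset touches.
interface ResettableAgent {
  dailyUsed: number;
}
interface ResettableRegistry {
  agents: Record<string, ResettableAgent>;
  globalDailyUsed: number;
  lastResetDate: string; // YYYY-MM-DD
}

// Zero the daily counters when the date rolls over. Returns true if a
// reset happened, so the caller can log it.
export function resetDailyCountersIfNeeded(
  registry: ResettableRegistry,
  now: Date = new Date(),
): boolean {
  const today = now.toISOString().slice(0, 10); // YYYY-MM-DD (UTC in this sketch)
  if (registry.lastResetDate === today) return false; // same day: nothing to do
  for (const agent of Object.values(registry.agents)) {
    agent.dailyUsed = 0;
  }
  registry.globalDailyUsed = 0;
  registry.lastResetDate = today;
  return true;
}
```

Running this at the top of every cycle means the reset needs no separate cron job — the first cycle after midnight does the housekeeping.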
The budget check is simple and runs before every agent invocation:
export function canAgentAct(agentId: AgentId): boolean {
  const registry = getRegistry();
  const agent = registry.agents[agentId];
  if (!agent || !agent.enabled) return false;
  if (agent.dailyUsed >= 999) return false; // per-agent runaway protection
  if (registry.globalDailyUsed >= 9999) return false; // global runaway protection
  return true;
}
Three checks, three levels of protection. Is the agent enabled? Has it exceeded its own daily budget? Has the entire system exceeded the global daily budget? If any check fails, the agent does not run. The numbers (999 per agent, 9999 global) are deliberately high — they are not meant to be hit during normal operation. They are there to catch runaway loops, not to throttle normal work.
This is a pattern worth stealing for any autonomous system: cheap, simple guardrails that you almost never hit but that save you when things go wrong.
The context builder pipeline
Here is where orchestration gets interesting. Each agent needs context to do its job — but not the same context, and not all of it. Your content-writer needs to know about recent website analytics and what topics are trending. It does not need to know about the outreach pipeline or LinkedIn prospecting status. Your reply-handler needs to know about recent email replies and the outreach history. It does not need to know about competitor analysis.
The context builder constructs a tailored context payload for each agent:
// The master context builder — context-builder.ts
export function buildAgentContext(agentId: string, awareness: AwarenessIndex) {
  return {
    awareness: buildSmartAwareness(agentId, awareness),
    memory: buildSmartMemory(agentId),
    skills: buildSmartSkills(agentId),
    talkspace: buildTalkspaceDigest(agentId),
    performanceDigest: buildPerformanceDigest(agentId),
  };
}
Five layers of context, each one filtered for relevance. Let's go through them.
Smart Awareness
The awareness layer tells the agent about the state of the system — but only the parts it needs to know about. Instead of dumping all 15 agents' statuses into the prompt, it shows only the collaborator agents. Your content-writer sees the status of the SEO analyst and the marketing strategist (because their work directly feeds into content decisions) but not the reply-handler or LinkedIn prospector.
Events are filtered the same way. The system generates events constantly — "new visitor from Google," "email opened by lead," "competitor launched new feature." Each agent has a config map that declares which event types it cares about. The awareness builder uses that map to filter.
Smart Memory
Each agent has its own persistent memory — lessons learned from past runs, active goals, known strengths and weaknesses, and recent research findings. The memory builder pulls the most relevant entries. An agent's top lessons (things it learned from mistakes) always get included. Active goals get included. Ancient research findings from two weeks ago get pruned.
Smart Skills
Agents have skills — executable Python scripts that give them specific capabilities like lead scoring, churn prediction, or A/B test analysis. The skills layer tells the agent what skills are available to it, including both its own skills and shared skills from other agents. Each skill listing includes its name, description, rating, and how to invoke it.
Talkspace Digest
Agents communicate with each other through shared message channels (called talkspaces). The content-writer and SEO analyst share a channel for coordinating on keyword strategy. The outreach agent and reply-handler share a channel for coordinating on lead communication. The talkspace digest pulls recent messages from the channels this agent belongs to, so it knows what its collaborators have been saying.
Performance Digest
How has this agent's work been performing? The performance digest shows metrics relevant to the agent's domain. For the content-writer: page views, time on page, bounce rate for recent articles. For the outreach agent: open rates, reply rates, conversion rates. This feedback loop is what lets agents improve over time — they can see what worked and what did not.
Why "smart" filtering matters
A naive approach would be: just dump everything into every agent's prompt. All 15 agents' statuses. All events. All memory. All messages. This fails for two reasons. First, it wastes context window. Claude has a large context window, but it is not infinite, and the more irrelevant noise you pack in, the worse the signal-to-noise ratio gets. Second, it confuses agents. An agent that sees 14 other agents' statuses and hundreds of irrelevant events will waste turns trying to process information that has nothing to do with its job.
The context builder solves this with config maps: AGENT_COLLABORATORS defines which agents each agent works with, AGENT_EVENT_INTERESTS defines which event types each agent cares about, and AGENT_CHANNELS defines which talkspace channels each agent belongs to. These maps are defined once in the configuration and used by every layer of the context builder.
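To make the config-map idea concrete, here is a sketch of two of the maps and one filtering pass. The agent names and event types are illustrative examples, not the system's real configuration:

```typescript
// Which agents' statuses each agent sees in its awareness context.
const AGENT_COLLABORATORS: Record<string, string[]> = {
  "content-writer": ["seo-analyst", "marketing-strategist"],
  "outreach": ["reply-handler"],
};

// Which event types each agent has declared an interest in.
const AGENT_EVENT_INTERESTS: Record<string, string[]> = {
  "content-writer": ["page-published", "traffic-spike"],
  "outreach": ["email-opened", "new-visitor"],
};

interface SystemEvent {
  type: string;
  detail: string;
}

// Keep only the events this agent cares about; unknown agents see nothing.
export function filterEventsFor(agentId: string, events: SystemEvent[]): SystemEvent[] {
  const interests = new Set(AGENT_EVENT_INTERESTS[agentId] ?? []);
  return events.filter((e) => interests.has(e.type));
}
```

The same lookup-then-filter shape works for collaborator statuses and talkspace channels, which is why a handful of declarative maps can drive every layer of the context builder.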
Parallel execution
With scheduling, budgets, and context all handled, the actual execution is surprisingly straightforward. Here is the batch runner:
// Running agents in parallel — orchestrator.ts
export async function runAgentBatch(agentIds: AgentId[]): Promise<Record<string, AgentRunResult>> {
  const results: Record<string, AgentRunResult> = {};

  const toRun = agentIds.filter((id) => {
    if (!canAgentAct(id)) {
      results[id] = { success: false, error: "Budget exhausted or disabled" };
      return false;
    }
    const check = shouldAgentRun(id);
    if (!check.run) {
      console.log(`[${id}] Skipped: ${check.reason}`);
      results[id] = { success: true, turns: 0, actions: 0 };
      return false;
    }
    console.log(`[${id}] Running: ${check.reason}`);
    return true;
  });

  const settled = await Promise.allSettled(
    toRun.map(async (id) => {
      const result = await runAgentViaCLI(id);
      return { id, result };
    }),
  );

  // Collect results, update registry, log outcomes
  for (const entry of settled) {
    if (entry.status === "fulfilled") {
      const { id, result } = entry.value;
      results[id] = result;
      updateAgentHealth(id, result);
    } else {
      // Promise rejected — agent process crashed hard
      console.error("Agent crashed:", entry.reason);
    }
  }

  return results;
}
The flow is a funnel:
- Filter by canAgentAct() — Is the agent enabled? Is it under its daily budget? Is the global budget under the limit? If any check fails, the agent is skipped and the result is recorded immediately.
- Filter by shouldAgentRun() — Does the agent have actual work to do? Is it due to run based on its schedule? The reason for skipping gets logged so you can trace exactly why an agent did or did not run.
- Run the survivors in parallel — The remaining agents are spawned simultaneously using Promise.allSettled(). This is a critical choice: allSettled, not all. With Promise.all(), the batch promise rejects as soon as any one agent rejects (crashes, times out, throws an unhandled error) — the other agents keep running, but their results are discarded. With Promise.allSettled(), each agent's outcome is recorded independently. One agent crashing has zero impact on the others. You get results from the agents that succeeded and error information from the ones that failed.
- Collect and record — After all agents finish (or fail), the orchestrator iterates through the settled results. Fulfilled promises contain the agent's run result — success/failure, number of turns used, actions taken. Rejected promises mean the agent process itself crashed, which gets logged as an error. Each agent's health status is updated in the registry.
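The allSettled contract can be shown in miniature. tally() is a hypothetical helper, but the PromiseSettledResult shapes it folds over are exactly what runAgentBatch iterates through:

```typescript
// Fold an array of settled results into success/failure counts.
// "fulfilled" entries carry a value; "rejected" entries carry a reason.
export function tally<T>(settled: PromiseSettledResult<T>[]): { ok: number; failed: number } {
  let ok = 0;
  let failed = 0;
  for (const s of settled) {
    if (s.status === "fulfilled") ok++;
    else failed++;
  }
  return { ok, failed };
}

// Usage (async): one rejecting agent does not hide the others' results.
// const settled = await Promise.allSettled([
//   Promise.resolve("agent-a done"),
//   Promise.reject(new Error("agent-b crashed")),
// ]);
// tally(settled) counts one ok and one failed.
```

With Promise.all() in place of allSettled(), the same pair of promises would reject the whole batch and agent-a's result would never reach the tally.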
The entire batch typically takes 2-5 minutes depending on how many agents qualify to run and how much work they have. Since they run in parallel, the total time is roughly equal to the slowest agent, not the sum of all agents.
Putting it all together
Here is the complete orchestration cycle, from start to finish:
┌─────────────────────────────────────────────────────────┐
│ ORCHESTRATION CYCLE │
│ │
│ 1. Load registry ──► check daily budgets │
│ 2. For each agent: │
│ canAgentAct? ──► shouldAgentRun? ──► skip or queue │
│ 3. Build context per agent (awareness + memory + ...) │
│ 4. Spawn claude -p subprocesses in parallel │
│ 5. Collect results, update health, log actions │
│ 6. Wait for next cycle │
└─────────────────────────────────────────────────────────┘
Step 1: the orchestrator loads the agent registry from disk. It checks whether the date has rolled over and resets daily counters if needed. It reads the global budget to make sure the system has not hit its ceiling.
Step 2: for each registered agent, two gates. First, canAgentAct() — the hard gate. Is the agent enabled? Is it under budget? Second, shouldAgentRun() — the smart gate. Does the agent have work? Is it due to run? Only agents that pass both gates get queued for execution.
Step 3: for each agent that made it through the gates, the context builder assembles a tailored payload. Awareness of collaborator agents. Persistent memory. Available skills. Recent talkspace messages. Performance metrics. This payload gets injected into the agent's system prompt.
Step 4: the queued agents are spawned as parallel claude -p subprocesses. Each one gets its own process, its own system prompt, its own MCP config. They run simultaneously. The orchestrator waits for all of them using Promise.allSettled().
Step 5: results come back. The orchestrator updates each agent's health status, logs the actions they took, increments their daily usage counters, and records any errors. The updated registry gets written back to disk.
Step 6: the orchestrator sleeps until the next cycle. In the AI University system, cycles run every 30 minutes during business hours and every 2 hours overnight. The cycle interval is configurable — you can run more frequently if your agents have time-sensitive work, or less frequently if you want to conserve resources.
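The outer loop itself can be sketched in a few lines. The 30-minute and 2-hour intervals come from the text above; defining business hours as 8 AM to 10 PM is an assumption borrowed from the prospector's window, and runCycle stands in for the batch logic:

```typescript
// Pick the delay until the next cycle: 30 minutes during business
// hours, 2 hours overnight. The 8-22 window is an assumed default.
export function nextCycleDelayMs(hour: number): number {
  const businessHours = hour >= 8 && hour < 22;
  return businessHours ? 30 * 60 * 1000 : 2 * 60 * 60 * 1000;
}

// The orchestrator's main loop: run one full cycle (steps 1-5), then
// sleep until the next one. Never returns in normal operation.
export async function orchestratorLoop(runCycle: () => Promise<void>): Promise<void> {
  for (;;) {
    await runCycle();
    const delayMs = nextCycleDelayMs(new Date().getHours());
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
}
```

Keeping the delay function pure and separate from the loop makes the cadence trivial to test and to change without touching the execution path.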
That is the entire system. No agent framework. No complex dependency graph. No message queue. Just a loop that checks who should run, builds their context, spawns them in parallel, and records what happened. The simplicity is the point. When something goes wrong — and in a 15-agent system, something will always eventually go wrong — you want to be able to read the code and understand exactly what happened. Every decision is explicit. Every skip is logged with a reason. Every agent's health is tracked and visible.
The orchestrator does not make the agents smart. The agents are smart because they are Claude with good context. The orchestrator makes them manageable. It turns 15 independent AI processes into a coordinated system that you can monitor, debug, and trust to run while you sleep.