Model Selection Guide: Choosing the Right Model for Each Task
Running every agent task on Opus is like using a forklift to move a coffee cup. Learn the three model tiers, when each earns its place, and how to build a routing layer that cuts costs while keeping quality high.
The biggest cost mistake in agent systems is model uniformity — running every task through the most capable (and most expensive) model regardless of whether the task actually needs it. Summarizing a log file does not require the same reasoning capability as synthesizing a multi-variable business strategy. Using the same model for both is wasteful at best and economically unsustainable at scale.
This guide covers the three Claude model tiers, when each is the right choice, how to build a routing layer that selects the right model automatically, and the real cost math behind the decision.
The Three Model Tiers
Claude models sit on a capability-cost-latency spectrum. Understanding what differentiates each tier is the foundation of good model selection.
Claude Opus: Complex Reasoning
Opus is the highest-capability tier. It excels at tasks that require sustained multi-step reasoning, nuanced judgment, synthesis of conflicting information, and creative problem-solving. It is also the slowest and most expensive tier.
Opus is the right choice when:
- The task requires reasoning across multiple sources to reach a non-obvious conclusion
- Errors are expensive and hard to detect downstream
- The task involves ambiguous instructions that require interpretation
- You need the agent to catch edge cases a simpler model would miss
- The output will be used directly without human review
Opus is the wrong choice when:
- The task is a simple lookup, extraction, or classification
- You are processing high volumes of similar, low-complexity tasks
- Latency matters more than maximum quality
- The output is an intermediate step that will be reviewed before action is taken
Common Opus use cases in a 15-agent system: strategic analysis, multi-source research synthesis, complex code generation, nuanced customer communication drafting, high-stakes decision reasoning.
Claude Sonnet: The Balanced Workhorse
Sonnet sits in the middle of the spectrum. It handles the majority of agent tasks well — it reasons effectively, follows complex instructions, and produces high-quality outputs — at meaningfully lower cost and latency than Opus.
Sonnet is the right choice when:
- The task requires real reasoning but not the highest possible reasoning ceiling
- You are running moderate volumes of tasks where cost is a consideration
- Latency is somewhat important but output quality cannot be sacrificed
- The task is well-defined with clear inputs and expected output structure
Sonnet is where most production agent workloads should default. If you have not explicitly decided a task needs Opus or Haiku, Sonnet is the right starting point.
Common Sonnet use cases: content generation, data transformation with business logic, tool call orchestration, structured data extraction from documents, summarization of complex material.
Claude Haiku: Speed and Volume
Haiku is the fastest and least expensive Claude model. It trades some reasoning depth for dramatically better latency and cost. For tasks that are well-structured, low-ambiguity, and high-volume, Haiku delivers excellent ROI.
Haiku is the right choice when:
- The task is classification, tagging, or simple extraction
- You need sub-second latency
- You are processing thousands of items in a batch
- The task has a narrow, well-defined input-output mapping
- Errors are cheap to catch or correct
Haiku is the wrong choice when:
- The task requires judgment about ambiguous situations
- The output requires nuanced language or complex reasoning
- You cannot afford a high error rate even at low per-error cost
Common Haiku use cases: intent classification, sentiment tagging, entity extraction, simple format conversion, routing decisions, embedding generation alternatives, real-time webhook response.
Pricing Comparison
Costs are per million tokens (input / output). These are approximate reference figures — check Anthropic's pricing page for current rates.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Relative Cost |
|---|---|---|---|
| Claude Opus | ~$15 | ~$75 | 1x (baseline) |
| Claude Sonnet | ~$3 | ~$15 | ~0.2x |
| Claude Haiku | ~$0.25 | ~$1.25 | ~0.017x |
The practical implication: running 1,000 classification tasks on Haiku costs roughly the same as running 17 on Opus. If those 1,000 tasks are truly classification-level work, every dollar spent on Opus is a dollar that provides no quality benefit.
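The multiplier is easy to make concrete in code. The sketch below is illustrative (the `RATES` constant and `callCost` helper are not part of any SDK); it just applies the reference rates from the table above to a batch of calls:

```typescript
// Approximate per-million-token rates from the table above (input / output, USD).
const RATES = {
  opus: { input: 15, output: 75 },
  sonnet: { input: 3, output: 15 },
  haiku: { input: 0.25, output: 1.25 },
} as const;

type Tier = keyof typeof RATES;

// Dollar cost of a single call at the given tier.
function callCost(tier: Tier, inputTokens: number, outputTokens: number): number {
  const r = RATES[tier];
  return (inputTokens * r.input + outputTokens * r.output) / 1_000_000;
}

// 1,000 short classification calls (200 tokens in, 30 out) per tier:
const haikuBatch = 1000 * callCost("haiku", 200, 30); // ~$0.09
const opusBatch = 1000 * callCost("opus", 200, 30);   // ~$5.25
```

At these rates the Opus batch costs 60x the Haiku batch, which is where the "1,000 Haiku tasks for the price of 17 Opus tasks" figure comes from.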
Latency Comparison
Latency matters in agentic systems because it compounds. An 8-step agent loop where each step takes 4 seconds runs in 32 seconds. The same loop with 1-second steps runs in 8 seconds.
Approximate median time-to-first-token under normal load:
| Model | Median TTFT | Practical step latency |
|---|---|---|
| Claude Opus | 1.5 - 4s | 4 - 15s per step |
| Claude Sonnet | 0.5 - 1.5s | 2 - 6s per step |
| Claude Haiku | 0.2 - 0.5s | 0.5 - 2s per step |
For a real-time user-facing agent, Haiku for simple steps and Sonnet for complex steps keeps the experience responsive. Running everything through Opus on a user-facing workflow produces a noticeably sluggish product.
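The compounding effect is simple arithmetic. A rough sketch, assuming per-step latencies near the midpoints of the ranges above (the numbers are illustrative, not measured):

```typescript
// Illustrative per-step latencies in seconds, roughly mid-range for each tier.
const STEP_LATENCY: Record<string, number> = { opus: 9, sonnet: 4, haiku: 1 };

// Total wall-clock time for a sequential agent loop, one tier per step.
function loopLatency(steps: string[]): number {
  return steps.reduce((total, tier) => total + STEP_LATENCY[tier], 0);
}

// An all-Opus 8-step loop versus a routed mix of 5 Haiku and 3 Sonnet steps:
const allOpus = loopLatency(Array(8).fill("opus")); // 72s
const mixed = loopLatency([
  "haiku", "haiku", "sonnet", "haiku",
  "sonnet", "haiku", "haiku", "sonnet",
]); // 17s
```

Under these assumptions the routed loop finishes in under a quarter of the all-Opus time, which is the difference between a responsive product and a sluggish one.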
The Claude Max Flat-Rate Advantage
If your team runs on Claude Max (the flat-rate subscription tier), the per-token cost math changes fundamentally. Under Max, you are not billed per token — you pay a fixed monthly rate regardless of usage volume.
This has a specific implication for agent workloads: within the Max subscription, the marginal cost of each additional agent run is effectively zero. This makes Opus significantly more accessible for teams that would otherwise be forced to route heavily to cheaper models.
However, Claude Max covers usage through the Claude apps and Claude Code rather than metered API billing, with rate limits that vary by plan tier. For teams running 15 agents with 52 tools, you will hit rate limits before hitting cost walls under Max. The practical constraint becomes throughput, not price.
The architectural response to this: run Opus for the tasks that most benefit from it (the high-value, complex reasoning tasks), run Haiku for the high-frequency low-complexity tasks to preserve your rate limit budget for where it matters, and use Sonnet as the default middle tier.
Even under Max, model selection discipline still matters — not for cost, but for rate limit headroom.
The Model Routing Pattern
Model routing is an architectural pattern where a lightweight classifier evaluates the incoming task and routes it to the appropriate model tier before the main agent loop begins.
How It Works
- A task comes into the orchestrator.
- A routing step (using Haiku, fast and cheap) evaluates the task against a set of complexity criteria.
- The router outputs a model tier: `opus`, `sonnet`, or `haiku`.
- The orchestrator runs the task on the selected model.
```typescript
interface RoutingDecision {
  model: "claude-opus-4" | "claude-sonnet-4-5" | "claude-haiku-4-5";
  reasoning: string;
  confidence: "high" | "medium" | "low";
}

async function routeTask(task: AgentTask): Promise<RoutingDecision> {
  const routerPrompt = `Analyze this task and select the appropriate Claude model tier.

Task: ${task.description}
Task type: ${task.type}
Expected output: ${task.expectedOutput}

Model tiers:
- opus: Multi-step reasoning, ambiguous judgment, high-stakes decisions, synthesis of conflicting information
- sonnet: Moderate reasoning, structured tasks, content generation, data transformation
- haiku: Classification, extraction, tagging, simple lookup, high-volume low-complexity

Return JSON: { "model": "...", "reasoning": "...", "confidence": "high|medium|low" }`;

  const response = await callClaude({
    model: "claude-haiku-4-5", // use Haiku for routing — it is fast and this is a simple classification
    prompt: routerPrompt,
    maxTokens: 150
  });

  try {
    return JSON.parse(response);
  } catch {
    // If the router returns malformed JSON, fall back to the safe middle tier.
    return {
      model: "claude-sonnet-4-5",
      reasoning: "Router output was not valid JSON; defaulting to Sonnet.",
      confidence: "low"
    };
  }
}
```
Complexity Signals
Build these signals into your router prompt (or a heuristic pre-pass) to identify task complexity:
High complexity signals (route to Opus):
- Task contains words like "analyze", "synthesize", "recommend", "evaluate trade-offs"
- Task references multiple sources of data that need to be reconciled
- Task output will be used directly in a business decision without review
- Task type is `research`, `strategy`, or `diagnosis`
Medium complexity signals (route to Sonnet):
- Task involves generating structured content from a brief
- Task requires understanding of business context
- Task type is `draft`, `transform`, `summarize`, or `plan`
Low complexity signals (route to Haiku):
- Task is a classification or yes/no determination
- Task extracts structured data from a known format
- Task type is `tag`, `classify`, `extract`, or `route`
- Task is high-volume and repetitive
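Many of these signals are cheap enough to check without a model call at all. A minimal sketch of a heuristic pre-pass (the `TYPE_TIERS` mapping and `heuristicTier` function are hypothetical names): tasks it can place skip the LLM router entirely, and everything else falls through as `"unknown"` for the router to decide.

```typescript
// Map well-known task types straight to a tier, per the signals above.
const TYPE_TIERS: Record<string, "opus" | "sonnet" | "haiku"> = {
  research: "opus", strategy: "opus", diagnosis: "opus",
  draft: "sonnet", transform: "sonnet", summarize: "sonnet", plan: "sonnet",
  tag: "haiku", classify: "haiku", extract: "haiku", route: "haiku",
};

// Keywords that signal high-complexity reasoning work.
const OPUS_KEYWORDS = ["analyze", "synthesize", "recommend", "evaluate trade-offs"];

function heuristicTier(
  taskType: string,
  description: string
): "opus" | "sonnet" | "haiku" | "unknown" {
  const byType = TYPE_TIERS[taskType];
  if (byType) return byType;
  const text = description.toLowerCase();
  if (OPUS_KEYWORDS.some((k) => text.includes(k))) return "opus";
  return "unknown"; // defer to the LLM router
}
```

This keeps the fast path free: a `classify` task never pays even the Haiku routing call.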
Routing by Agent Type
In a multi-agent system, you can also route at the agent level rather than the task level. Assign each agent a default model tier based on its function:
```typescript
const agentModelConfig: Record<string, string> = {
  // Opus agents: high-stakes, complex reasoning
  "strategy-advisor": "claude-opus-4",
  "research-synthesizer": "claude-opus-4",
  "risk-evaluator": "claude-opus-4",

  // Sonnet agents: balanced workloads
  "content-generator": "claude-sonnet-4-5",
  "lead-qualifier": "claude-sonnet-4-5",
  "campaign-planner": "claude-sonnet-4-5",
  "outreach-writer": "claude-sonnet-4-5",

  // Haiku agents: high-frequency, low-complexity
  "intent-classifier": "claude-haiku-4-5",
  "sentiment-tagger": "claude-haiku-4-5",
  "webhook-router": "claude-haiku-4-5",
  "batch-extractor": "claude-haiku-4-5"
};
```
This approach is simpler than per-task routing and often sufficient for systems where agent responsibilities are well-defined.
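The lookup itself should default to Sonnet, matching the earlier guidance that Sonnet is the right starting point for any task not explicitly assigned elsewhere. A minimal sketch (`modelForAgent` is a hypothetical helper, not a library API):

```typescript
// Resolve an agent's model; agents without an explicit entry get Sonnet.
function modelForAgent(agentName: string, config: Record<string, string>): string {
  return config[agentName] ?? "claude-sonnet-4-5";
}

const config = {
  "strategy-advisor": "claude-opus-4",
  "intent-classifier": "claude-haiku-4-5",
};
```

A newly added agent then runs on the safe middle tier until someone makes a deliberate decision to move it up or down.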
Real Cost Examples
To make the math concrete, here are three real agent workload scenarios.
Scenario 1: Lead qualification pipeline processing 500 leads per day
Each lead qualification involves reading contact data, checking enrichment sources, and scoring — roughly 2,000 tokens in, 500 tokens out.
| Model | Daily cost (500 leads) | Monthly cost |
|---|---|---|
| Opus | ~$81 | ~$2,430 |
| Sonnet | ~$16 | ~$480 |
| Haiku | ~$1.38 | ~$41 |
For a well-defined scoring task, Sonnet likely matches Opus quality at 20% of the cost. Haiku at 2% of the cost is worth testing if the scoring rubric is explicit.
Scenario 2: Strategic account research, 10 reports per week
Each research report involves synthesizing 5 data sources, ~15,000 tokens in, ~3,000 tokens out.
| Model | Weekly cost | Monthly cost |
|---|---|---|
| Opus | ~$4.50 | ~$19.50 |
| Sonnet | ~$0.90 | ~$3.90 |
For this workload, Opus is defensible — the absolute dollar difference is small, and the quality gap for complex synthesis is real. But Sonnet is worth A/B testing.
Scenario 3: Real-time chat intent classification, 50,000 events per day
Each classification is simple: read a short message, output one of 10 intent categories. ~200 tokens in, ~30 tokens out.
| Model | Daily cost | Monthly cost |
|---|---|---|
| Opus | ~$263 | ~$7,875 |
| Haiku | ~$4.38 | ~$131 |
This is where model selection discipline becomes essential. Using Opus for classification is economically indefensible when Haiku handles this task with equal accuracy.
Next Steps
Model selection reduces your cost floor. Token optimization reduces how much each model call costs. See Token Optimization for how to trim prompt length, compress context, and cache repeated content across your agent sessions.