Model Selection Guide: Choosing the Right Model for Each Task
Running every agent task on Opus is like using a forklift to move a coffee cup. Learn the three model tiers, when each earns its place, and how to build a routing layer that cuts costs while keeping quality high.
The biggest cost mistake in agent systems is model uniformity — running every task through the most capable (and most expensive) model regardless of whether the task actually needs it. Summarizing a log file does not require the same reasoning capability as synthesizing a multi-variable business strategy. Using the same model for both is wasteful at best and economically unsustainable at scale.
This guide covers the three Claude model tiers, when each is the right choice, how to build a routing layer that selects the right model automatically, and the real cost math behind the decision.
The Three Model Tiers
Claude models sit on a capability-cost-latency spectrum. Understanding what differentiates each tier is the foundation of good model selection.
Claude Opus: Complex Reasoning
Opus is the highest-capability tier. It excels at tasks that require sustained multi-step reasoning, nuanced judgment, synthesis of conflicting information, and creative problem-solving. It is also the slowest and most expensive tier.
Opus is the right choice when:
- The task requires reasoning across multiple sources to reach a non-obvious conclusion
- Errors are expensive and hard to detect downstream
- The task involves ambiguous instructions that require interpretation
- You need the agent to catch edge cases a simpler model would miss
- The output will be used directly without human review
Opus is the wrong choice when:
- The task is a simple lookup, extraction, or classification
- You are processing high volumes of similar, low-complexity tasks
- Latency matters more than maximum quality
- The output is an intermediate step that will be reviewed before action is taken
Common Opus use cases in a 15-agent system: strategic analysis, multi-source research synthesis, complex code generation, nuanced customer communication drafting, high-stakes decision reasoning.
Claude Sonnet: The Balanced Workhorse
Sonnet sits in the middle of the spectrum. It handles the majority of agent tasks well — it reasons effectively, follows complex instructions, and produces high-quality outputs — at meaningfully lower cost and latency than Opus.
Sonnet is the right choice when:
- The task requires real reasoning but not the highest possible reasoning ceiling
- You are running moderate volumes of tasks where cost is a consideration
- Latency is somewhat important but output quality cannot be sacrificed
- The task is well-defined with clear inputs and expected output structure
Sonnet is where most production agent workloads should default. If you have not explicitly decided a task needs Opus or Haiku, Sonnet is the right starting point.
Common Sonnet use cases: content generation, data transformation with business logic, tool call orchestration, structured data extraction from documents, summarization of complex material.
Claude Haiku: Speed and Volume
Haiku is the fastest and least expensive Claude model. It trades some reasoning depth for dramatically better latency and cost. For tasks that are well-structured, low-ambiguity, and high-volume, Haiku delivers excellent ROI.
Haiku is the right choice when:
- The task is classification, tagging, or simple extraction
- You need sub-second latency
- You are processing thousands of items in a batch
- The task has a narrow, well-defined input-output mapping
- Errors are cheap to catch or correct
Haiku is the wrong choice when:
- The task requires judgment about ambiguous situations
- The output requires nuanced language or complex reasoning
- You cannot afford a high error rate even at low per-error cost
Common Haiku use cases: intent classification, sentiment tagging, entity extraction, simple format conversion, routing decisions, embedding generation alternatives, real-time webhook response.
Pricing Comparison
Costs are per million tokens (input / output). These are approximate reference figures — check Anthropic's pricing page for current rates.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Relative Cost |
|---|---|---|---|
| Claude Opus | ~$15 | ~$75 | 1x (baseline) |
| Claude Sonnet | ~$3 | ~$15 | ~0.2x |
| Claude Haiku | ~$0.25 | ~$1.25 | ~0.017x |
The practical implication: running 1,000 classification tasks on Haiku costs roughly the same as running 17 on Opus. If those 1,000 tasks are truly classification-level work, every dollar spent on Opus is a dollar that provides no quality benefit.
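The multiplier is easy to make concrete in code. The sketch below is illustrative (the `RATES` constant and `callCost` helper are not part of any SDK); it just applies the reference rates from the table above to a batch of calls:

```typescript
// Approximate per-million-token rates from the table above (input / output, USD).
const RATES = {
  opus: { input: 15, output: 75 },
  sonnet: { input: 3, output: 15 },
  haiku: { input: 0.25, output: 1.25 },
} as const;

type Tier = keyof typeof RATES;

// Dollar cost of a single call at the given tier.
function callCost(tier: Tier, inputTokens: number, outputTokens: number): number {
  const r = RATES[tier];
  return (inputTokens * r.input + outputTokens * r.output) / 1_000_000;
}

// 1,000 short classification calls (200 tokens in, 30 out) per tier:
const haikuBatch = 1000 * callCost("haiku", 200, 30); // ~$0.09
const opusBatch = 1000 * callCost("opus", 200, 30);   // ~$5.25
```

At these rates the Opus batch costs 60x the Haiku batch, which is where the "1,000 Haiku tasks for the price of 17 Opus tasks" figure comes from.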
Latency Comparison
Latency matters in agentic systems because it compounds. An 8-step agent loop where each step takes 4 seconds runs in 32 seconds. The same loop with 1-second steps runs in 8 seconds.
Approximate median time-to-first-token under normal load:
| Model | Median TTFT | Practical step latency |
|---|---|---|
| Claude Opus | 1.5 - 4s | 4 - 15s per step |
| Claude Sonnet | 0.5 - 1.5s | 2 - 6s per step |
| Claude Haiku | 0.2 - 0.5s | 0.5 - 2s per step |
For a real-time user-facing agent, Haiku for simple steps and Sonnet for complex steps keeps the experience responsive. Running everything through Opus on a user-facing workflow produces a noticeably sluggish product.
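The compounding effect is simple arithmetic. A rough sketch, assuming per-step latencies near the midpoints of the ranges above (the numbers are illustrative, not measured):

```typescript
// Illustrative per-step latencies in seconds, roughly mid-range for each tier.
const STEP_LATENCY: Record<string, number> = { opus: 9, sonnet: 4, haiku: 1 };

// Total wall-clock time for a sequential agent loop, one tier per step.
function loopLatency(steps: string[]): number {
  return steps.reduce((total, tier) => total + STEP_LATENCY[tier], 0);
}

// An all-Opus 8-step loop versus a routed mix of 5 Haiku and 3 Sonnet steps:
const allOpus = loopLatency(Array(8).fill("opus")); // 72s
const mixed = loopLatency([
  "haiku", "haiku", "sonnet", "haiku",
  "sonnet", "haiku", "haiku", "sonnet",
]); // 17s
```

Under these assumptions the routed loop finishes in under a quarter of the all-Opus time, which is the difference between a responsive product and a sluggish one.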
The Claude Max Flat-Rate Advantage
If your team runs on Claude Max (the flat-rate subscription tier), the per-token cost math changes fundamentally. Under Max, you are not billed per token — you pay a fixed monthly rate regardless of usage volume.
This has a specific implication for agent workloads: within the Max subscription, the marginal cost of each additional agent run is effectively zero. This makes Opus significantly more accessible for teams that would otherwise be forced to route heavily to cheaper models.
However, Claude Max covers usage through the Claude apps and Claude Code rather than metered API billing, with rate limits that vary by plan tier. For teams running 15 agents with 52 tools, you will hit rate limits before hitting cost walls under Max. The practical constraint becomes throughput, not price.
The architectural response to this: run Opus for the tasks that most benefit from it (the high-value, complex reasoning tasks), run Haiku for the high-frequency low-complexity tasks to preserve your rate limit budget for where it matters, and use Sonnet as the default middle tier.
Even under Max, model selection discipline still matters — not for cost, but for rate limit headroom.
The Model Routing Pattern
Model routing is an architectural pattern where a lightweight classifier evaluates the incoming task and routes it to the appropriate model tier before the main agent loop begins.
How It Works
- A task comes into the orchestrator.
- A routing step (using Haiku, fast and cheap) evaluates the task against a set of complexity criteria.
- The router outputs a model tier: `opus`, `sonnet`, or `haiku`.
- The orchestrator runs the task on the selected model.
```typescript
interface RoutingDecision {
  model: "claude-opus-4" | "claude-sonnet-4-5" | "claude-haiku-4-5";
  reasoning: string;
  confidence: "high" | "medium" | "low";
}

async function routeTask(task: AgentTask): Promise<RoutingDecision> {
  const routerPrompt = `Analyze this task and select the appropriate Claude model tier.

Task: ${task.description}
Task type: ${task.type}
Expected output: ${task.expectedOutput}

Model tiers:
- opus: Multi-step reasoning, ambiguous judgment, high-stakes decisions, synthesis of conflicting information
- sonnet: Moderate reasoning, structured tasks, content generation, data transformation
- haiku: Classification, extraction, tagging, simple lookup, high-volume low-complexity

Return JSON: { "model": "...", "reasoning": "...", "confidence": "high|medium|low" }`;

  const response = await callClaude({
    model: "claude-haiku-4-5", // use Haiku for routing — it is fast and this is a simple classification
    prompt: routerPrompt,
    maxTokens: 150
  });

  try {
    return JSON.parse(response);
  } catch {
    // If the router returns malformed JSON, fall back to the safe middle tier.
    return {
      model: "claude-sonnet-4-5",
      reasoning: "Router output was not valid JSON; defaulting to Sonnet.",
      confidence: "low"
    };
  }
}
```
Complexity Signals
Build these signals into your router prompt (or a heuristic pre-pass) to identify task complexity:
High complexity signals (route to Opus):
- Task contains words like "analyze", "synthesize", "recommend", "evaluate trade-offs"
- Task references multiple sources of data that need to be reconciled
- Task output will be used directly in a business decision without review
- Task type is `research`, `strategy`, or `diagnosis`
Medium complexity signals (route to Sonnet):
- Task involves generating structured content from a brief
- Task requires understanding of business context
- Task type is `draft`, `transform`, `summarize`, or `plan`
Low complexity signals (route to Haiku):
- Task is a classification or yes/no determination
- Task extracts structured data from a known format
- Task type is `tag`, `classify`, `extract`, or `route`
- Task is high-volume and repetitive
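Many of these signals are cheap enough to check without a model call at all. A minimal sketch of a heuristic pre-pass (the `TYPE_TIERS` mapping and `heuristicTier` function are hypothetical names): tasks it can place skip the LLM router entirely, and everything else falls through as `"unknown"` for the router to decide.

```typescript
// Map well-known task types straight to a tier, per the signals above.
const TYPE_TIERS: Record<string, "opus" | "sonnet" | "haiku"> = {
  research: "opus", strategy: "opus", diagnosis: "opus",
  draft: "sonnet", transform: "sonnet", summarize: "sonnet", plan: "sonnet",
  tag: "haiku", classify: "haiku", extract: "haiku", route: "haiku",
};

// Keywords that signal high-complexity reasoning work.
const OPUS_KEYWORDS = ["analyze", "synthesize", "recommend", "evaluate trade-offs"];

function heuristicTier(
  taskType: string,
  description: string
): "opus" | "sonnet" | "haiku" | "unknown" {
  const byType = TYPE_TIERS[taskType];
  if (byType) return byType;
  const text = description.toLowerCase();
  if (OPUS_KEYWORDS.some((k) => text.includes(k))) return "opus";
  return "unknown"; // defer to the LLM router
}
```

This keeps the fast path free: a `classify` task never pays even the Haiku routing call.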
Routing by Agent Type
In a multi-agent system, you can also route at the agent level rather than the task level. Assign each agent a default model tier based on its function:
```typescript
const agentModelConfig: Record<string, string> = {
  // Opus agents: high-stakes, complex reasoning
  "strategy-advisor": "claude-opus-4",
  "research-synthesizer": "claude-opus-4",
  "risk-evaluator": "claude-opus-4",

  // Sonnet agents: balanced workloads
  "content-generator": "claude-sonnet-4-5",
  "lead-qualifier": "claude-sonnet-4-5",
  "campaign-planner": "claude-sonnet-4-5",
  "outreach-writer": "claude-sonnet-4-5",

  // Haiku agents: high-frequency, low-complexity
  "intent-classifier": "claude-haiku-4-5",
  "sentiment-tagger": "claude-haiku-4-5",
  "webhook-router": "claude-haiku-4-5",
  "batch-extractor": "claude-haiku-4-5"
};
```
This approach is simpler than per-task routing and often sufficient for systems where agent responsibilities are well-defined.
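The lookup itself should default to Sonnet, matching the earlier guidance that Sonnet is the right starting point for any task not explicitly assigned elsewhere. A minimal sketch (`modelForAgent` is a hypothetical helper, not a library API):

```typescript
// Resolve an agent's model; agents without an explicit entry get Sonnet.
function modelForAgent(agentName: string, config: Record<string, string>): string {
  return config[agentName] ?? "claude-sonnet-4-5";
}

const config = {
  "strategy-advisor": "claude-opus-4",
  "intent-classifier": "claude-haiku-4-5",
};
```

A newly added agent then runs on the safe middle tier until someone makes a deliberate decision to move it up or down.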
Real Cost Examples
To make the math concrete, here are three real agent workload scenarios.
Scenario 1: Lead qualification pipeline processing 500 leads per day
Each lead qualification involves reading contact data, checking enrichment sources, and scoring — roughly 2,000 tokens in, 500 tokens out.
| Model | Daily cost (500 leads) | Monthly cost |
|---|---|---|
| Opus | ~$81 | ~$2,430 |
| Sonnet | ~$16 | ~$480 |
| Haiku | ~$1.38 | ~$41 |
For a well-defined scoring task, Sonnet likely matches Opus quality at 20% of the cost. Haiku at 2% of the cost is worth testing if the scoring rubric is explicit.
Scenario 2: Strategic account research, 10 reports per week
Each research report involves synthesizing 5 data sources, ~15,000 tokens in, ~3,000 tokens out.
| Model | Weekly cost | Monthly cost |
|---|---|---|
| Opus | ~$4.50 | ~$19.50 |
| Sonnet | ~$0.90 | ~$3.90 |
For this workload, Opus is defensible — the absolute dollar difference is small, and the quality gap for complex synthesis is real. But Sonnet is worth A/B testing.
Scenario 3: Real-time chat intent classification, 50,000 events per day
Each classification is simple: read a short message, output one of 10 intent categories. ~200 tokens in, ~30 tokens out.
| Model | Daily cost | Monthly cost |
|---|---|---|
| Opus | ~$263 | ~$7,875 |
| Haiku | ~$4.38 | ~$131 |
This is where model selection discipline becomes essential. Using Opus for classification is economically indefensible when Haiku handles this task with equal accuracy.
Next Steps
Model selection reduces your cost floor. Token optimization reduces how much each model call costs. See Token Optimization for how to trim prompt length, compress context, and cache repeated content across your agent sessions.