
Model Selection for Agent Teams: Opus, Sonnet, Haiku, and the Economics of Each

Engineer your agent team economics. Choose Opus, Sonnet, or Haiku based on reasoning depth, latency, and cost per task. Framework for production deployments.

The Padiso Team
16 minutes read

Why Model Selection Matters for Agent Teams

Running an agent team in production isn't about picking the smartest model; it's about picking the right model for each role. When you're operating a headless company or scaling autonomous workflows, every token costs money, every millisecond adds latency, and every hallucination can cascade through your system.

Most teams treat model selection as a binary choice: use the best model everywhere, or use the cheapest model everywhere. Both approaches leave money on the table and degrade your system's reliability.

Instead, think of your agent team like a company org chart. You don't hire a PhD researcher to answer phones, and you don't hire an intern to design your strategy. The same principle applies to Claude models: Opus, Sonnet, and Haiku each have a specific job to do, and when you assign them correctly, your cost-per-task drops, your latency improves, and your accuracy increases.

This guide walks you through the economics and mechanics of model selection for agent teams. We'll cover what each model is built for, how to measure the tradeoffs, and how to structure your agent orchestration to minimize cost while maximizing reliability. If you're deploying agents on Padiso's agent orchestration platform, you'll be able to implement these patterns immediately.

Understanding the Claude Model Spectrum

Anthropic offers three main Claude models in active production use: Opus, Sonnet, and Haiku. Each sits at a different point on the intelligence-speed-cost curve.

Opus is the heavyweight. It has the deepest reasoning capability, a 200K-token context window, and the highest accuracy on complex tasks. It's slower than its siblings and costs more per token. Opus is your specialist: the model you deploy when the task is genuinely complex and errors are expensive.

Sonnet is the balanced option. It trades some of Opus's reasoning depth for better speed and lower cost. Sonnet has a 200K context window and strong performance on most production tasks. It's the workhorse of most agent teams: capable enough for the majority of work, fast enough to keep latency reasonable, and cheap enough to run at scale.

Haiku is the sprinter. It's the fastest and cheapest of the three, with a 200K context window. Haiku sacrifices some reasoning capability but handles straightforward, well-defined tasks efficiently. Think of Haiku as the agent you deploy for high-volume, low-complexity work where speed and cost matter more than reasoning depth.

According to official Anthropic documentation on choosing a model, the decision framework hinges on three variables: task complexity, required latency, and acceptable cost per task. Understanding these variables is the foundation of smart model routing in agent teams.

The Economics: Cost Per Task Across Models

Let's get concrete about pricing. As of early 2025, the Claude model pricing looks roughly like this (prices vary by region and volume, so check Padiso's transparent pricing page for exact rates):

  • Opus: ~$15 per million input tokens, ~$75 per million output tokens
  • Sonnet: ~$3 per million input tokens, ~$15 per million output tokens
  • Haiku: ~$0.80 per million input tokens, ~$4 per million output tokens

Those numbers don't tell the full story. What matters is cost per task, not cost per token.

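Consider a research agent that needs to read a 10K document (roughly 8,000 tokens), analyze it, and produce a 500-token summary. The calculations below use 8,500 input tokens, which leaves about 500 tokens for the prompt and instructions.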

  • Using Opus: 8,500 input tokens × ($15/1M) + 500 output tokens × ($75/1M) = $0.128 + $0.038 = $0.166 per task
  • Using Sonnet: 8,500 input tokens × ($3/1M) + 500 output tokens × ($15/1M) = $0.026 + $0.008 = $0.034 per task
  • Using Haiku: 8,500 input tokens × ($0.80/1M) + 500 output tokens × ($4/1M) = $0.007 + $0.002 = $0.009 per task

Haiku is about 18x cheaper than Opus on this task. But here's the catch: if Haiku gets the analysis wrong 10% of the time and you need to re-run it, your expected cost per successful task rises to about $0.01, and every failed attempt at least doubles the latency of that task. If Opus gets it right 99% of the time, your true cost per successful task is about $0.167, but you avoid most of the costly downstream errors.

This is why model selection is an engineering problem, not just a budget problem. You're optimizing for cost per successful task, not cost per token.
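The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a billing tool: the prices are the approximate figures quoted in this section, the function names are made up for the example, and the retry model assumes that failures are always detected and that each retry succeeds independently with the same probability. Computed without intermediate rounding, the Opus and Sonnet totals come out at $0.165 and $0.033, a hair below the figures in the list, which sum already-rounded components.

```python
# Approximate prices in USD per million tokens, as quoted in this section.
PRICES = {
    "opus": {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 0.80, "output": 4.00},
}


def cost_per_attempt(model: str, input_tokens: int, output_tokens: int) -> float:
    """Token cost in USD of a single attempt at the task."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def cost_per_successful_task(model: str, input_tokens: int, output_tokens: int,
                             success_rate: float) -> float:
    """Expected cost when failed attempts are retried until one succeeds.

    Assumes independent attempts, so the expected number of attempts is
    1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt(model, input_tokens, output_tokens) / success_rate


# Success rates are the illustrative ones used in the text above.
for model, rate in [("opus", 0.99), ("haiku", 0.90)]:
    attempt = cost_per_attempt(model, 8_500, 500)
    success = cost_per_successful_task(model, 8_500, 500, rate)
    print(f"{model}: ${attempt:.4f} per attempt, ${success:.4f} per successful task")
```

Note that the retry cost alone does not close the gap between Haiku and Opus here; what changes the decision is the cost of errors that are not caught and retried, which this sketch does not model.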

Reasoning Depth: When You Need Opus

Opus shines on tasks that require multi-step reasoning, synthesis across complex information, or judgment calls where errors compound.

Research and synthesis agents benefit from Opus. If an agent is reading multiple documents, cross-referencing claims, and synthesizing a recommendation, Opus's deeper reasoning reduces hallucinations and improves accuracy. A venture capital agent evaluating a startup pitch needs to weigh contradictory signals, and Opus handles that better than Sonnet.

Strategic planning agents should use Opus. If an agent is designing a workflow, breaking down a complex problem, or making architectural decisions, the reasoning depth matters. A founder automation agent that's planning a go-to-market strategy benefits from Opus's ability to hold multiple constraints in mind simultaneously.

Complex code review and generation is another Opus use case. According to benchmarks ranking Claude models for coding, Opus outperforms Sonnet on multi-file refactoring, architectural decisions, and security-sensitive code. If your agent is reviewing critical infrastructure code, Opus reduces the risk of subtle bugs.

Financial analysis and risk assessment often require Opus. If an agent is calculating complex valuations, stress-testing scenarios, or identifying edge cases in contracts, Opus's reasoning depth translates directly to fewer errors and better decisions.

The pattern: use Opus when the cost of error is high, the task requires reasoning across multiple constraints, or the output directly affects business decisions.

Speed and Latency: When Sonnet Is the Right Choice

Sonnet is the sweet spot for most agent teams. It's fast enough to keep user-facing workflows responsive, accurate enough for most production tasks, and cheap enough to run at scale.

Customer-facing agents typically use Sonnet. If a user is waiting for a response (whether it's a chatbot, a research assistant, or a data lookup agent), latency matters. Sonnet's speed keeps response times under 2-3 seconds for most tasks, while Opus might take 5-10 seconds. In user-facing scenarios, that difference is the difference between a usable product and one that feels slow.

High-volume processing workflows should default to Sonnet. If you're processing thousands of customer support tickets, analyzing hundreds of job applications, or categorizing a large dataset, Sonnet's combination of speed and accuracy is ideal. You get good results without the latency and cost overhead of Opus.

Content generation and transformation is Sonnet's wheelhouse. Summarizing articles, rewriting copy, translating text, or transforming data formats: these tasks don't require Opus's reasoning depth, but they do require decent speed and accuracy. Sonnet handles these at scale.

Routing and classification agents often use Sonnet. If an agent is deciding which department should handle a request, categorizing a support ticket, or routing a task to another agent, Sonnet provides good accuracy without unnecessary latency.

Structured data extraction from documents is another strong Sonnet use case. If your agent is pulling key information from contracts, invoices, or forms, Sonnet's speed and accuracy are well-matched to the task.

According to enterprise guides comparing Claude models, Sonnet represents the optimal balance for most enterprise agent deployments: it's fast enough to maintain responsiveness, accurate enough to reduce error handling overhead, and cheap enough to scale.

Cost Optimization: When Haiku Makes Sense

Haiku is the high-volume, low-complexity workhorse. At the prices above it's roughly 19x cheaper than Opus per token, and significantly faster. The tradeoff is reasoning depth: Haiku struggles with multi-step reasoning, nuanced judgment, and complex synthesis.

High-volume classification and tagging is Haiku's core strength. If your agent is tagging customer feedback, categorizing support tickets, or labeling data, Haiku provides good accuracy at minimal cost. The task is well-defined, the categories are clear, and the cost savings are enormous.

Simple lookup and retrieval tasks work well with Haiku. If an agent is fetching data from a database, searching a knowledge base, or retrieving information from an API, Haiku is fast and cheap. These tasks don't require reasoning; they require accuracy on a narrow task.

Format conversion and normalization is another Haiku use case. Converting data between formats, normalizing addresses, or standardizing text: these are mechanical tasks that don't require deep reasoning. Haiku handles them efficiently.

Spam detection and content filtering often use Haiku. Identifying obviously inappropriate content, filtering spam, or flagging policy violations: these are pattern-matching tasks that don't require Opus's reasoning.

Repetitive, well-defined workflows belong on Haiku. If you have a workflow that's been run a thousand times, the patterns are clear, and the edge cases are rare, Haiku is the right tool. You save cost without sacrificing reliability.

The key constraint: Haiku is for tasks where the problem is well-defined, the solution is straightforward, and the cost of an error is low. If the task requires judgment, synthesis, or multi-step reasoning, Haiku will disappoint you.

The Advisor Strategy: Mixing Models for Efficiency

One of the most powerful patterns for agent teams is the advisor strategy: using Opus as a planner and Haiku or Sonnet as executors. This approach can cut your model costs in half while maintaining accuracy.

Here's how it works:

  1. Opus plans the approach: When a complex task arrives, route it to Opus. Opus reads the input, thinks through the problem, and produces a detailed plan or framework. This might be a step-by-step approach, a template, or a set of specific instructions.

  2. Haiku or Sonnet executes: Once Opus has produced the plan, route the actual execution to Haiku or Sonnet. They follow the plan, execute the specific steps, and produce the output.

  3. Cost savings: You pay Opus's premium once per task type, then pay Haiku's bargain rates for every execution. According to explanations of the Anthropic advisor strategy, this pattern can reduce overall costs by 40-60% while maintaining or improving accuracy.
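The control flow of these three steps can be sketched as follows. This is only an outline under stated assumptions: `call_model` is a hypothetical callable of the form (model name, prompt) -> text that you would back with your own client, the prompt wording is invented for the example, and `RecordingStub` is a stand-in so the sketch runs without any API access.

```python
from typing import Callable, List

# Hypothetical signature: (model_name, prompt) -> completion text.
ModelCall = Callable[[str, str], str]


def run_with_advisor(task: str, items: List[str], call_model: ModelCall,
                     planner: str = "opus", executor: str = "haiku") -> List[str]:
    """Plan once with the planner model, then execute once per item."""
    plan = call_model(planner, f"Write step-by-step instructions for: {task}")
    return [
        call_model(executor, f"Instructions:\n{plan}\n\nInput:\n{item}")
        for item in items
    ]


class RecordingStub:
    """Stand-in for a real model client; records which model was called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __call__(self, model: str, prompt: str) -> str:
        self.calls.append(model)
        return f"[{model} output for a {len(prompt)}-char prompt]"


stub = RecordingStub()
outputs = run_with_advisor("write a case study", ["customer A", "customer B"], stub)
print(stub.calls)  # ['opus', 'haiku', 'haiku']
```

The planner is called once no matter how many items there are, which is where the savings come from. The plan is also included in every executor prompt, so a long plan raises the input cost of each execution.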

Example: Content generation at scale

You have 1,000 customer case studies to write. Instead of running Opus 1,000 times:

  1. Run Opus once to produce a detailed template: "Here's the structure for a case study: intro (100 words), problem statement (150 words), solution (200 words), results (100 words), conclusion (50 words). Here are the key themes to emphasize."

  2. Run Haiku 1,000 times, each one following the template with specific customer data.

Cost breakdown:

  • Opus run: $0.50
  • 1,000 Haiku runs: $0.009 × 1,000 = $9
  • Total: $9.50 for 1,000 case studies, or $0.0095 per case study

If you'd run Opus 1,000 times directly, you'd spend roughly $166. The advisor strategy cuts your cost to $9.50 while potentially improving consistency and quality.
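For readers who want to plug in their own numbers, here is a small sketch of the same calculation, along with the break-even point at which planning once starts to pay off. The function names are illustrative, the dollar figures are the ones used in this example, and the comparison assumes the executor's output is good enough that no extra review or retries are needed.

```python
import math


def advisor_totals(planning_cost: float, execution_cost: float, runs: int):
    """Total and per-item cost when planning once and executing `runs` times."""
    total = planning_cost + execution_cost * runs
    return total, total / runs


def break_even_runs(planning_cost: float, execution_cost: float,
                    direct_cost: float) -> int:
    """Smallest run count at which plan-once-execute-many is strictly cheaper
    than running the expensive model directly on every item."""
    if direct_cost <= execution_cost:
        raise ValueError("direct model must cost more per run than the executor")
    return math.floor(planning_cost / (direct_cost - execution_cost)) + 1


total, per_item = advisor_totals(planning_cost=0.50, execution_cost=0.009, runs=1_000)
direct_total = 0.166 * 1_000

print(f"advisor: ${total:.2f} total, ${per_item:.4f} per item")
print(f"direct Opus: ${direct_total:.2f} total")
print(f"break-even after {break_even_runs(0.50, 0.009, 0.166)} runs")
```

With these inputs the planning cost is recovered after only a handful of runs, so the pattern is mainly a question of whether the cheaper executor is accurate enough, not of volume.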

This pattern works for:

  • Content generation: Opus plans the structure, Haiku writes each piece
  • Data analysis: Opus designs the analysis framework, Sonnet executes on each dataset
  • Customer support: Opus writes the response template, Haiku customizes for each ticket
  • Code generation: Opus designs the architecture, Sonnet generates each component

The advisor strategy is particularly powerful in Padiso's agent orchestration platform, where you can route tasks between models based on task type, complexity, or learned patterns.

Building a Routing Framework for Your Agent Team

Once you understand the economics and capabilities of each model, the next step is building a routing framework: a decision system that automatically assigns the right model to each task.

Task complexity assessment: The first layer of routing is task complexity. Does the task require multi-step reasoning? Does it involve synthesizing information across multiple sources? Does it require judgment or nuance? If yes to any of these, route to Opus or Sonnet. If no, route to Haiku.

Latency requirements: The second layer is latency. Does the user or downstream process need a response quickly? If you have a user waiting for a response, prioritize Sonnet or Haiku. If the task is asynchronous and latency doesn't matter, Opus is acceptable.

Error cost: The third layer is error cost. What happens if the agent gets this wrong? If an error cascades into other systems, wastes human time, or damages customer trust, route to Opus or Sonnet. If an error is easily caught and corrected, Haiku is acceptable.

Volume: The fourth layer is volume. How many times will this task run? If it's a one-off task and reasoning is important, use Opus. If it's a high-volume, repetitive task, use Haiku or Sonnet.

Structured routing logic:

IF task_complexity == "high" AND error_cost == "high"
  THEN route to Opus
ELSE IF task_complexity == "low" AND volume > 100_per_day
  THEN route to Haiku
ELSE IF latency_requirement < 3_seconds OR volume > 1000_per_day
  THEN route to Sonnet
ELSE
  route to Sonnet (default)

This logic isn't prescriptive; it's a starting point. Your specific routing framework should reflect your business constraints, error tolerance, and cost targets.
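As a starting point, the routing rules can be written as a small Python function. The `TaskProfile` fields and the thresholds are assumptions taken from the pseudocode, not a Padiso API. One choice worth flagging: this version tests the low-complexity, high-volume case before the latency and volume rule, so that cheap mechanical work can reach Haiku instead of being caught by the Sonnet rule first.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of a task, used only for routing."""
    complexity: str             # "low", "medium", or "high"
    error_cost: str             # "low" or "high"
    max_latency_seconds: float  # how long the caller can wait
    runs_per_day: int


def route(task: TaskProfile) -> str:
    """Return the model tier for a task, following the rules above."""
    if task.complexity == "high" and task.error_cost == "high":
        return "opus"
    if task.complexity == "low" and task.runs_per_day > 100:
        return "haiku"
    if task.max_latency_seconds < 3 or task.runs_per_day > 1000:
        return "sonnet"
    return "sonnet"  # default when no rule matches


print(route(TaskProfile("high", "high", 30.0, 5)))     # opus
print(route(TaskProfile("low", "low", 10.0, 5_000)))   # haiku
print(route(TaskProfile("medium", "low", 2.0, 50)))    # sonnet
```

Keeping the rules in one pure function makes them easy to unit test and to adjust once you have real accuracy and cost data per task type.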

Padiso's documentation includes examples of implementing model routing in production agent teams, including how to monitor which models are handling which tasks and how to adjust routing based on observed performance.

Measuring Performance: Metrics That Matter

Once you've deployed your agent team with a routing framework, you need to measure whether your model selection is actually working.

Cost per successful task: This is your primary metric. Track the total cost (input tokens + output tokens) divided by the number of successful task completions. If you're seeing high costs, you might be over-routing to Opus. If you're seeing high error rates, you might be under-routing to Opus.

Accuracy by model: Track the accuracy of each model on each task type. Does Haiku get this task right 95% of the time, or 70%? Does Opus improve accuracy enough to justify the cost? Measure this empirically; don't guess.

Latency by model: Measure end-to-end latency for user-facing tasks. How much faster is Sonnet than Opus? Is the latency difference meaningful to your users? For asynchronous tasks, latency might not matter, but for user-facing workflows, it does.

Error cascade impact: Track not just whether an agent made an error, but what happened as a result. Did it require human intervention? Did it cause downstream failures? Did it damage customer trust? A 1% error rate on a low-impact task is acceptable; a 1% error rate on a high-impact task is not.

Cost savings from routing: Compare your actual cost per task to a baseline (e.g., "what if we used Sonnet for everything?"). Are you actually saving money with your routing framework, or is it just adding complexity?
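A minimal sketch of how the first three metrics might be computed from a task log is shown below. The `TaskRecord` fields are hypothetical; in practice they would come from whatever your platform or logging layer records, and "success" needs a definition that fits your task (a passed validation, a human approval, and so on).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TaskRecord:
    """One completed task attempt (hypothetical log schema)."""
    model: str
    task_type: str
    cost_usd: float
    latency_s: float
    success: bool


def summarize(records: List[TaskRecord]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Accuracy, mean latency, and cost per successful task by (model, task type)."""
    groups: Dict[Tuple[str, str], List[TaskRecord]] = defaultdict(list)
    for record in records:
        groups[(record.model, record.task_type)].append(record)

    summary = {}
    for key, group in groups.items():
        successes = sum(r.success for r in group)
        total_cost = sum(r.cost_usd for r in group)
        summary[key] = {
            "accuracy": successes / len(group),
            "mean_latency_s": sum(r.latency_s for r in group) / len(group),
            # Failed attempts still cost money, so divide total spend by successes.
            "cost_per_success": total_cost / successes if successes else float("inf"),
        }
    return summary


log = [
    TaskRecord("haiku", "tagging", 0.009, 1.0, True),
    TaskRecord("haiku", "tagging", 0.009, 1.0, False),
    TaskRecord("haiku", "tagging", 0.009, 1.0, True),
    TaskRecord("opus", "research", 0.166, 8.0, True),
]
for key, stats in summarize(log).items():
    print(key, stats)
```

Error cascade impact and savings against a baseline need data beyond a per-task log (incident records, a comparison run), so they are left out of this sketch.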

Real-World Example: Building a Headless Company's Agent Team

Let's walk through how a founder might structure their agent team for a headless company, meaning a company run primarily by AI agents with minimal human overhead.

Research agent (Opus): This agent reads market reports, competitor analyses, and industry news. It synthesizes information and produces strategic insights. Reasoning depth matters; error cost is high (bad research leads to bad strategy). Route to Opus.

Customer support agent (Sonnet): This agent handles customer inquiries, troubleshoots problems, and escalates when necessary. Latency matters (customers are waiting), but reasoning isn't deeply complex. Route to Sonnet.

Data processing agent (Haiku): This agent pulls data from APIs, transforms it, and loads it into databases. The tasks are mechanical and well-defined. Route to Haiku.

Content generation agent (mixed): This agent generates marketing copy, blog posts, and product descriptions. Use the advisor strategy: Opus writes the content framework once, then Haiku generates variations for each product.

Financial analysis agent (Opus): This agent models cash flow, calculates unit economics, and identifies financial risks. Reasoning depth and accuracy are critical. Route to Opus.

Code review agent (Opus): This agent reviews pull requests, checks for security issues, and suggests improvements. Errors here can introduce bugs into production. Route to Opus.

Monthly cost estimate for a headless company running roughly 10,000 tasks per month (the volumes below add up to 9,900):

  • Research agent: 100 tasks × $0.166 = $16.60
  • Customer support: 2,000 tasks × $0.034 = $68
  • Data processing: 5,000 tasks × $0.009 = $45
  • Content generation: 2,000 tasks × $0.005 (advisor strategy) = $10
  • Financial analysis: 500 tasks × $0.166 = $83
  • Code review: 300 tasks × $0.166 = $49.80

Total: ~$272 per month for roughly 10,000 agent tasks

Compare this to running everything on Opus (10,000 × $0.166 = $1,660) or everything on Haiku (10,000 × $0.009 = $90, but with much higher error rates and likely more human intervention).
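The estimate is easy to reproduce and adapt. The sketch below uses the per-task costs and volumes from the list above; the variable names are illustrative. It computes the single-model baselines over the 9,900 itemized tasks, so they come out slightly below the rounded 10,000-task figures quoted in the text.

```python
# (agent, tasks per month, cost per task in USD), taken from the list above.
WORKLOAD = [
    ("research", 100, 0.166),
    ("customer support", 2_000, 0.034),
    ("data processing", 5_000, 0.009),
    ("content generation", 2_000, 0.005),  # advisor strategy
    ("financial analysis", 500, 0.166),
    ("code review", 300, 0.166),
]

total_tasks = sum(tasks for _, tasks, _ in WORKLOAD)
routed_cost = sum(tasks * cost for _, tasks, cost in WORKLOAD)

# Baselines: every task at the per-task Opus or Haiku cost used in this guide.
all_opus_cost = total_tasks * 0.166
all_haiku_cost = total_tasks * 0.009

print(f"tasks per month: {total_tasks}")
print(f"routed:    ${routed_cost:,.2f}")
print(f"all Opus:  ${all_opus_cost:,.2f}")
print(f"all Haiku: ${all_haiku_cost:,.2f}")
```

These figures cover token spend only. The cost of human intervention on failed tasks, which is what makes the all-Haiku option less attractive than it looks, would have to be added from your own data.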

Smart routing gets you the best of both worlds: low cost and high reliability.

Integration with Agent Orchestration Platforms

Implementing model routing manually is possible but fragile. As your agent team grows, you need a platform that handles routing, monitoring, and cost tracking automatically.

Padiso's agent orchestration platform is built for exactly this: deploying, running, and scaling agent teams with zero infrastructure overhead. The platform lets you define routing rules, monitor which models are handling which tasks, and adjust routing based on real-world performance.

Key features for model selection:

  • Model routing rules: Define which model handles which task type based on complexity, latency, or other criteria
  • Cost tracking: See exactly how much each model is costing you, broken down by task type and agent
  • Performance monitoring: Track accuracy, latency, and error rates for each model on each task
  • Advisor strategy support: Route planning and execution to different models automatically
  • MCP server integration: Connect to external tools and data sources without changing your routing logic

Padiso's integrations page shows the breadth of tools and services you can connect to your agent team, allowing you to build complex workflows without custom infrastructure.

Practical Decision Framework: Your Model Selection Checklist

When you're deciding which model to assign to a new agent or task, use this checklist:

1. Define the task clearly

  • What is the agent trying to accomplish?
  • What inputs will it receive?
  • What outputs does it need to produce?

2. Assess complexity

  • Does this require multi-step reasoning? (Yes = Opus/Sonnet)
  • Does this require judgment or nuance? (Yes = Opus)
  • Does this require synthesizing information across sources? (Yes = Opus/Sonnet)
  • Is this a mechanical, well-defined task? (Yes = Haiku)

3. Measure latency requirements

  • Is a user waiting for a response? (Yes = Sonnet/Haiku)
  • Is this asynchronous background work? (Yes = Opus acceptable)
  • What's the acceptable response time? (< 2 seconds = Sonnet/Haiku)

4. Calculate error cost

  • What happens if the agent gets this wrong?
  • Does it cascade into other systems?
  • Does it require human intervention?
  • Does it damage customer trust?
  • High cost = Opus/Sonnet; Low cost = Haiku acceptable

5. Estimate volume

  • How many times will this task run per month?
  • Is this a one-off task or recurring?
  • High volume (>1000/month) = Haiku/Sonnet; Low volume = Opus acceptable

6. Consider the advisor strategy

  • Is there a planning phase that could be separated from execution?
  • Could Opus plan once and Haiku execute many times?
  • If yes, this could cut costs by 50%+

7. Start with Sonnet

  • If you're unsure, start with Sonnet
  • Sonnet is the balanced default: fast, accurate, and reasonably priced
  • Optimize from there based on actual performance data

Common Mistakes in Model Selection

Mistake 1: Using Opus for everything because "it's the best"

Opus isn't the best; it's the most capable at reasoning. For high-volume, low-complexity tasks, Opus is overkill. You're paying roughly 19x more per token than necessary for tasks that don't benefit from deep reasoning.

Mistake 2: Using Haiku to save money on tasks where accuracy matters

Haiku is cheap, but if it gets the task wrong 10% of the time and you need to re-run it, your effective cost per successful task goes up. Worse, if errors cascade into other systems, the cost of an error might be far higher than the token savings.

Mistake 3: Not measuring actual performance

You can't optimize what you don't measure. Track accuracy, latency, and cost for each model on each task type. This data will reveal where your routing is working and where it needs adjustment.

Mistake 4: Ignoring the advisor strategy

The advisor strategy (using Opus for planning and Haiku/Sonnet for execution) is one of the most cost-effective patterns for agent teams. If you're running high-volume tasks, you should at least test this approach.

Mistake 5: Routing based on task name rather than task properties

Don't route "customer support" to Sonnet and "research" to Opus just because that's the pattern you've seen. Route based on actual complexity, latency, and error cost. A simple research task might be fine on Haiku; a complex customer support issue might need Opus.

Looking Forward: Model Selection in a Multi-Model World

The Claude model lineup will evolve. New models will arrive, existing models will improve, and pricing will change. The framework in this guide-assessing complexity, latency, error cost, and volume-will remain relevant regardless of which specific models are available.

The principle is timeless: assign the right tool to the right job. Don't over-engineer simple tasks, and don't under-invest in complex ones. Measure actual performance, and adjust based on data.

As you scale your agent team, model selection becomes increasingly important. At the blended cost of about $0.027 per task from the example above, a 10% improvement in cost-per-task across 10,000 monthly tasks saves $27 per month, which is not much. But across 100,000 monthly tasks, it saves $270 per month, or $3,240 per year. Across a million tasks, it's $32,400 per year. The savings from model selection grow in proportion to your volume.
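The scaling figures follow from a one-line formula. In the sketch below, the blended cost of $0.027 per task is an assumption derived from the roughly $272 per 10,000 tasks in the earlier headless-company example; substitute your own measured average.

```python
def annual_savings(monthly_tasks: int, cost_per_task: float, improvement: float) -> float:
    """Yearly savings in USD from a fractional reduction in cost per task."""
    return monthly_tasks * cost_per_task * improvement * 12


BLENDED_COST_PER_TASK = 0.027  # assumed: ~$272 / 10,000 tasks, rounded

for volume in (10_000, 100_000, 1_000_000):
    yearly = annual_savings(volume, BLENDED_COST_PER_TASK, 0.10)
    print(f"{volume:>9,} tasks/month: ${yearly / 12:,.0f}/month, ${yearly:,.0f}/year")
```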

Padiso's blog regularly publishes updates on model performance, new capabilities, and optimization strategies. If you're serious about running agent teams in production, staying informed about model changes is essential.

Getting Started with Model Selection

If you're ready to deploy agent teams with smart model selection, here's the next step:

  1. Define your agent team's tasks: List out what each agent does, how often it runs, and how critical accuracy is

  2. Map tasks to models: Use the framework above to assign an initial model to each task

  3. Deploy on an agent orchestration platform: Padiso lets you implement model routing without custom infrastructure

  4. Measure performance: Track cost, accuracy, and latency for each model on each task

  5. Optimize based on data: Adjust routing rules based on what you learn

  6. Test the advisor strategy: If you have high-volume tasks, experiment with using Opus for planning and Haiku for execution

Model selection isn't a one-time decision; it's an ongoing optimization process. As your agent team grows, as task volumes change, and as model capabilities evolve, your routing framework should evolve with it.

The goal isn't to pick the perfect model for each task. The goal is to pick a good model, measure how it performs, and improve from there. Start simple, measure everything, and optimize based on real-world data.

According to the Claude model selection guide from SitePoint, the frameworks that work best are those that balance cost, latency, and quality, which is exactly what we've covered here.

Your agent team's economics depend on getting model selection right. With the framework in this guide and the tools available on platforms like Padiso, you have everything you need to build efficient, scalable, cost-effective agent teams that actually work in production.

Start with the basics: understand what each model does, match models to tasks based on complexity and cost, measure performance, and optimize. That's the foundation of smart agent team economics.