Customer Support Agents
Customer support is the most widely deployed use case for AI agents, with 49% of organizations already using them and documented cost reductions of 60-70%. This guide walks through the full support agent architecture: triage, response drafting, human review, and follow-up. You will learn the escalation rules that matter, how to wire a knowledge base with RAG, which tools to build, and the pitfalls that cause most teams to ship something that frustrates customers instead of helping them.
Customer support is where most organizations deploy their first AI agent in production. Not because it is the most technically interesting problem, but because the economics are hard to argue with: tickets come in at volume, the cost of handling each one manually is measurable, and a well-built agent can resolve the majority without human involvement.
The data is consistent across organizations. Around 49% have already deployed AI agents in support contexts. Teams doing it well report 60-70% reductions in support cost per resolution. Average response times drop from hours to seconds. Customer satisfaction scores go up when the agent is built correctly — because customers prefer fast, accurate answers over slow human ones.
The agent only loses when it hallucinates a policy, refuses to escalate when it should, or responds in a way that feels dismissive. Those are engineering problems with known solutions.
Why Customer Support Is the Leading Use Case
Three properties make this problem domain well-suited to AI agents.
First, the inputs are language and the outputs are language. The agent's core competency is exactly what the job requires.
Second, the task space is bounded. Most support teams can enumerate their top 20 ticket categories and cover 80-90% of volume. Billing questions, password resets, feature confusion, integration errors, refund requests — these are not infinitely varied. An agent trained on your knowledge base and past tickets will encounter genuinely novel situations rarely enough that escalation handles the exceptions.
Third, the stakes of individual errors are recoverable. A hallucinated policy in a support reply is embarrassing and sometimes costly, but not catastrophic in the way errors in medical or financial systems are. This makes customer support a viable first deployment for teams still building confidence in their agent infrastructure.
The Support Agent Architecture
A production support system is not a single agent. It is a handoff pipeline of specialized agents, each with a narrow job, passing context forward to the next stage.
Incoming Ticket
|
v
[Triage Agent]
- Classify urgency (P1/P2/P3)
- Classify category (billing, technical, legal, general)
- Score sentiment
- Set escalation flags
|
v
[Response Agent]
- Search knowledge base (RAG)
- Search past resolved tickets
- Draft reply with confidence score
|
v
[Human Review Gate]
- Auto-send if confidence > threshold AND no escalation flags
- Route to human if below threshold OR flagged
|
v
[Send Reply]
|
v
[Follow-up Agent]
- Check resolution after 24 hours
- Re-trigger pipeline if unresolved
The triage agent's classification informs the response agent's knowledge base queries. The response agent's confidence score informs the human review gate's routing decision. Context accumulates through the chain.
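The wiring itself can stay small. Here is one possible shape for the orchestrator, with the stage implementations injected so each agent remains a narrow, testable unit; all type and function names here are illustrative, not a fixed API:

```typescript
// Minimal sketch of the handoff pipeline. Stage implementations are
// injected; the gate logic mirrors the diagram above.
interface Ticket { id: string; body: string }
interface Triage { urgency: "P1" | "P2" | "P3"; escalationFlags: string[] }
interface Draft { text: string; confidence: number }

interface Stages {
  triage: (t: Ticket) => Triage;
  respond: (t: Ticket, triage: Triage) => Draft;
  send: (t: Ticket, d: Draft) => void;
  escalate: (t: Ticket, triage: Triage, d: Draft) => void;
}

const CONFIDENCE_THRESHOLD = 0.75;

function runPipeline(ticket: Ticket, stages: Stages): "sent" | "escalated" {
  const triage = stages.triage(ticket);          // classify first
  const draft = stages.respond(ticket, triage);  // draft a grounded reply
  // Human review gate: auto-send only when confident AND no flags.
  if (draft.confidence > CONFIDENCE_THRESHOLD && triage.escalationFlags.length === 0) {
    stages.send(ticket, draft);
    return "sent";
  }
  stages.escalate(ticket, triage, draft);
  return "escalated";
}
```

Because the stages are injected, each agent can be tested in isolation and swapped out without touching the gate logic.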
The Triage Agent
The triage agent reads the incoming ticket and produces a structured classification that every subsequent agent depends on. It answers four questions: urgency (P1/P2/P3), category (billing, technical, account, product, legal, general), sentiment (angry/frustrated/neutral/positive), and whether any escalation flags apply.
Here is the TypeScript implementation:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

interface IncomingTicket {
  id: string;
  subject: string;
  body: string;
}

interface CustomerHistory {
  tier: string;
  openTickets: number;
  resolvedTickets90Days: number;
}

interface TriageResult {
  ticketId: string;
  urgency: "P1" | "P2" | "P3";
  category: "billing" | "technical" | "account" | "product" | "legal" | "general";
  sentiment: "angry" | "frustrated" | "neutral" | "positive";
  sentimentScore: number; // -1.0 to 1.0
  escalationFlags: EscalationFlag[];
  suggestedQueue: string;
}

type EscalationFlag =
  | "repeated-contact"
  | "refund-request"
  | "legal-threat"
  | "churn-risk"
  | "data-breach-mention"
  | "high-value-account";

// Pull the text out of the model's response content blocks.
function extractText(response: Anthropic.Message): string {
  const block = response.content.find((b) => b.type === "text");
  if (block?.type !== "text") throw new Error("no text block in triage response");
  return block.text;
}

async function runTriageAgent(
  ticket: IncomingTicket,
  customerHistory: CustomerHistory
): Promise<TriageResult> {
  const response = await client.messages.create({
    model: "claude-opus-4-6",
    max_tokens: 512,
    system: `You are a customer support triage agent. Classify the ticket and return valid JSON.

Urgency rules:
- P1: production outage, data loss, security breach
- P2: significant feature broken, billing error over $500
- P3: questions, minor issues, feature requests

Set escalation flags when present:
- repeated-contact: 3+ prior contacts in 90 days
- refund-request: any mention of refund or chargeback
- legal-threat: mention of lawyer, lawsuit, or regulatory action
- churn-risk: customer mentions canceling or switching
- data-breach-mention: mention of exposed data or privacy violation
- high-value-account: enterprise tier or LTV over $10,000

Return only JSON matching the TriageResult schema.`,
    messages: [
      {
        role: "user",
        content: `Ticket subject: ${ticket.subject}
Ticket body: ${ticket.body}
Customer tier: ${customerHistory.tier}
Prior open tickets: ${customerHistory.openTickets}
Prior contacts (90 days): ${customerHistory.resolvedTickets90Days}`,
      },
    ],
  });
  const classification = JSON.parse(extractText(response));
  return { ticketId: ticket.id, ...classification };
}
The prompt gives explicit rules for urgency levels and escalation flags. Ambiguity in triage compounds through the rest of the pipeline.
The Response Agent and Knowledge Base Integration
The response agent drafts a reply using two data sources: your knowledge base and past resolved tickets. Both come in through tool calls, not the system prompt.
RAG (retrieval-augmented generation) is the standard pattern here. Instead of loading your entire knowledge base into context, you retrieve the relevant sections at query time based on the ticket content. The agent generates a search query, calls search_knowledge_base, gets back the top-N relevant documentation chunks, and uses those as grounding for its draft.
This keeps the context window focused and, more importantly, prevents policy hallucinations. The agent's response is grounded in your actual documentation rather than its training data. Every policy statement in the draft must come from a retrieved source.
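The retrieval half of that loop can be sketched independently of the model call. Production systems score chunks by embedding similarity; plain keyword overlap stands in for it in this sketch, and the chunk shape and function names are assumptions:

```typescript
// Score knowledge base chunks against a query and return the top-N as
// grounding. Keyword overlap is a stand-in for embedding similarity.
interface KbChunk { id: string; category: string; text: string }

function retrieveTopN(query: string, chunks: KbChunk[], n = 3): KbChunk[] {
  const terms = query.toLowerCase().split(/\W+/).filter((t) => t.length > 2);
  const scored = chunks.map((chunk) => {
    const body = chunk.text.toLowerCase();
    const score = terms.filter((t) => body.includes(t)).length; // overlap count
    return { chunk, score };
  });
  return scored
    .filter((s) => s.score > 0)       // drop chunks with no overlap at all
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map((s) => s.chunk);
}
```

Whatever the scoring mechanism, the shape is the same: the agent's generated query goes in, the top-N chunks come back as tool results, and the draft cites them.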
The tools the response agent calls:
const searchKnowledgeBaseTool = {
name: "search_knowledge_base",
description: "Search documentation for policies, guides, and product information relevant to the ticket.",
input_schema: {
type: "object",
properties: {
query: { type: "string" },
category: { type: "string" },
limit: { type: "number" },
},
required: ["query"],
},
};
const queryTicketsTool = {
name: "query_tickets",
description: "Search previously resolved tickets for similar cases and resolutions.",
input_schema: {
type: "object",
properties: {
query: { type: "string" },
category: { type: "string" },
resolvedOnly: { type: "boolean" },
},
required: ["query"],
},
};
The response agent outputs a draft alongside a confidence score:
interface DraftResponse {
draft: string;
confidence: number; // 0.0 to 1.0
groundingSources: string[];
requiresHumanReview: boolean;
reviewReason?: string;
}
A confidence score below 0.75 routes to human review. The human reviews and edits the draft rather than writing from scratch — which keeps humans in the loop without making the agent useless.
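One way to populate those review fields is a small helper that treats a draft with no grounding sources as automatically review-bound, regardless of its confidence score. A sketch, with an assumed helper name:

```typescript
// Fill in the review routing fields of a DraftResponse from the raw
// confidence score and the retrieved grounding sources.
const REVIEW_THRESHOLD = 0.75;

function finalizeDraft(draft: string, confidence: number, sources: string[]) {
  const lowConfidence = confidence < REVIEW_THRESHOLD;
  const ungrounded = sources.length === 0; // no retrieved support at all
  return {
    draft,
    confidence,
    groundingSources: sources,
    requiresHumanReview: lowConfidence || ungrounded,
    reviewReason: lowConfidence
      ? "confidence below threshold"
      : ungrounded
        ? "no grounding sources"
        : undefined,
  };
}
```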
Escalation Logic
Escalation is not a failure mode. It is a designed feature. The rules should be explicit code, not a judgment call the agent makes.
Always escalate to a human for:
- Refund requests above your dollar threshold (set this based on your churn economics — a common starting point is $200)
- Legal threats — any mention of lawyers, lawsuits, or regulatory action. The agent acknowledges receipt only; a human with authority responds.
- Data breach or security incident mentions. The response window is minutes, not hours.
- Angry customers with three or more prior unresolved contacts. The cost of losing the account exceeds the cost of a senior agent handling it personally.
- Enterprise or high-value accounts on P1 issues. They have SLA contracts.
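Expressed as explicit code, the hard rules above might look like the following sketch; the field names and thresholds are assumptions to adapt to your own data model:

```typescript
// The hard escalation rules as a pure function: returns a reason string
// if any rule fires, or null to fall through to confidence routing.
interface EscalationInput {
  refundAmountUsd: number; // 0 if no refund requested
  mentionsLegalAction: boolean;
  mentionsDataBreach: boolean;
  sentiment: "angry" | "frustrated" | "neutral" | "positive";
  priorUnresolvedContacts: number;
  tier: "free" | "pro" | "enterprise";
  urgency: "P1" | "P2" | "P3";
}

const REFUND_THRESHOLD_USD = 200; // tune to your churn economics

function mustEscalate(input: EscalationInput): string | null {
  if (input.refundAmountUsd > REFUND_THRESHOLD_USD) return "refund-over-threshold";
  if (input.mentionsLegalAction) return "legal-threat";
  if (input.mentionsDataBreach) return "data-breach-mention";
  if (input.sentiment === "angry" && input.priorUnresolvedContacts >= 3) return "repeated-contact";
  if (input.tier === "enterprise" && input.urgency === "P1") return "high-value-p1";
  return null; // no hard rule fired
}
```

Because this is plain code, it is auditable and testable in a way a model's judgment call never is.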
Escalate based on confidence for everything else. Below your threshold, the agent still produces a draft and routes to human review. The escalate_to_human tool handles both paths:
const escalateToHumanTool = {
name: "escalate_to_human",
description: "Route the ticket to a human agent.",
input_schema: {
type: "object",
properties: {
ticketId: { type: "string" },
reason: { type: "string" },
priority: { type: "string", enum: ["standard", "urgent", "critical"] },
suggestedQueue: { type: "string" },
draftForReview: { type: "string" },
},
required: ["ticketId", "reason", "priority"],
},
};
The draftForReview parameter is important. Even on escalated tickets, the agent produces a draft. Humans edit faster than they write.
Full Tool Set
| Tool | Purpose |
|---|---|
| query_tickets | Search prior resolved tickets for similar cases |
| draft_response | Persist the drafted reply to the ticket system |
| search_knowledge_base | RAG search against your documentation |
| escalate_to_human | Route ticket to human queue with context and draft |
| send_reply | Send the final approved reply to the customer |
The send_reply tool must sit behind a gate — either human approval or a passing confidence check. Never wire it to fire automatically from the response agent.
Metrics
Track these five numbers. They tell you whether the system is working and where it is breaking.
Average response time. Ticket creation to first meaningful reply. A working agent system should push P3 tickets below five minutes.
Resolution rate by category. Your overall autonomous resolution rate might be 65%, but billing might be 90% and legal might be 5%. That breakdown tells you where to invest.
Escalation rate by reason. If the top reason is "confidence below threshold," your knowledge base has gaps. If it is "refund requests," your autonomous refund limit is probably too low.
Customer satisfaction (CSAT). Compare autonomous-resolution CSAT against human-handled CSAT. If autonomous is significantly lower, the response quality needs work — not the automation concept.
First contact resolution rate. Does the customer follow up, or does the conversation end? High FCR means the agent is actually solving problems.
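Two of these fall straight out of raw ticket records. A sketch of the per-category resolution rate, with illustrative field names:

```typescript
// Compute the autonomous resolution rate for each category from a list
// of historical ticket records.
interface TicketRecord {
  category: string;
  resolvedAutonomously: boolean;
  escalationReason?: string;
}

function resolutionRateByCategory(tickets: TicketRecord[]): Map<string, number> {
  const totals = new Map<string, { total: number; auto: number }>();
  for (const t of tickets) {
    const entry = totals.get(t.category) ?? { total: 0, auto: 0 };
    entry.total += 1;
    if (t.resolvedAutonomously) entry.auto += 1;
    totals.set(t.category, entry);
  }
  const rates = new Map<string, number>();
  for (const [cat, { total, auto }] of totals) rates.set(cat, auto / total);
  return rates;
}
```

The same aggregation over `escalationReason` gives the escalation-rate breakdown.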
Manual vs. AI-Assisted vs. Fully Autonomous
| Dimension | Manual Support | AI-Assisted | Fully Autonomous |
|---|---|---|---|
| Average response time | 2-8 hours | 30-60 minutes | Under 5 minutes |
| Cost per resolution | High | Medium | Low |
| Consistency | Variable | High | High |
| Best for | Legal, complex, high-value | Technical, billing, account | FAQs, password resets, order status |
| Scalability | Poor | Good | Excellent |
Start AI-assisted across all categories. Identify which categories have the highest autonomous confidence. Graduate those to fully autonomous. Keep humans in the loop for the rest. Do not try to automate everything on day one.
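The graduation decision can itself be a plain function over per-category stats. The thresholds below are illustrative starting points, not recommendations:

```typescript
// Decide per category whether to run fully autonomous or keep a human
// in the loop, based on accumulated confidence data.
interface CategoryStats { autonomousResolutionRate: number; sampleSize: number }
type Mode = "ai-assisted" | "fully-autonomous";

function categoryMode(stats: CategoryStats): Mode {
  // Require both a high resolution rate and enough volume to trust it.
  if (stats.sampleSize >= 200 && stats.autonomousResolutionRate >= 0.85) {
    return "fully-autonomous";
  }
  return "ai-assisted";
}
```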
Common Pitfalls
Hallucinating policies. This is the most damaging failure mode. An agent that invents a refund policy or misquotes a feature's behavior is worse than no agent at all. The fix is enforced RAG — the response agent must cite retrieved sources and the system prompt must explicitly prohibit stating anything about policies not present in retrieved chunks.
Being too robotic. Agents that respond with perfectly grammatical template-speak produce replies that feel evasive and cold. Include explicit tone guidance in the system prompt: write like a knowledgeable colleague, not a ticket template. Get to the answer faster. Avoid filler phrases.
Not knowing when to escalate. An agent that always generates a confident-sounding response without escalating is more dangerous than one that escalates too aggressively. Your escalation logic must be explicit rules in code. Do not leave it to the agent to decide when it is out of its depth.
Ignoring conversation history. A customer following up on an open ticket should not be treated as a new contact. The response agent must receive the full conversation thread. This is a data architecture problem — make sure query_tickets returns prior messages in the same thread.
Skipping the follow-up agent. A ticket marked resolved is not actually resolved if the customer did not confirm it. Check 24 hours after first resolution. If there is no confirmation and the issue is not closed, re-trigger the pipeline.
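The follow-up check itself is simple. A sketch with assumed ticket fields:

```typescript
// Decide whether a ticket needs the 24-hour follow-up check: marked
// resolved, never confirmed by the customer, and not yet closed.
interface TicketState {
  markedResolvedAt: number | null; // epoch ms, null if never marked resolved
  customerConfirmed: boolean;
  closed: boolean;
}

const FOLLOW_UP_DELAY_MS = 24 * 60 * 60 * 1000;

function needsFollowUp(ticket: TicketState, now: number): boolean {
  if (ticket.closed || ticket.customerConfirmed) return false;
  if (ticket.markedResolvedAt === null) return false;
  return now - ticket.markedResolvedAt >= FOLLOW_UP_DELAY_MS;
}
```

Run this on a schedule; any ticket it flags goes back through the pipeline from triage.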
Key Takeaways
Customer support works as a first AI agent deployment because the task space is bounded, errors are recoverable, and the economics are demonstrable from day one.
Build a pipeline, not a single agent: triage, draft response, human review gate, send reply, follow-up. Each stage has a narrow job. Context accumulates through the chain.
Triage must produce a structured classification with explicit escalation flags. The rest of the pipeline depends on this being accurate.
The response agent must use RAG. Every policy statement in a reply must be grounded in retrieved documentation, not the model's training data.
Escalation rules are code, not judgment calls. Hard rules for legal, data breach, and high-value accounts. Confidence thresholds for the gray area.
Start AI-assisted. Graduate categories to fully autonomous as confidence data accumulates. Track resolution rate by category, escalation rate by reason, and CSAT for autonomous versus human-handled tickets — those three numbers tell you where the system needs investment.