Custom MCP Servers
MCP (Model Context Protocol) lets you expose custom tools to Claude Code. Build a server, register tools, and connect it via mcp-config.json. Add allowlists for per-agent permissions and guardrails for safety.
What MCP is and why you need a custom server
MCP (Model Context Protocol) is Anthropic's open standard for connecting AI models to external tools and data sources. Think of it as USB-C for AI — a universal interface that any model can use to talk to any tool, regardless of who built it. Claude Code supports MCP natively over stdio, which means you can build a server that exposes your own tools and Claude Code will discover and use them automatically.
Out of the box, Claude Code comes with built-in tools: read files, edit files, run bash commands, search with grep, find with glob. Those are powerful for coding tasks. But what happens when your agent needs to query your database? Send an email? Create a discount code in your payment system? Publish a post to LinkedIn? Search Reddit for competitor mentions?
That is where a custom MCP server comes in. You build a server that wraps your business systems, expose them as tools with typed schemas and clear descriptions, and Claude Code treats them exactly like its built-in tools. The agent sees send_email right alongside read_file and decides when to use each one.
There are four reasons to build a custom server instead of just letting the agent run bash commands against your APIs:
- Security boundary. The LLM never sees your API keys or database credentials directly. The MCP server holds those secrets and mediates access.
- Per-agent permissions. You can control which agents have access to which tools. Your content agent should not be able to send emails. Your outreach agent should not be able to publish blog posts. Allowlists enforce this at the server level.
- Guardrails. You can validate every tool call before it executes. Reject invalid emails. Cap discount percentages. Block dangerous operations. This is deterministic safety — not hoping the LLM remembers the rules.
- Logging and audit trails. Every tool call gets logged with the agent ID, arguments, result, and duration. When something goes wrong at 3am, you can trace exactly what happened.
Anatomy of an MCP server
An MCP server has four parts: the SDK, the transport, tool registration, and the handler function. Here is what each one does and how they fit together.
@modelcontextprotocol/sdk is the official TypeScript SDK from Anthropic. It handles the protocol negotiation, message framing, and tool discovery. You import McpServer from it and use it to define your tools.
StdioServerTransport is the communication layer. MCP over stdio means the server reads from stdin and writes to stdout — no HTTP, no ports, no networking. Claude Code spawns your server as a child process and pipes messages back and forth. This is simple, fast, and avoids the complexity of running a web server.
Tool registration is where you define what tools exist. Each tool gets a name, a description (this is what the LLM reads to decide when to use it), and a Zod schema for its parameters.
Handler function is the actual implementation. It receives validated arguments and returns content.
Here is the smallest possible MCP server:
```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "my-tools", version: "1.0.0" });

server.tool(
  "greet",
  "Say hello to someone",
  { name: z.string() },
  async ({ name }) => ({
    content: [{ type: "text", text: `Hello, ${name}!` }],
  }),
);

const transport = new StdioServerTransport();
await server.connect(transport);
```
That is a working MCP server. It exposes one tool called greet that takes a name parameter and returns a greeting. When Claude Code connects to it, the agent will see greet in its tool list and can call it whenever it decides to.
The server.tool() call is the core API. The first argument is the tool name (what the agent calls it by). The second is the description (the agent reads this to decide when to use it — make it good). The third is the Zod schema for input validation. The fourth is the async handler that does the work and returns content.
The return format is always { content: [{ type: "text", text: "..." }] }. The content array can have multiple items, but for most tools, a single text item with JSON-stringified data is what you want.
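In practice that usually means the handler gathers structured data and returns it as one JSON-stringified text item. A minimal sketch — the tool name `queryVisitorsHandler` and its data are hypothetical, not part of the actual server:

```typescript
// Sketch: a handler that returns structured data as a single
// JSON-stringified text item, which the agent can parse reliably.
type ToolResult = { content: { type: "text"; text: string }[] };

async function queryVisitorsHandler(args: { limit: number }): Promise<ToolResult> {
  // Illustrative in-memory data — a real handler would query your database.
  const visitors = [{ id: "v1", pages: 4 }, { id: "v2", pages: 9 }].slice(0, args.limit);
  return { content: [{ type: "text", text: JSON.stringify({ visitors }) }] };
}
```

Returning one JSON blob (rather than several loose text items) keeps the result easy for the model to quote, parse, and reason about.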
Connecting to Claude Code
To make Claude Code use your MCP server, you create a configuration file that tells it how to spawn the server process. This is mcp-config.json:
```json
{
  "mcpServers": {
    "ai-uni-tools": {
      "command": "node",
      "args": ["node_modules/.bin/tsx", "mcp-server.ts"],
      "cwd": "/path/to/project",
      "env": {
        "NODE_PATH": "/path/to/project/node_modules"
      }
    }
  }
}
```
The key fields:
- command — the executable to run. Here we use `node` because the server is TypeScript compiled via `tsx`.
- args — the arguments to pass. We point at `tsx` in node_modules and then the server file.
- cwd — the working directory for the server process. This matters for resolving relative imports.
- env — environment variables passed to the server. This is where you inject things like `NODE_PATH`, API keys, and crucially, the `AGENT_ID`.
When you run Claude Code, you connect it to your MCP server with the --mcp-config flag:
```bash
claude -p "Find qualified leads and send outreach emails" \
  --mcp-config ./mcp-config.json
```
Claude Code reads the config, spawns your MCP server as a child process, negotiates the protocol handshake over stdio, discovers all the tools you registered, and makes them available to the agent. From the agent's perspective, your custom tools are indistinguishable from built-in ones.
The AGENT_ID environment variable is a pattern worth highlighting. When you spawn agents from an orchestrator, each one gets a different AGENT_ID in its environment:
```ts
const proc = spawn("claude", claudeArgs, {
  env: { ...cleanEnv, AGENT_ID: "outreach" },
});
```
The MCP server reads this at startup with process.env.AGENT_ID and uses it to determine which tools this particular agent is allowed to use. More on that in the allowlists section.
The full 52-tool MCP server
A minimal server with one tool is nice for demos. Here is what a production MCP server looks like. This is the actual server that powers AI University's 15-agent system with 52 tools:
```ts
// mcp-server.ts — the production server
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { getToolDefinitions, executeTool, getAllowedTools } from "./src/lib/agent-sdk/tools";
import { runPreCheck, runPostCheck } from "./src/lib/agent-sdk/tools/guardrails";
import { logActivity } from "./src/lib/agent-sdk/activity-log";

const agentId = process.env.AGENT_ID || "unknown";
const allowedToolNames = getAllowedTools(agentId);
const allTools = getToolDefinitions().filter((t) => allowedToolNames.includes(t.name));

const server = new McpServer({ name: "ai-uni-tools", version: "1.0.0" });
```
The first thing the server does is read the AGENT_ID from the environment and filter the tool list. If you are the outreach agent, you might see 18 tools. If you are the content engine, you see 14. The agent never even knows the other tools exist — they are not registered.
Tool definitions live in separate files as plain objects. Each tool has a name, description, and input_schema (a JSON Schema object). The getToolDefinitions() function collects them all into an array. The executeTool() function takes a tool name and arguments and dispatches to the right handler.
This separation matters. Tool definitions are data, not code. You can inspect them, test them, validate their schemas, and generate documentation from them. The execution logic is separate, which makes it easy to add middleware (like guardrails and logging) around every call.
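A sketch of what that separation can look like — the `send_email` definition, handler map, and dispatch below are illustrative shapes, not the actual AI University source:

```typescript
// Tool definitions are plain data; execution dispatches by name to a
// separate handler map. All names and schemas here are illustrative.
type ToolDefinition = {
  name: string;
  description: string;
  input_schema: Record<string, unknown>; // JSON Schema object
};

const sendEmailTool: ToolDefinition = {
  name: "send_email",
  description: "Send an email to a single recipient. The body must include an unsubscribe link.",
  input_schema: {
    type: "object",
    properties: {
      to: { type: "string" },
      subject: { type: "string" },
      htmlBody: { type: "string" },
    },
    required: ["to", "subject", "htmlBody"],
  },
};

// Handlers live apart from the definitions, keyed by tool name.
const handlers: Record<string, (args: Record<string, unknown>) => Promise<unknown>> = {
  send_email: async (args) => ({ sent: true, to: args.to }), // stub for illustration
};

function getToolDefinitions(): ToolDefinition[] {
  return [sendEmailTool];
}

async function executeTool(name: string, args: Record<string, unknown>): Promise<unknown> {
  const handler = handlers[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}
```

Because definitions are data, you can iterate over them to register tools, validate every `input_schema` in a test, or render a docs page — without touching the execution path.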
The tool registration loop looks like this:
```ts
for (const tool of allTools) {
  server.tool(tool.name, tool.description, tool.zodSchema, async (args) => {
    const start = Date.now();

    // Pre-check: block if violated
    const preCheck = runPreCheck(tool.name, args);
    if (!preCheck.ok) {
      return { content: [{ type: "text", text: JSON.stringify({ error: preCheck.reason }) }] };
    }

    let finalResult;
    let success = true;
    try {
      // Execute the tool
      const result = await executeTool(tool.name, args);

      // Post-check: warn but don't block
      const warnings = runPostCheck(tool.name, args);
      finalResult = warnings.length > 0 ? { ...result, warnings } : result;
    } catch (err) {
      // Failures get logged too, with success: false
      success = false;
      finalResult = { error: err instanceof Error ? err.message : String(err) };
    }

    // Log everything
    const duration = Date.now() - start;
    logActivity({ agentId, tool: tool.name, args, result: finalResult, duration, success });

    return { content: [{ type: "text", text: JSON.stringify(finalResult) }] };
  });
}
```
Every tool call follows the same pipeline: pre-check, execute, post-check, log. This is where the real value of a custom server emerges. You are not just exposing tools — you are wrapping them in a safety and observability layer that runs deterministically on every call.
Per-agent tool allowlists
This is the security model. Every agent gets a subset of the available tools, and the filtering happens at server startup — before the agent ever sees a tool list.
Here is the real allowlist from the production system:
```ts
// Per-agent tool permissions — allowlist.ts
export function getAllowedTools(agentId: string): string[] {
  const shared = [
    "emit_event", "read_events", "get_awareness",
    "save_skill", "load_skills", "rate_skill", "run_skill_script",
    "save_memory", "load_memory",
    "post_to_talkspace", "read_talkspace",
  ];

  switch (agentId) {
    case "outreach":
      return [...shared, "query_visitors", "send_email",
        "save_outreach", "enrich_lead", "create_discount_code",
        "promote_visitor", "build_dossier"];
    case "content-engine":
      return [...shared, "save_content_draft", "get_brand_context",
        "publish_to_linkedin", "check_linkedin_status"];
    case "competitor-watch":
      return [...shared, "fetch_url", "search_web",
        "save_competitive_intel", "search_reddit",
        "search_twitter", "search_github_trending"];
    // ... 12 more agents, each with specific tools
    default:
      return shared;
  }
}
```
The design has two layers:
Shared tools are available to every agent. These are the coordination primitives — emitting events, reading events, saving and loading memory, posting to the shared talkspace, and working with skills. Every agent needs these to participate in the multi-agent system.
Agent-specific tools are only available to the agent that needs them. The outreach agent can send emails, enrich leads, and create discount codes. The content engine can save drafts and publish to LinkedIn. The competitor-watch agent can search the web and social media.
The principle is least privilege. If an agent gets confused or hallucinates a plan that involves sending emails, it literally cannot do it unless it is the outreach agent. The tool does not exist in its tool list. The agent will see an error if it tries to call a tool that was not registered for it, and it will have to find another approach.
This is fundamentally different from putting "do not send emails" in the system prompt. System prompt instructions are suggestions — the model might follow them, or it might not if it reasons itself into a different plan. An allowlist is a hard boundary. The tool is simply not available. There is nothing to reason around.
A few practical notes on building allowlists:
- Start restrictive and add tools as agents prove they need them. It is easier to expand permissions than to clean up after a tool was misused.
- The `default` case should return only the shared tools. If an unknown `AGENT_ID` connects, it gets minimum permissions.
- Review your allowlists regularly. As agents evolve, their tool needs change. An agent that used to need `fetch_url` might not anymore.
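Because the allowlist is a pure function of the agent ID, its least-privilege properties are directly unit-testable. A trimmed-down sketch (the shared list and per-agent tools are abbreviated from the listing above):

```typescript
// Trimmed allowlist sketch: unknown agents fall through to the shared
// coordination tools only — least privilege by default.
const SHARED_TOOLS = ["emit_event", "read_events", "save_memory", "load_memory"];

function getAllowedTools(agentId: string): string[] {
  switch (agentId) {
    case "outreach":
      return [...SHARED_TOOLS, "send_email", "create_discount_code"];
    case "content-engine":
      return [...SHARED_TOOLS, "save_content_draft", "publish_to_linkedin"];
    default:
      return SHARED_TOOLS; // unknown AGENT_ID gets minimum permissions
  }
}
```

A test suite can then assert that only the outreach agent ever sees `send_email`, and that an unrecognized agent ID receives nothing beyond the shared set — catching permission drift before it ships.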
Pre-check and post-check guardrails
Allowlists control which tools an agent can call. Guardrails control how it can call them. They validate the arguments before execution (pre-checks) and inspect the results after execution (post-checks).
Pre-checks: hard blocks
A pre-check runs before the tool executes. If it fails, the tool call is rejected and the agent gets an error message explaining what was wrong. The tool never runs. The email never sends. The discount code never gets created.
```ts
// Pre-check: BLOCKS the tool call if violated — guardrails.ts
export function runPreCheck(toolName: string, args: Record<string, unknown>): PreCheckResult {
  switch (toolName) {
    case "send_email": {
      const to = String(args.to || "");
      if (!EMAIL_REGEX.test(to)) {
        return { ok: false, reason: `Invalid email address: "${to}"` };
      }
      const html = String(args.htmlBody || args.body || "");
      if (!html.toLowerCase().includes("unsubscribe")) {
        return { ok: false, reason: "Email body MUST contain an unsubscribe link" };
      }
      return { ok: true };
    }
    case "create_discount_code": {
      const pct = Number(args.percent_off || 0);
      if (pct < 5 || pct > 25) {
        return { ok: false, reason: `Discount must be 5-25%. Got: ${pct}%` };
      }
      return { ok: true };
    }
  }
  // Tools with no pre-check rules pass through
  return { ok: true };
}
```
Look at what this prevents:
- An agent cannot send an email to a malformed address. If it constructs `"john smith"` instead of `"john@company.com"`, the pre-check catches it.
- An agent cannot send an email without an unsubscribe link. This is a CAN-SPAM requirement. You do not want to rely on the LLM remembering this — you enforce it in code.
- An agent cannot create a 50% discount code. The business rule says 5-25%. If the model gets enthusiastic and tries to offer a huge discount, the guardrail blocks it.
When a pre-check fails, the agent sees the error reason and can fix its approach. It will typically correct the argument and retry. This is the self-healing loop — deterministic validation plus LLM reasoning produces reliable behavior.
Post-checks: soft warnings
A post-check runs after the tool executes successfully. It does not block anything. Instead, it appends warnings to the result. The agent sees the warnings and can decide what to do about them.
```ts
// Post-check: WARNS but doesn't block — guardrails.ts
export function runPostCheck(toolName: string, args: Record<string, unknown>): PostCheckWarning[] {
  const warnings: PostCheckWarning[] = [];

  switch (toolName) {
    case "save_content_draft": {
      const content = String(args.content || "");
      if (content.length < 100) {
        warnings.push({ warning: "Content is very short. Consider adding more substance." });
      }
      break;
    }
    case "save_ad_copy": {
      const headline = String(args.headline || "");
      const platform = String(args.platform || "google");
      const limits: Record<string, number> = { google: 30, linkedin: 70, facebook: 40 };
      const limit = limits[platform] || 50;
      if (headline.length > limit) {
        warnings.push({ warning: `Headline exceeds ${platform} limit of ${limit} chars.` });
      }
      break;
    }
  }

  return warnings;
}
```
Post-checks are for things that are probably wrong but not catastrophically wrong:
- A content draft that is only 40 characters long is probably not ready to publish, but maybe the agent is saving a placeholder and plans to expand it later.
- A headline that exceeds the platform character limit will get truncated by the ad platform. The agent should know this so it can revise, but the save itself should still succeed.
The philosophy behind splitting guardrails into pre and post:
- Pre-checks are for hard rules. Things that must never happen. Invalid data, policy violations, safety constraints. These block execution.
- Post-checks are for soft rules. Things that indicate a potential problem. Quality issues, best practice violations, warnings about downstream effects. These inform but do not block.
- Both are deterministic. They run as regular code, not as LLM calls. They are fast, predictable, and testable. You can unit test your guardrails just like any other function.
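Because guardrails are plain functions, you can exercise them directly — no LLM in the loop. A trimmed, self-contained re-sketch of the `send_email` pre-check (the regex is an illustrative choice):

```typescript
// Self-contained re-sketch of the send_email pre-check for unit testing.
// EMAIL_REGEX here is a simple illustrative pattern, not a full RFC check.
const EMAIL_REGEX = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

type PreCheckResult = { ok: true } | { ok: false; reason: string };

function runPreCheck(toolName: string, args: Record<string, unknown>): PreCheckResult {
  if (toolName === "send_email") {
    const to = String(args.to || "");
    if (!EMAIL_REGEX.test(to)) {
      return { ok: false, reason: `Invalid email address: "${to}"` };
    }
    const html = String(args.htmlBody || args.body || "");
    if (!html.toLowerCase().includes("unsubscribe")) {
      return { ok: false, reason: "Email body MUST contain an unsubscribe link" };
    }
  }
  return { ok: true }; // tools without rules pass through
}
```

A handful of assertions — malformed address blocked, missing unsubscribe link blocked, valid email allowed — pins the safety behavior down permanently, independent of whatever model is driving the tools.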
This is a critical point. You are not relying on the LLM to "remember" that emails need unsubscribe links or that discounts should be capped at 25%. You are enforcing those rules in code that runs on every single tool call regardless of what the model thinks. The model's job is to decide what to do. The guardrails' job is to make sure what it does is safe.
Activity logging
Every tool call that flows through the MCP server gets logged. This is not optional — when you have 15 agents running autonomously, you need to know exactly what happened, when, and why.
```ts
// Every tool call gets logged — mcp-server.ts
logActivity({
  agentId,
  tool: tool.name,
  args,
  result: finalResult,
  duration,
  success,
});
```
The log entry captures:
- agentId — which agent made the call. Was it outreach? Content engine? Competitor watch?
- tool — which tool was called. `send_email`, `query_visitors`, `save_content_draft`.
- args — what arguments were passed. The full argument object.
- result — what was returned. The full result including any post-check warnings.
- duration — how long the tool call took in milliseconds.
- success — whether the call completed without throwing.
This writes to JSONL (one JSON object per line), which is trivially parseable and appendable. Each line is a complete record. You can grep through it, pipe it into analytics, or load it into a dashboard.
Why this matters in practice:
Debugging. When an agent does something unexpected, you pull up the activity log and read the exact sequence of tool calls it made. You see the arguments it constructed, the results it received, and the decisions it made. No guessing.
Compliance. If you are sending emails, creating discount codes, or publishing content, you need an audit trail. The activity log is that trail. You can answer "who sent this email and why?" by tracing back through the log.
Performance tuning. The duration field tells you which tools are slow. If enrich_lead takes 8 seconds on every call, you know where to optimize. If query_visitors is being called 50 times per agent run, you know the agent might need a batch version of the tool.
Pattern detection. Over time, the log reveals how agents actually use tools. You might discover that the outreach agent never calls build_dossier — maybe that tool is unnecessary. Or that the content engine calls get_brand_context on every single run — maybe that should be injected into its system prompt instead.
Design principles for MCP servers
After building and operating a 52-tool server with 15 agents, here are the principles that held up in production.
Filter at startup, not at call time
Do not check permissions on every tool call. Filter the tool list when the server starts and only register the tools the agent is allowed to use. This is simpler, faster, and more secure. The agent never even sees tools it cannot use, which means it cannot waste tokens reasoning about them or attempting to call them.
```ts
// Good: filter once at startup
const allowedToolNames = getAllowedTools(agentId);
const allTools = getToolDefinitions().filter((t) => allowedToolNames.includes(t.name));

// Bad: check on every call
server.tool("send_email", ..., async (args) => {
  if (!isAllowed(agentId, "send_email")) return { error: "not allowed" };
  // ...
});
```
The first approach means the tool literally does not exist for unauthorized agents. The second approach means the tool exists, the agent can see it, might try to call it, burns tokens on the attempt, and gets an error. Startup filtering is strictly better.
Pre-checks for hard rules, post-checks for soft rules
This was covered above, but the principle bears repeating as a design rule. If a violation would cause real damage (bad email address, excessive discount, missing legal requirement), make it a pre-check that blocks execution. If a violation is a quality concern (short content, long headline), make it a post-check that warns.
Do not mix these up. If you make everything a pre-check, agents will struggle to get anything done — they will hit walls on minor quality issues. If you make everything a post-check, you lose your safety guarantees. The split is the design.
Log everything
You will need the logs. You will need them when an agent sends a weird email. You will need them when a discount code appears that nobody remembers creating. You will need them when debugging why an agent ran for 25 turns and accomplished nothing. Log every tool call, every argument, every result, every duration. Disk is cheap. Debugging blind is expensive.
Keep tools focused
One tool does one thing. Do not build a tool called manage_contacts that creates, updates, deletes, and searches contacts based on an action parameter. Build create_contact, update_contact, delete_contact, and search_contacts. Focused tools have simpler schemas, clearer descriptions, and the LLM picks the right one more reliably.
The temptation to build Swiss Army knife tools is strong because it feels like fewer tools means less complexity. In practice, it means more complex schemas, longer descriptions, and more opportunities for the model to pass the wrong combination of parameters. Simple tools compose better than complex ones.
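The contrast shows up directly in the schemas. The contact tools below are hypothetical, with schemas written as plain JSON Schema objects:

```typescript
// Swiss Army knife: one tool with a discriminator parameter. Whether `id`,
// `name`, or `query` is required depends on `action` — something the
// schema cannot express, so the model has to guess.
const manageContacts = {
  name: "manage_contacts",
  input_schema: {
    type: "object",
    properties: {
      action: { type: "string", enum: ["create", "update", "delete", "search"] },
      id: { type: "string" },    // only meaningful for update/delete
      name: { type: "string" },  // only meaningful for create/update
      query: { type: "string" }, // only meaningful for search
    },
    required: ["action"],
  },
};

// Focused: every listed field is always required and always used.
const deleteContact = {
  name: "delete_contact",
  description: "Delete a single contact by its ID. Irreversible.",
  input_schema: {
    type: "object",
    properties: { id: { type: "string" } },
    required: ["id"],
  },
};
```

The focused schema can mark `id` as unconditionally required; the combined one cannot, so an invalid call (a `delete` with no `id`) passes schema validation and fails at runtime instead.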
Use environment variables for identity
Pass agent identity through the environment, not through tool arguments. If you add an agent_id parameter to every tool, you are trusting the LLM to pass the correct value on every call. It might hallucinate a different agent ID. It might forget to pass it. Environment variables are set once at process startup and cannot be changed by the model.
```ts
// Good: identity comes from the environment
const agentId = process.env.AGENT_ID || "unknown";

// Bad: identity comes from the model
server.tool("send_email", ..., async ({ agentId, to, body }) => {
  // the model could pass any agentId it wants
});
```
Write descriptions like contracts
The tool description is the single most important thing you write. The LLM reads it to decide when to use the tool, what to pass, and what to expect. A lazy description produces lazy tool usage. A precise description produces precise tool usage. Spend the time. Include when to use the tool, when not to, what the constraints are, and what the side effects are. This is covered in depth in the tool design guide, but it is worth reiterating here because it is the most common mistake in MCP server development.
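As a sketch, a contract-style description for a `send_email` tool might read like this — the wording and specific constraints are hypothetical, illustrating the when/when-not/constraints/side-effects structure:

```typescript
// A description written as a contract: when to use it, when not to,
// hard constraints, and side effects — this is the text the model reads.
const sendEmailDescription = [
  "Send one outreach email to a single recipient.",
  "Use ONLY after the lead has been enriched and qualified.",
  "Do NOT use for internal notifications — use emit_event instead.",
  "Constraints: htmlBody must contain an unsubscribe link;",
  "at most one email per recipient per 24 hours.",
  "Side effect: the email is sent immediately and cannot be recalled.",
].join(" ");
```

Compare that with "Sends an email" — the contract version tells the model when the tool is appropriate, what the pre-checks will enforce, and that the action is irreversible, all before it ever makes a call.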
Next steps
You now have the full picture of how to build a custom MCP server: the SDK and transport layer, tool registration with Zod schemas, connecting to Claude Code via mcp-config.json, per-agent allowlists for least-privilege access, pre-check and post-check guardrails for deterministic safety, and activity logging for observability.
The next step is putting multiple agents together into a coordinated system. See Multi-Agent Orchestration for how to spawn agents in parallel, coordinate them through events and shared memory, and handle scheduling, failures, and monitoring across a fleet of autonomous agents.
If you want to go deeper on tool design specifically — naming conventions, parameter schemas, return value patterns, and error handling — see Designing Tools for AI Agents.