Skills Library: Give Your Agents Superpowers

A skill is not documentation — it is a capability. Learn how the Skills Library works, how agents invoke skills via the run_skill_script MCP tool, and how to build your own skills using the standard pattern.

Last updated: 2026-03-02

A skill is not a tutorial. It is not a reference sheet. It is not a cheat sheet of API calls. A skill is an executable capability — a Python script that an agent can invoke to perform a computation it could not do reliably through language alone.

When an agent needs to score a lead, it does not reason through a scoring rubric in its head and produce a number. It calls the `lead-scoring-engine` skill, passes in the contact data as JSON, and gets back a structured score with the reasoning logged. The computation is reproducible, testable, and fast.

This is the core idea behind the Skills Library: move computations that require precision, consistency, or speed out of the language model and into deterministic Python scripts — while keeping the agent in control of when and how to invoke them.

What a Skill Is

A skill has three components:

1. SKILL.md — a meta prompt that describes what the skill does, when to use it, what inputs it expects, and what outputs it returns. The agent reads this to understand whether the skill is appropriate for the current task.

2. scripts/ — one or more Python scripts that perform the actual computation. Scripts use only the Python standard library (no external dependencies), accept JSON input via stdin or arguments, and return JSON to stdout. This makes them fast, portable, and easy to test.

3. catalog.json — a registry entry that describes the skill's name, description, input schema, and script path. The MCP server reads this to make skills discoverable.

The agent never writes Python. It reads the SKILL.md to understand the skill, then calls the run_skill_script MCP tool to execute it.
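The stdin/stdout contract those scripts follow is small enough to show in a skeleton. This is an illustrative sketch, not a shipped skill; `run` and its echo-style output fields are placeholders for a real computation:

```python
#!/usr/bin/env python3
"""Minimal skill script skeleton: JSON in on stdin, JSON out on stdout."""
import json
import sys

def run(payload: dict) -> dict:
    # Placeholder computation for illustration: summarize the input keys.
    return {"received_keys": sorted(payload), "key_count": len(payload)}

if __name__ == "__main__":
    raw = sys.stdin.read()
    # Default to an empty object when stdin is empty, which keeps the
    # script easy to smoke-test from a shell.
    print(json.dumps(run(json.loads(raw or "{}"))))
```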

The File Structure

Skills live in .claude/skills/ within your project. Each skill gets its own directory following this pattern:

```
.claude/skills/
  lead-scoring-engine/
    SKILL.md
    scripts/
      score_lead.py
      validate_input.py
    catalog.json
  churn-predictor/
    SKILL.md
    scripts/
      predict_churn.py
    catalog.json
  send-timing-optimizer/
    SKILL.md
    scripts/
      optimize_timing.py
    catalog.json
```
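A quick way to sanity-check this layout is a small validator. The helper below is hypothetical (not part of the library); it only checks that the three required pieces exist:

```python
from pathlib import Path

REQUIRED_FILES = ("SKILL.md", "catalog.json")

def check_skill_dir(skill_dir: Path) -> list:
    """Return a list of problems with a skill directory (empty list = valid)."""
    problems = [f"missing {name}" for name in REQUIRED_FILES
                if not (skill_dir / name).is_file()]
    scripts = skill_dir / "scripts"
    if not scripts.is_dir():
        problems.append("missing scripts/ directory")
    elif not list(scripts.glob("*.py")):
        problems.append("scripts/ contains no Python scripts")
    return problems
```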

The SKILL.md Pattern

The SKILL.md is what the agent reads. It tells the agent what the skill does in precise, unambiguous terms. Think of it as the tool description for a Python script.

A good SKILL.md includes:

  • What the skill computes and why that computation matters
  • When to use this skill (and when not to)
  • The exact input schema with field descriptions and types
  • The exact output schema with field descriptions
  • Example input/output pair
Here is the complete SKILL.md for the lead-scoring-engine skill:

````markdown
# Lead Scoring Engine

## What This Skill Does
Scores a contact on a 0-100 scale based on firmographic, behavioral, and engagement signals.
Use this skill when you need a numeric lead score to prioritize outreach or trigger automation.

## When to Use
- Before assigning a lead to a sales rep
- When deciding whether to add a contact to a high-touch sequence
- When ranking a batch of leads for prioritization

## Do NOT Use When
- The contact has fewer than 3 data points (score will be unreliable — flag for enrichment instead)
- You need a qualitative assessment rather than a score (use the research-synthesizer agent)

## Input Schema
```json
{
  "contact": {
    "email": "string (required)",
    "company_size": "number (optional) — employees",
    "industry": "string (optional)",
    "title": "string (optional)",
    "website_visits_30d": "number (optional)",
    "email_opens_30d": "number (optional)",
    "last_activity_date": "string (optional) — ISO 8601"
  }
}
```

## Output Schema
```json
{
  "score": "number — 0 to 100",
  "tier": "string — 'hot' | 'warm' | 'cold'",
  "signals": "array — list of signals that influenced the score",
  "missing_data": "array — fields that would improve score reliability",
  "confidence": "string — 'high' | 'medium' | 'low'"
}
```

## Example
Input:
```json
{"contact": {"email": "cto@acme.com", "company_size": 500, "website_visits_30d": 12}}
```
Output:
```json
{"score": 78, "tier": "hot", "signals": ["enterprise_size", "high_web_engagement"], "missing_data": ["title", "industry"], "confidence": "medium"}
```
````
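To make the output schema concrete, here is a toy scorer shaped like that example. The signal names, weights, and tier cutoffs are invented for illustration; they are not the logic of the real score_lead.py:

```python
def toy_score(contact: dict) -> dict:
    """Illustrative scorer matching the lead-scoring-engine output schema.

    All thresholds and signal names here are made up for the example.
    """
    signals, score = [], 0
    if contact.get("company_size", 0) >= 200:
        signals.append("enterprise_size")
        score += 40
    if contact.get("website_visits_30d", 0) >= 10:
        signals.append("high_web_engagement")
        score += 30
    if contact.get("email_opens_30d", 0) >= 5:
        signals.append("engaged_reader")
        score += 20
    optional = ("company_size", "industry", "title", "website_visits_30d",
                "email_opens_30d", "last_activity_date")
    missing = [f for f in optional if f not in contact]
    tier = "hot" if score >= 70 else "warm" if score >= 40 else "cold"
    confidence = ("high" if len(missing) <= 1
                  else "medium" if len(missing) <= 3 else "low")
    return {"score": score, "tier": tier, "signals": signals,
            "missing_data": missing, "confidence": confidence}
```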


How Agents Discover and Run Skills

The `run_skill_script` MCP tool is the bridge between the agent and the skill scripts. The MCP server reads `catalog.json` from each skill directory to build its tool registry. When an agent needs a computation, it calls `run_skill_script` with the skill name and input data.

```typescript
// MCP tool signature
{
  name: "run_skill_script",
  description: "Execute a computational skill script and return the result as JSON. Use this when you need a precise, repeatable computation — scoring, prediction, analysis, optimization — rather than reasoning through the answer yourself.",
  parameters: {
    skill_name: "string — the name of the skill from the catalog",
    input: "object — the input data matching the skill's input schema",
    script: "string (optional) — specific script name if the skill has multiple scripts"
  }
}
```

The tool executor looks up the skill in `catalog.json`, validates the input against the schema, runs the Python script as a subprocess with the input passed as JSON, and returns the stdout as a parsed JSON object.

```typescript
// What the agent call looks like in practice
await runSkillScript({
  skill_name: "lead-scoring-engine",
  input: {
    contact: {
      email: "cto@acme.com",
      company_size: 500,
      title: "Chief Technology Officer",
      website_visits_30d: 12,
      email_opens_30d: 4
    }
  }
});
// Returns: { score: 82, tier: "hot", signals: [...], confidence: "high" }
```
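The executor's lookup-run-parse loop can be sketched in a few lines. This is an illustration of the flow, not the MCP server's actual code; schema validation is omitted, and `sys.executable` stands in for the server's python3 invocation:

```python
import json
import subprocess
import sys
from pathlib import Path

def run_skill_script(skills_root: str, skill_name: str, input_data: dict,
                     script: str = "main") -> dict:
    """Resolve the script path via catalog.json, run it with JSON on stdin,
    and parse the JSON it prints to stdout."""
    skill_dir = Path(skills_root) / skill_name
    catalog = json.loads((skill_dir / "catalog.json").read_text())
    script_path = skill_dir / catalog["scripts"][script]
    proc = subprocess.run(
        [sys.executable, str(script_path)],  # the server shells out to python3
        input=json.dumps(input_data),
        capture_output=True, text=True, check=True, timeout=30,
    )
    return json.loads(proc.stdout)
```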

The catalog.json Pattern

Each skill has a catalog.json that registers it with the MCP server. The MCP server loads all catalog.json files at startup to build the skill registry.

```json
{
  "name": "lead-scoring-engine",
  "version": "1.0.0",
  "description": "Scores contacts on a 0-100 scale using firmographic and behavioral signals.",
  "tags": ["scoring", "leads", "prioritization"],
  "scripts": {
    "main": "scripts/score_lead.py",
    "validate": "scripts/validate_input.py"
  },
  "input_schema": {
    "type": "object",
    "required": ["contact"],
    "properties": {
      "contact": { "type": "object" }
    }
  },
  "output_schema": {
    "type": "object",
    "properties": {
      "score": { "type": "number" },
      "tier": { "type": "string" },
      "signals": { "type": "array" },
      "confidence": { "type": "string" }
    }
  }
}
```
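Building the registry from these files is a matter of globbing for catalogs. A minimal sketch, assuming the flat skills directory shown earlier (`load_skill_registry` and the `_dir` bookkeeping field are hypothetical names, not the server's API):

```python
import json
from pathlib import Path

def load_skill_registry(skills_root: str) -> dict:
    """Scan <skills_root>/*/catalog.json and index each skill by name."""
    registry = {}
    for catalog_path in sorted(Path(skills_root).glob("*/catalog.json")):
        catalog = json.loads(catalog_path.read_text())
        catalog["_dir"] = str(catalog_path.parent)  # remember where scripts live
        registry[catalog["name"]] = catalog
    return registry
```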

The 16 Built-In Skills

The Skills Library ships with 16 computational skills covering the most common agent workloads in a growth and operations system.

| Skill | Description |
| --- | --- |
| lead-scoring-engine | Scores contacts 0-100 using firmographic, behavioral, and engagement signals |
| churn-predictor | Predicts customer churn probability using activity decay and engagement patterns |
| a-b-test-analyzer | Calculates statistical significance, confidence intervals, and winner determination for A/B experiments |
| send-timing-optimizer | Determines the optimal send time for email or message based on historical engagement data |
| trend-detection | Identifies statistically significant trends and anomalies in time-series data |
| cross-agent-intelligence | Aggregates signals across multiple agent sessions to surface cross-session patterns |
| adaptive-feedback-loop | Adjusts agent behavior parameters based on outcome feedback over time |
| data-pruner | Identifies and flags stale, duplicate, or low-quality records for cleanup |
| linkedin-prospector | Scores and qualifies LinkedIn profile data against ideal customer profile criteria |
| cohort-analyzer | Groups users into cohorts and computes retention, engagement, and value metrics per cohort |
| revenue-attribution | Attributes revenue to touchpoints using first-touch, last-touch, and linear attribution models |
| persona-classifier | Classifies contacts into buyer persona segments based on title, company, and behavioral signals |
| email-health-scorer | Scores email list health using bounce rates, engagement decay, and list hygiene signals |
| content-performance-ranker | Ranks content pieces by predicted performance based on historical engagement patterns |
| sequence-optimizer | Recommends optimal step order and timing for outreach sequences based on conversion data |
| forecast-modeler | Generates simple forecast projections using historical data with confidence bands |

A Skill Script in Practice

Here is what a real skill script looks like — the send-timing-optimizer:

```python
#!/usr/bin/env python3
"""
Send Timing Optimizer
Input: historical engagement data for a contact or segment
Output: optimal send time with confidence score
"""

import json
import sys
from datetime import datetime, timedelta
from collections import defaultdict

def optimize_timing(data: dict) -> dict:
    engagement_history = data.get("engagement_history", [])
    timezone = data.get("timezone", "UTC")

    if not engagement_history:
        return {
            "optimal_hour": 10,
            "optimal_day": "Tuesday",
            "confidence": "low",
            "reasoning": "No engagement history available. Defaulting to industry baseline.",
            "alternatives": [{"hour": 14, "day": "Wednesday"}]
        }

    # Count engagements by hour and day
    hour_counts = defaultdict(int)
    day_counts = defaultdict(int)

    for event in engagement_history:
        ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
        hour_counts[ts.hour] += event.get("weight", 1)
        day_counts[ts.strftime("%A")] += event.get("weight", 1)

    best_hour = max(hour_counts, key=hour_counts.get)
    best_day = max(day_counts, key=day_counts.get)

    total = sum(hour_counts.values())
    confidence = "high" if total >= 20 else "medium" if total >= 5 else "low"

    # Find top 2 alternative hours
    sorted_hours = sorted(hour_counts.items(), key=lambda x: x[1], reverse=True)
    alternatives = [{"hour": h, "day": best_day} for h, _ in sorted_hours[1:3]]

    return {
        "optimal_hour": best_hour,
        "optimal_day": best_day,
        "confidence": confidence,
        "data_points": total,
        "reasoning": f"Based on {total} engagement events. Peak activity at hour {best_hour}.",
        "alternatives": alternatives
    }

if __name__ == "__main__":
    input_data = json.loads(sys.stdin.read())
    result = optimize_timing(input_data)
    print(json.dumps(result))
```

Notice: pure stdlib, JSON in via stdin, JSON out via stdout, deterministic output, no side effects. This is the pattern every skill script follows.

How to Create Your Own Skill

Creating a new skill takes four steps:

Step 1: Create the directory

```
.claude/skills/your-skill-name/
  SKILL.md
  scripts/
    your_script.py
  catalog.json
```

Step 2: Write the Python script. Accept JSON from stdin, process it using only standard library modules, and return JSON to stdout. Include error handling that returns `{"error": "...", "message": "..."}` on failure.
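That error envelope can be wrapped around any skill in a few lines. In this sketch, `compute` is a placeholder for your skill's real logic:

```python
import json
import sys

def compute(payload: dict) -> dict:
    # Placeholder for the skill's real computation; raises on malformed input.
    return {"fields_seen": sorted(payload)}

def run(raw: str) -> dict:
    """Turn any failure into the {"error", "message"} envelope."""
    try:
        return compute(json.loads(raw))
    except Exception as exc:
        return {"error": type(exc).__name__, "message": str(exc)}

if __name__ == "__main__":
    print(json.dumps(run(sys.stdin.read())))
```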

Step 3: Write the SKILL.md. Describe what the skill does, when to use it, the input schema, output schema, and an example. Be precise — the agent reads this to decide whether to invoke the skill.

Step 4: Register in catalog.json. Add the skill name, description, script paths, and input/output schemas.

The MCP server will pick up the new skill on next restart. Test it directly from the command line before relying on an agent to invoke it:

```bash
echo '{"contact": {"email": "test@example.com"}}' | python3 .claude/skills/lead-scoring-engine/scripts/score_lead.py
```

If the script returns clean JSON from the command line, it will work when the agent calls it through run_skill_script.

Next Steps

Skills give your agents computational capabilities. Tools give them access to external systems. The two work together — an agent might call a tool to fetch raw data, then invoke a skill to process it. See Tool Design for how to build the tool side of that pattern, and Architecture Patterns for how to wire agents, tools, and skills into a coherent system.