skilltest

Testing harness for AI agent skills and plugins

Define skill contracts in YAML. Test inputs, outputs, side effects, idempotency, and token overhead in isolation.

Why This Exists

AI agent skills and plugins are hard to test. They interact with tools, read/write files, call APIs, mutate memory, and produce non-deterministic outputs. Manual testing is slow, incomplete, and doesn't scale.

skilltest solves this with a contract-driven testing approach:

Define what your skill does (inputs, outputs, side effects) in a YAML contract
Mock the agent environment (tools, filesystem, memory, network) so tests run in isolation
Validate outputs against JSON Schema, track every side effect, enforce constraints
Repeat the same scenario multiple times to catch non-determinism
Compare multiple skill implementations against the same contract

This is not MCP protocol testing (use mcptest for that). It is business logic testing: does your skill produce the right outputs for given inputs?

Quick Start

Install

npm install skilltest

1. Create a skill contract

npx skilltest init my-skill

This creates my-skill.skill.yaml:

name: "my-skill"
version: "1.0.0"
description: "TODO: describe what my-skill does"

inputs:
  - name: query
    type: string
    description: "The input query to process"
    required: true

outputs:
  - name: result
    type: object
    description: "The processed result"

scenarios:
  - name: basic-query
    inputs:
      query: "hello world"
    expectedOutputs:
      result:
        processed: true
        data: "hello world"
    tags:
      - smoke

2. Write your skill executor

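The executor below is written in TypeScript, so run step 3 against its compiled JavaScript output (for example, tsc output named my-executor.js). Its context calls illustrate the API. They assume the mock context provides a search tool and a config.json file. Because undeclared side effects fail a test, the matching effects would also need to be declared in the contract's sideEffects.

import type { SkillExecutor } from 'skilltest';

export const execute: SkillExecutor = async (inputs, context) => {
  const query = inputs.query as string;

  // Call a tool from the agent context (assumes a 'search' tool is registered)
  const searchResult = await context.callTool('search', { query });

  // Read from the virtual filesystem (assumes config.json exists there)
  const config = await context.readFile('config.json');

  // Write to the memory store
  context.setMemory('lastQuery', query);

  // searchResult and config are unused here; a real skill would fold them into its result
  return {
    result: {
      processed: true,
      data: query,
    },
  };
};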

3. Run tests

npx skilltest run -c my-skill.skill.yaml -e ./my-executor.js

Output:

  skilltest results: 1 scenarios
  ========================================

  [PASS] my-skill / basic-query (12ms)

  ----------------------------------------
  Total: 1 | Passed: 1 | Failed: 0
  Status: PASSED

Contract Format

Skill contracts are YAML or JSON files that define the complete testing specification.

Full Contract Reference

# Required fields
name: "skill-name"
version: "1.0.0"
description: "What this skill does"

# Input definitions with optional JSON Schema
inputs:
  - name: query
    type: string           # string, number, integer, boolean, array, object, null, any
    description: "The search query"
    required: true          # default: true
    schema:                 # optional: full JSON Schema for deep validation
      type: string
      minLength: 1
      maxLength: 10000
  - name: maxResults
    type: integer
    required: false
    default: 10            # used when input not provided

# Output definitions with optional JSON Schema
outputs:
  - name: results
    type: array
    description: "Search results"
    schema:
      type: array
      items:
        type: object
        properties:
          title: { type: string }
          score: { type: number }

# Declared side effects (undeclared effects = test failure)
sideEffects:
  # type is one of: tool_call, file_write, file_read, file_delete,
  # network_request, env_mutation, process_spawn, memory_write
  - type: tool_call
    description: "Calls the search tool"
    idempotent: true
  - type: file_write
    description: "Caches results"
    target: "cache/*"       # glob pattern for allowed targets

# Constraints
constraints:
  - type: max_tokens        # max token budget
    value: 5000
  - type: max_duration_ms   # execution time limit
    value: 10000
  - type: max_file_writes   # file write count limit
    value: 3
  - type: max_network_calls # network call count limit
    value: 5
  - type: requires_tool     # must use this tool
    value: "search"
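  # no_side_effects (pure function, no writes/mutations) would conflict with the
  # file_write side effect and max_file_writes above, so it is left commented out:
  # - type: no_side_effects
  #   value: true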
  - type: deterministic     # same input = same output (checked via idempotency)
    value: true

# Test scenarios
scenarios:
  - name: basic-search
    description: "Simple search query"
    inputs:
      query: "hello world"
      maxResults: 5
    expectedOutputs:        # exact value matching
      results:
        - title: "Hello World Guide"
          score: 0.95
    expectedSideEffects:    # verify specific effects occurred
      - type: tool_call
        target: "search"
        count: 1
    tags:                   # for filtering with --tag
      - smoke
      - search

  - name: empty-results
    inputs:
      query: "nonexistent"
    expectedOutputs:
      results: []
    tags:
      - edge-case

  - name: invalid-input
    inputs:
      query: null
    shouldFail: true        # test passes if execution throws
    tags:
      - error

# Optional metadata
metadata:
  author: "your-name"
  repository: "your/repo"
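The scenario fields above can be read as a small evaluation procedure: fill in input defaults, run the executor, treat a thrown error as a pass only when `shouldFail` is set, and otherwise compare each key in `expectedOutputs` for exact equality. The sketch below is one plausible reading, not skilltest's runner. The `InputDef`, `Scenario`, `evaluateScenario`, and `deepEqual` names are hypothetical, and the executor omits the agent context for brevity.

```typescript
// Hypothetical, simplified shapes mirroring the YAML fields above (not skilltest's own types).
interface InputDef {
  name: string;
  required?: boolean; // contracts default this to true
  default?: unknown;
}

interface Scenario {
  name: string;
  inputs: Record<string, unknown>;
  expectedOutputs?: Record<string, unknown>;
  shouldFail?: boolean;
}

// The agent context parameter is omitted in this sketch.
type Executor = (inputs: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Structural equality for JSON-like values (primitives, arrays, plain objects).
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const ra = a as Record<string, unknown>;
  const rb = b as Record<string, unknown>;
  const keys = Object.keys(ra);
  if (keys.length !== Object.keys(rb).length) return false;
  return keys.every((k) => Object.prototype.hasOwnProperty.call(rb, k) && deepEqual(ra[k], rb[k]));
}

// Apply declared defaults for inputs the scenario leaves out.
function resolveInputs(defs: InputDef[], given: Record<string, unknown>): Record<string, unknown> {
  const resolved: Record<string, unknown> = { ...given };
  for (const def of defs) {
    if (!(def.name in resolved) && def.default !== undefined) resolved[def.name] = def.default;
  }
  return resolved;
}

async function evaluateScenario(
  defs: InputDef[],
  scenario: Scenario,
  execute: Executor,
): Promise<{ passed: boolean; reason?: string }> {
  let outputs: Record<string, unknown>;
  try {
    outputs = await execute(resolveInputs(defs, scenario.inputs));
  } catch (err) {
    // With shouldFail: true, a thrown error is the expected outcome.
    return scenario.shouldFail ? { passed: true } : { passed: false, reason: String(err) };
  }
  if (scenario.shouldFail) return { passed: false, reason: 'expected execution to throw' };
  for (const [key, expected] of Object.entries(scenario.expectedOutputs ?? {})) {
    if (!deepEqual(outputs[key], expected)) return { passed: false, reason: `output "${key}" does not match` };
  }
  return { passed: true };
}
```

This sketch checks only the keys listed in `expectedOutputs`, so extra output keys do not fail a scenario; whether skilltest behaves the same way is not stated above.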

CLI Reference

`skilltest run`

Run skill tests against a contract.

skilltest run -c contract.yaml [options]

Options:
  -c, --contract <path>    Path to skill contract (required)
  -e, --executor <path>    Path to executor module (default: pass-through)
  -s, --scenario <name>    Run only a specific scenario
  -t, --tag <tags...>      Filter scenarios by tags
  --idempotency <runs>     Run idempotency check with N runs
  --tokens                 Measure token usage
  -o, --output <path>      Write report to file
  -f, --format <format>    Output format: text, json, markdown (default: text)
  --timeout <ms>           Execution timeout in milliseconds
  -v, --verbose            Verbose output with token details

Examples:

# Run all scenarios
skilltest run -c my-skill.yaml -e ./executor.js

# Run only smoke tests
skilltest run -c my-skill.yaml -e ./executor.js -t smoke

# Check idempotency with 5 runs
skilltest run -c my-skill.yaml -e ./executor.js --idempotency 5

# JSON output with token measurement
skilltest run -c my-skill.yaml -e ./executor.js --tokens -f json -o results.json

# Markdown report
skilltest run -c my-skill.yaml -e ./executor.js -f markdown -o report.md
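`--idempotency <runs>` repeats a scenario to catch non-determinism. Here is a minimal sketch of such an N-run comparison, assuming outputs are JSON-serializable and compared after sorting object keys. `checkRuns` and `canonicalize` are hypothetical helpers, not the library's `checkIdempotency`, though the `isDeterministic` field mirrors the one used in the Programmatic API example.

```typescript
// Serialize JSON-like values with object keys sorted, so key order does not count as a difference.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(record[k])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

interface IdempotencyReport {
  runs: number;
  distinctOutputs: number;
  isDeterministic: boolean;
}

// Hypothetical N-run check: execute the same inputs repeatedly and count distinct outputs.
async function checkRuns(
  execute: (inputs: Record<string, unknown>) => Promise<unknown>,
  inputs: Record<string, unknown>,
  runs: number,
): Promise<IdempotencyReport> {
  const seen = new Set<string>();
  for (let i = 0; i < runs; i++) {
    // Sequential on purpose: runs do not overlap.
    seen.add(canonicalize(await execute(inputs)));
  }
  return { runs, distinctOutputs: seen.size, isDeterministic: seen.size <= 1 };
}
```

Running sequentially rather than with `Promise.all` keeps one run's side effects from interleaving with another's; a real harness would likely also want a fresh mock context per run.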

`skilltest validate`

Validate a contract file for correctness without running tests.

skilltest validate -c contract.yaml [options]

Options:
  -c, --contract <path>    Path to skill contract (required)
  --strict                 Enable strict validation (warnings become errors)

Checks performed:

YAML/JSON syntax
Required fields present
Input/output name uniqueness
Scenario coverage of required inputs
expectedOutputs keys match declared outputs
Constraint compatibility (e.g., no_side_effects vs max_file_writes)
Side effect declaration consistency
Strict mode: descriptions, version format, expected outputs
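Several of these checks are simple structural rules. The sketch below implements four of them (input/output name uniqueness, required-input coverage, expectedOutputs keys, and the no_side_effects vs max_file_writes conflict) over a hypothetical, simplified `ContractLite` shape; skilltest's actual rules and messages may differ.

```typescript
// Hypothetical, simplified contract shape covering only the fields these checks read.
interface ContractLite {
  inputs: { name: string; required?: boolean; default?: unknown }[];
  outputs: { name: string }[];
  constraints?: { type: string; value: unknown }[];
  scenarios: { name: string; inputs: Record<string, unknown>; expectedOutputs?: Record<string, unknown> }[];
}

function findDuplicates(names: string[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const n of names) {
    if (seen.has(n)) dupes.add(n);
    else seen.add(n);
  }
  return Array.from(dupes);
}

function validateLite(c: ContractLite): string[] {
  const errors: string[] = [];

  // Input/output name uniqueness
  for (const d of findDuplicates(c.inputs.map((i) => i.name))) errors.push(`duplicate input "${d}"`);
  for (const d of findDuplicates(c.outputs.map((o) => o.name))) errors.push(`duplicate output "${d}"`);

  // Required inputs (required defaults to true) without a default must be supplied by every scenario,
  // and expectedOutputs may only name declared outputs.
  const required = c.inputs.filter((i) => i.required !== false && i.default === undefined);
  const outputNames = new Set(c.outputs.map((o) => o.name));
  for (const s of c.scenarios) {
    for (const i of required) {
      if (!(i.name in s.inputs)) errors.push(`scenario "${s.name}" is missing required input "${i.name}"`);
    }
    for (const key of Object.keys(s.expectedOutputs ?? {})) {
      if (!outputNames.has(key)) errors.push(`scenario "${s.name}" expects undeclared output "${key}"`);
    }
  }

  // Constraint compatibility: a pure skill should not also carry a file-write budget.
  const constraints = c.constraints ?? [];
  const pure = constraints.some((k) => k.type === 'no_side_effects' && k.value === true);
  if (pure && constraints.some((k) => k.type === 'max_file_writes')) {
    errors.push('no_side_effects conflicts with max_file_writes');
  }
  return errors;
}
```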

`skilltest report`

Aggregate and format results from JSON output files.

skilltest report <files...> [options]

Options:
  -o, --output <path>      Write report to file
  -f, --format <format>    Output format: text, json, markdown
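Aggregating several JSON result files amounts to summing per-file totals and deriving an overall status. The sketch below assumes each file has already been parsed into an object with `total`, `passed`, and `failed` counts. That `RunSummary` shape, the `aggregate` and `formatSummary` names, and the exit-code convention (0 when everything passed, 1 otherwise) are assumptions, so check the actual `-f json` output before relying on them.

```typescript
// Hypothetical per-file summary; the real JSON report may nest these counts differently.
interface RunSummary {
  total: number;
  passed: number;
  failed: number;
}

interface Aggregate extends RunSummary {
  files: number;
  status: 'PASSED' | 'FAILED';
  exitCode: 0 | 1;
}

function aggregate(summaries: RunSummary[]): Aggregate {
  const sum = summaries.reduce(
    (acc, s) => ({ total: acc.total + s.total, passed: acc.passed + s.passed, failed: acc.failed + s.failed }),
    { total: 0, passed: 0, failed: 0 },
  );
  const ok = sum.failed === 0;
  return { ...sum, files: summaries.length, status: ok ? 'PASSED' : 'FAILED', exitCode: ok ? 0 : 1 };
}

// Text rendering modeled on the run output shown in Quick Start.
function formatSummary(a: Aggregate): string {
  return `Total: ${a.total} | Passed: ${a.passed} | Failed: ${a.failed}\nStatus: ${a.status}`;
}
```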

`skilltest init`

Create a new contract from a template.

skilltest init <name> [options]

Options:
  -o, --output <path>      Output file path
  -t, --template <type>    Template: basic, advanced, minimal (default: basic)

Templates:

minimal -- Bare minimum contract with one scenario
basic -- Standard contract with 3 scenarios, side effects, constraints
advanced -- Full-featured contract with JSON Schema, tags, metadata, Unicode test

Programmatic API

Use skilltest as a library for custom test runners.

import { readFileSync } from 'node:fs';
import {
  parseContractString,
  validateContract,
  MockAgentContext,
  runContract,
  runScenario,
  checkIdempotency,
  measureTokens,
  formatResultsText,
  formatResultsJson,
  formatResultsMarkdown,
} from 'skilltest';
import type { SkillExecutor, SkillContract } from 'skilltest';

// Parse and validate a contract (this example uses top-level await, so run it as an ES module)
const yamlContent = readFileSync('my-skill.skill.yaml', 'utf8');
const contract = parseContractString(yamlContent);
const validation = validateContract(contract, true); // strict mode

// Create a mock context with tools and files
const context = new MockAgentContext({
  tools: [{
    name: 'search',
    handler: async (input) => ({
      success: true,
      output: { results: ['item1', 'item2'] },
    }),
  }],
  files: {
    'config.json': '{"api_key": "test"}',
    'data/input.txt': 'line1\nline2\nline3',
  },
  memory: { session: 'abc123' },
  env: { NODE_ENV: 'test' },
});

// Run tests
const executor: SkillExecutor = async (inputs, ctx) => {
  // Your skill logic here
  return { result: 'processed' };
};

const results = await runContract(executor, contract, {
  contextConfig: { /* same as MockAgentContext config */ },
  measureTokens: true,
  tagFilter: ['smoke'],
});

// Format output
console.log(formatResultsText(results));
console.log(formatResultsJson(results));
console.log(formatResultsMarkdown(results));

// Idempotency check
const idempotency = await checkIdempotency(executor, contract.scenarios[0], {
  runs: 5,
});
console.log(`Deterministic: ${idempotency.isDeterministic}`);

// Token measurement
const measurement = await measureTokens(executor, contract.scenarios[0]);
console.log(`Total tokens: ${measurement.usage.totalTokens}`);
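The comparison below describes skilltest's token measurement as built-in estimation. A common rough heuristic for English text is about four characters per token. The sketch applies that ratio to a skill's serialized inputs and outputs and checks a `max_tokens` budget. The ratio, the `estimateTokens`, `estimateUsage`, and `withinBudget` names, and the input/output split are assumptions for illustration, not necessarily how `measureTokens` works.

```typescript
// Rough heuristic for English text; not exact for any particular tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(value: unknown): number {
  const text = typeof value === 'string' ? value : JSON.stringify(value ?? null);
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

function estimateUsage(inputs: Record<string, unknown>, outputs: Record<string, unknown>): TokenUsage {
  const inputTokens = estimateTokens(inputs);
  const outputTokens = estimateTokens(outputs);
  return { inputTokens, outputTokens, totalTokens: inputTokens + outputTokens };
}

// Mirrors the max_tokens constraint: total usage must not exceed the budget.
function withinBudget(usage: TokenUsage, maxTokens: number): boolean {
  return usage.totalTokens <= maxTokens;
}
```

When precision matters, a tokenizer for the target model will give very different counts than a character ratio, especially for code or non-English text.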

Mock Agent Context

The MockAgentContext provides a complete simulated agent environment:

Method	Description
`callTool(name, input)`	Call a registered tool
`readFile(path)`	Read from virtual filesystem
`writeFile(path, content)`	Write to virtual filesystem
`deleteFile(path)`	Delete from virtual filesystem
`fileExists(path)`	Check file existence
`listFiles(dir)`	List directory contents
`getMemory(key)`	Read from memory store
`setMemory(key, value)`	Write to memory store
`getEnv(key)`	Read environment variable
`setEnv(key, value)`	Set environment variable
`deleteEnv(key)`	Delete environment variable
`fetch(url, options)`	Make HTTP request (mocked)
`spawn(cmd, args)`	Spawn process (mocked)

Every operation is tracked and available via getTrackedSideEffects().
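The pattern behind a tracked mock context is that each call hits an in-memory stand-in and also appends a record to a log. The sketch below shows that pattern for a few methods (virtual files, memory, environment, tools). `TinyContext` and `TrackedEffect` are hypothetical and much smaller than skilltest's `MockAgentContext`; only `getTrackedSideEffects` borrows its name from the text above.

```typescript
// Hypothetical, reduced stand-in for a tracked mock context (not skilltest's MockAgentContext).
type EffectType = 'file_read' | 'file_write' | 'file_delete' | 'memory_write' | 'env_mutation' | 'tool_call';

interface TrackedEffect {
  type: EffectType;
  target: string; // file path, memory key, env var name, or tool name
}

type ToolHandler = (input: unknown) => Promise<unknown>;

class TinyContext {
  private files = new Map<string, string>();
  private memory = new Map<string, unknown>();
  private env = new Map<string, string>();
  private effects: TrackedEffect[] = [];
  private tools: Record<string, ToolHandler>;

  constructor(files: Record<string, string> = {}, tools: Record<string, ToolHandler> = {}) {
    for (const [path, content] of Object.entries(files)) this.files.set(path, content);
    this.tools = tools;
  }

  // Each tracked method records its effect here before acting.
  private track(type: EffectType, target: string): void {
    this.effects.push({ type, target });
  }

  async readFile(path: string): Promise<string> {
    this.track('file_read', path);
    const content = this.files.get(path);
    if (content === undefined) throw new Error(`file not found: ${path}`);
    return content;
  }

  async writeFile(path: string, content: string): Promise<void> {
    this.track('file_write', path);
    this.files.set(path, content);
  }

  async deleteFile(path: string): Promise<void> {
    this.track('file_delete', path);
    this.files.delete(path);
  }

  setMemory(key: string, value: unknown): void {
    this.track('memory_write', key);
    this.memory.set(key, value);
  }

  setEnv(key: string, value: string): void {
    this.track('env_mutation', key);
    this.env.set(key, value);
  }

  // Plain reads of memory and env are not logged in this sketch.
  getMemory(key: string): unknown {
    return this.memory.get(key);
  }

  getEnv(key: string): string | undefined {
    return this.env.get(key);
  }

  async callTool(name: string, input: unknown): Promise<unknown> {
    this.track('tool_call', name);
    const tool = this.tools[name];
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool(input);
  }

  getTrackedSideEffects(): TrackedEffect[] {
    return [...this.effects]; // return a copy so callers cannot alter the log
  }
}
```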

Comparison

Feature	skilltest	mcptest	Manual Testing
Focus	Skill business logic	MCP protocol compliance	Ad-hoc
Contract-driven	YAML contracts	Protocol spec	None
Mock environment	Full agent context	Transport layer	Real environment
Side effect tracking	Automatic	N/A	Manual inspection
Idempotency	Built-in N-run check	N/A	Manual
Token measurement	Built-in estimation	N/A	External tools
Schema validation	JSON Schema (Ajv)	MCP schema	Manual
Comparative testing	Multiple skills vs same contract	N/A	Spreadsheets
Output formats	Text, JSON, Markdown	JSON	None
CI-friendly	Exit codes, JSON output	Exit codes	Scripts

Side Effect Types

Type	Description	Tracked From
`file_write`	File creation/modification	`writeFile()`
`file_read`	File reading	`readFile()`
`file_delete`	File deletion	`deleteFile()`
`network_request`	HTTP/API calls	`fetch()`
`env_mutation`	Environment variable changes	`setEnv()`, `deleteEnv()`
`process_spawn`	External process execution	`spawn()`
`memory_write`	Memory/state mutations	`setMemory()`
`tool_call`	Tool invocations	`callTool()`
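The contract reference says undeclared side effects fail a test, and a declaration may carry a glob `target` such as `cache/*`. One way to implement that check is sketched below: each tracked effect must match a declaration of the same type whose target, if present, matches the effect's target. The glob support is deliberately tiny (`*` matches within one path segment, `**` across segments) and is an assumption; skilltest may use a different matcher. The `Declared`, `Tracked`, `globToRegExp`, and `undeclaredEffects` names are hypothetical.

```typescript
interface Declared {
  type: string;
  target?: string; // optional glob; absent means any target of this type is allowed
}

interface Tracked {
  type: string;
  target: string;
}

// Convert a simple glob to a RegExp: '**' may span '/', '*' may not; everything else is literal.
function globToRegExp(glob: string): RegExp {
  let src = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*' && glob[i + 1] === '*') {
      src += '.*';
      i++; // consume the second '*'
    } else if (ch === '*') {
      src += '[^/]*';
    } else {
      src += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  return new RegExp(`^${src}$`);
}

// Return every tracked effect that no declaration covers.
function undeclaredEffects(tracked: Tracked[], declared: Declared[]): Tracked[] {
  return tracked.filter(
    (e) => !declared.some((d) => d.type === e.type && (d.target === undefined || globToRegExp(d.target).test(e.target))),
  );
}
```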

Constraint Types

Type	Value	Description
`max_tokens`	number	Maximum total token usage
`max_duration_ms`	number	Maximum execution time
`max_file_writes`	number	Maximum file write operations
`max_network_calls`	number	Maximum network requests
`requires_tool`	string	Must use this specific tool
`no_side_effects`	boolean	Disallow any mutations
`deterministic`	boolean	Same inputs must produce same outputs
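These constraints can be evaluated against a summary of one run. The sketch below assumes a hypothetical `RunStats` record (token total, duration, tracked effects, and the canonicalized outputs of repeated runs for `deterministic`) and returns one message per violated constraint. Which effect types count as mutations for `no_side_effects` is also an assumption here (reads, tool calls, and network requests are not counted); skilltest's real checks and messages may differ.

```typescript
type Constraint =
  | { type: 'max_tokens' | 'max_duration_ms' | 'max_file_writes' | 'max_network_calls'; value: number }
  | { type: 'requires_tool'; value: string }
  | { type: 'no_side_effects' | 'deterministic'; value: boolean };

interface RunStats {
  totalTokens: number;
  durationMs: number;
  effects: { type: string; target: string }[];
  repeatedOutputs: string[]; // canonicalized outputs of repeated runs, used by `deterministic`
}

// Assumption: the effect types treated as mutations by no_side_effects.
const MUTATING = new Set(['file_write', 'file_delete', 'env_mutation', 'memory_write', 'process_spawn']);

function checkConstraints(constraints: Constraint[], run: RunStats): string[] {
  const count = (t: string) => run.effects.filter((e) => e.type === t).length;
  const violations: string[] = [];
  for (const c of constraints) {
    switch (c.type) {
      case 'max_tokens':
        if (run.totalTokens > c.value) violations.push(`tokens ${run.totalTokens} > ${c.value}`);
        break;
      case 'max_duration_ms':
        if (run.durationMs > c.value) violations.push(`duration ${run.durationMs}ms > ${c.value}ms`);
        break;
      case 'max_file_writes':
        if (count('file_write') > c.value) violations.push(`file writes ${count('file_write')} > ${c.value}`);
        break;
      case 'max_network_calls':
        if (count('network_request') > c.value) violations.push(`network calls ${count('network_request')} > ${c.value}`);
        break;
      case 'requires_tool':
        if (!run.effects.some((e) => e.type === 'tool_call' && e.target === c.value)) {
          violations.push(`tool "${c.value}" was never called`);
        }
        break;
      case 'no_side_effects':
        if (c.value && run.effects.some((e) => MUTATING.has(e.type))) {
          violations.push('mutating side effect in a no_side_effects skill');
        }
        break;
      case 'deterministic':
        if (c.value && new Set(run.repeatedOutputs).size > 1) violations.push('outputs differ across repeated runs');
        break;
    }
  }
  return violations;
}
```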

Architecture

src/
  contract/       Contract parsing (YAML/JSON) and validation
  context/        Mock agent context with tracked side effects
  schema/         JSON Schema validation via Ajv
  effects/        Side effect tracking and constraint validation
  idempotency/    Determinism checking (N-run comparison)
  token/          Token usage estimation and measurement
  runner/         Core execution engine
  report/         Output formatting (text, JSON, markdown)
  cli/            CLI command implementations
  utils/          Path handling, deep equality, error types

Requirements

Node.js 18+
Minimal runtime dependencies: Ajv, Commander, and js-yaml only

License

MIT
