JSLEEKR/skilltest

skilltest

Testing harness for AI agent skills and plugins

npm version License: MIT Tests: 332 TypeScript Node.js

Define skill contracts in YAML. Test inputs, outputs, side effects, idempotency, and token overhead in isolation.


Why This Exists

AI agent skills and plugins are hard to test. They interact with tools, read/write files, call APIs, mutate memory, and produce non-deterministic outputs. Manual testing is slow, incomplete, and doesn't scale.

skilltest solves this with a contract-driven testing approach:

  1. Define what your skill does (inputs, outputs, side effects) in a YAML contract
  2. Mock the agent environment (tools, filesystem, memory, network) so tests run in isolation
  3. Validate outputs against JSON Schema, track every side effect, enforce constraints
  4. Repeat the same scenario multiple times to catch non-determinism
  5. Compare multiple skill implementations against the same contract

This is not MCP protocol testing (use mcptest for that). This is business logic testing: does your skill produce the right outputs for the given inputs?
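
The contract-driven loop described above can be illustrated in a few lines. This is a minimal sketch of the idea, not skilltest's implementation: `Scenario`, `SyncExecutor`, `evaluate`, and `compare` are hypothetical names, the executors are synchronous for brevity (real executors are async), and outputs are compared by JSON serialization, which is sensitive to key order.

```typescript
// Hypothetical, simplified types; not skilltest's actual API.
type Values = Record<string, unknown>;

interface Scenario {
  name: string;
  inputs: Values;
  expectedOutputs?: Values;
  shouldFail?: boolean;
}

type SyncExecutor = (inputs: Values) => Values;

// A scenario passes when every expected output matches, or, for
// shouldFail scenarios, when the executor throws.
function evaluate(executor: SyncExecutor, scenario: Scenario): boolean {
  try {
    const outputs = executor(scenario.inputs);
    if (scenario.shouldFail) return false; // expected a throw, got a result
    const expected = scenario.expectedOutputs ?? {};
    return Object.keys(expected).every(
      (key) => JSON.stringify(outputs[key]) === JSON.stringify(expected[key]),
    );
  } catch {
    return scenario.shouldFail === true;
  }
}

// Run several implementations against the same scenarios and count passes.
function compare(
  executors: Record<string, SyncExecutor>,
  scenarios: Scenario[],
): Record<string, number> {
  const passed: Record<string, number> = {};
  for (const [name, executor] of Object.entries(executors)) {
    passed[name] = scenarios.filter((s) => evaluate(executor, s)).length;
  }
  return passed;
}

const scenarios: Scenario[] = [
  {
    name: "basic-query",
    inputs: { query: "hello world" },
    expectedOutputs: { result: { processed: true, data: "hello world" } },
  },
  { name: "invalid-input", inputs: { query: null }, shouldFail: true },
];

// Two illustrative implementations of the same skill.
const echo: SyncExecutor = (inputs) => {
  if (typeof inputs.query !== "string") throw new Error("query must be a string");
  return { result: { processed: true, data: inputs.query } };
};
const upper: SyncExecutor = (inputs) => ({
  result: { processed: true, data: String(inputs.query).toUpperCase() },
});

const summary = compare({ echo, upper }, scenarios);
console.log(summary);
```

Here `echo` satisfies both scenarios, while `upper` fails both: it changes the output and never rejects the invalid input.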

Quick Start

Install

npm install skilltest

1. Create a skill contract

npx skilltest init my-skill

This creates my-skill.skill.yaml:

name: "my-skill"
version: "1.0.0"
description: "TODO: describe what my-skill does"

inputs:
  - name: query
    type: string
    description: "The input query to process"
    required: true

outputs:
  - name: result
    type: object
    description: "The processed result"

scenarios:
  - name: basic-query
    inputs:
      query: "hello world"
    expectedOutputs:
      result:
        processed: true
        data: "hello world"
    tags:
      - smoke

2. Write your skill executor

import type { SkillExecutor } from 'skilltest';

export const execute: SkillExecutor = async (inputs, context) => {
  const query = inputs.query as string;

  // Use tools from the agent context
  const searchResult = await context.callTool('search', { query });

  // Read/write files
  const config = await context.readFile('config.json');

  // Access memory
  context.setMemory('lastQuery', query);

  return {
    result: {
      processed: true,
      data: query,
    },
  };
};

3. Run tests

npx skilltest run -c my-skill.skill.yaml -e ./my-executor.js

Output:

  skilltest results: 1 scenarios
  ========================================

  [PASS] my-skill / basic-query (12ms)

  ----------------------------------------
  Total: 1 | Passed: 1 | Failed: 0
  Status: PASSED

Contract Format

Skill contracts are YAML or JSON files that define the complete testing specification.

Full Contract Reference

# Required fields
name: "skill-name"
version: "1.0.0"
description: "What this skill does"

# Input definitions with optional JSON Schema
inputs:
  - name: query
    type: string           # string, number, integer, boolean, array, object, null, any
    description: "The search query"
    required: true          # default: true
    schema:                 # optional: full JSON Schema for deep validation
      type: string
      minLength: 1
      maxLength: 10000
  - name: maxResults
    type: integer
    required: false
    default: 10            # used when input not provided

# Output definitions with optional JSON Schema
outputs:
  - name: results
    type: array
    description: "Search results"
    schema:
      type: array
      items:
        type: object
        properties:
          title: { type: string }
          score: { type: number }

# Declared side effects (undeclared effects = test failure)
# Allowed types: tool_call, file_write, file_read, file_delete,
#                network_request, env_mutation, process_spawn, memory_write
sideEffects:
  - type: tool_call
    description: "Calls the search tool"
    idempotent: true
  - type: file_write
    description: "Caches results"
    target: "cache/*"       # glob pattern for allowed targets

# Constraints
constraints:
  - type: max_tokens        # max token budget
    value: 5000
  - type: max_duration_ms   # execution time limit
    value: 10000
  - type: max_file_writes   # file write count limit
    value: 3
  - type: max_network_calls # network call count limit
    value: 5
  - type: requires_tool     # must use this tool
    value: "search"
  - type: no_side_effects   # pure function, no writes/mutations; listed for reference
    value: true             # only, since it conflicts with max_file_writes above
  - type: deterministic     # same input = same output (checked via idempotency)
    value: true

# Test scenarios
scenarios:
  - name: basic-search
    description: "Simple search query"
    inputs:
      query: "hello world"
      maxResults: 5
    expectedOutputs:        # exact value matching
      results:
        - title: "Hello World Guide"
          score: 0.95
    expectedSideEffects:    # verify specific effects occurred
      - type: tool_call
        target: "search"
        count: 1
    tags:                   # for filtering with --tag
      - smoke
      - search

  - name: empty-results
    inputs:
      query: "nonexistent"
    expectedOutputs:
      results: []
    tags:
      - edge-case

  - name: invalid-input
    inputs:
      query: null
    shouldFail: true        # test passes if execution throws
    tags:
      - error

# Optional metadata
metadata:
  author: "your-name"
  repository: "your/repo"
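
The rule that undeclared effects fail the test, together with the `target` glob, can be sketched as follows. This is an illustration only: `DeclaredEffect`, `TrackedEffect`, `globToRegExp`, and `findUndeclared` are hypothetical names, and the glob handling assumes `*` matches any run of characters except `/`, which may differ from the matcher skilltest actually uses.

```typescript
// Hypothetical shapes; the real tracked-effect records may differ.
interface DeclaredEffect {
  type: string;
  target?: string; // optional glob pattern
}
interface TrackedEffect {
  type: string;
  target: string;
}

// Convert a glob with `*` wildcards into an anchored RegExp.
// Assumption: `*` matches any run of characters except "/".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, "[^/]*") + "$");
}

// An effect is declared if some declaration has the same type and either
// no target restriction or a glob that matches the effect's target.
function isDeclared(effect: TrackedEffect, declared: DeclaredEffect[]): boolean {
  return declared.some(
    (d) =>
      d.type === effect.type &&
      (d.target === undefined || globToRegExp(d.target).test(effect.target)),
  );
}

function findUndeclared(
  tracked: TrackedEffect[],
  declared: DeclaredEffect[],
): TrackedEffect[] {
  return tracked.filter((effect) => !isDeclared(effect, declared));
}

const declared: DeclaredEffect[] = [
  { type: "tool_call" },
  { type: "file_write", target: "cache/*" },
];
const tracked: TrackedEffect[] = [
  { type: "tool_call", target: "search" },
  { type: "file_write", target: "cache/results.json" },
  { type: "file_write", target: "secrets/token.txt" },
];

const undeclared = findUndeclared(tracked, declared);
console.log(undeclared);
```

Only the write to `secrets/token.txt` is reported, because it falls outside the declared `cache/*` pattern.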

CLI Reference

skilltest run

Run skill tests against a contract.

skilltest run -c contract.yaml [options]

Options:
  -c, --contract <path>    Path to skill contract (required)
  -e, --executor <path>    Path to executor module (default: pass-through)
  -s, --scenario <name>    Run only a specific scenario
  -t, --tag <tags...>      Filter scenarios by tags
  --idempotency <runs>     Run idempotency check with N runs
  --tokens                 Measure token usage
  -o, --output <path>      Write report to file
  -f, --format <format>    Output format: text, json, markdown (default: text)
  --timeout <ms>           Execution timeout in milliseconds
  -v, --verbose            Verbose output with token details

Examples:

# Run all scenarios
skilltest run -c my-skill.yaml -e ./executor.js

# Run only smoke tests
skilltest run -c my-skill.yaml -e ./executor.js -t smoke

# Check idempotency with 5 runs
skilltest run -c my-skill.yaml -e ./executor.js --idempotency 5

# JSON output with token measurement
skilltest run -c my-skill.yaml -e ./executor.js --tokens -f json -o results.json

# Markdown report
skilltest run -c my-skill.yaml -e ./executor.js -f markdown -o report.md

skilltest validate

Validate a contract file for correctness without running tests.

skilltest validate -c contract.yaml [options]

Options:
  -c, --contract <path>    Path to skill contract (required)
  --strict                 Enable strict validation (warnings become errors)

Checks performed:

  • YAML/JSON syntax
  • Required fields present
  • Input/output name uniqueness
  • Scenario coverage of required inputs
  • expectedOutputs keys match declared outputs
  • Constraint compatibility (e.g., no_side_effects vs max_file_writes)
  • Side effect declaration consistency
  • Strict mode: descriptions, version format, expected outputs
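
A few of these checks can be sketched to show what static contract validation looks like. The code below is a hypothetical illustration, not the `validate` command's implementation: `MiniContract` is a trimmed-down contract shape, the error messages are invented, and it assumes that `required` defaults to true and that an input with a `default` does not need to appear in every scenario.

```typescript
// Hypothetical, trimmed-down contract shape for illustration only.
interface Field {
  name: string;
  required?: boolean;
  default?: unknown;
}
interface MiniContract {
  inputs: Field[];
  outputs: Field[];
  constraints?: { type: string; value: unknown }[];
  scenarios: {
    name: string;
    inputs: Record<string, unknown>;
    expectedOutputs?: Record<string, unknown>;
  }[];
}

function validate(contract: MiniContract): string[] {
  const errors: string[] = [];

  // Input name uniqueness
  const seen = new Set<string>();
  for (const input of contract.inputs) {
    if (seen.has(input.name)) errors.push(`duplicate input "${input.name}"`);
    seen.add(input.name);
  }

  const outputNames = new Set(contract.outputs.map((o) => o.name));
  // Assumption: `required` defaults to true, and a default value satisfies it.
  const required = contract.inputs.filter(
    (i) => i.required !== false && i.default === undefined,
  );

  for (const scenario of contract.scenarios) {
    // Scenario coverage of required inputs
    for (const input of required) {
      if (!(input.name in scenario.inputs)) {
        errors.push(`${scenario.name}: missing required input "${input.name}"`);
      }
    }
    // expectedOutputs keys must match declared outputs
    for (const key of Object.keys(scenario.expectedOutputs ?? {})) {
      if (!outputNames.has(key)) {
        errors.push(`${scenario.name}: undeclared output "${key}"`);
      }
    }
  }

  // Constraint compatibility
  const types = new Set((contract.constraints ?? []).map((c) => c.type));
  if (types.has("no_side_effects") && types.has("max_file_writes")) {
    errors.push("no_side_effects conflicts with max_file_writes");
  }
  return errors;
}

const errors = validate({
  inputs: [{ name: "query" }, { name: "maxResults", required: false, default: 10 }],
  outputs: [{ name: "results" }],
  constraints: [
    { type: "no_side_effects", value: true },
    { type: "max_file_writes", value: 3 },
  ],
  scenarios: [{ name: "broken", inputs: {}, expectedOutputs: { answer: 1 } }],
});
console.log(errors);
```

The sample contract triggers three findings: a missing required input, an undeclared output key, and an incompatible constraint pair.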

skilltest report

Aggregate and format results from JSON output files.

skilltest report <files...> [options]

Options:
  -o, --output <path>      Write report to file
  -f, --format <format>    Output format: text, json, markdown

skilltest init

Create a new contract from a template.

skilltest init <name> [options]

Options:
  -o, --output <path>      Output file path
  -t, --template <type>    Template: basic, advanced, minimal (default: basic)

Templates:

  • minimal: Bare minimum contract with one scenario
  • basic: Standard contract with 3 scenarios, side effects, constraints
  • advanced: Full-featured contract with JSON Schema, tags, metadata, Unicode test

Programmatic API

Use skilltest as a library for custom test runners.

import {
  parseContractString,
  validateContract,
  MockAgentContext,
  runContract,
  runScenario,
  checkIdempotency,
  measureTokens,
  formatResultsText,
  formatResultsJson,
  formatResultsMarkdown,
} from 'skilltest';
import type { SkillExecutor, SkillContract } from 'skilltest';

// Parse and validate a contract
const contract = parseContractString(yamlContent);
const validation = validateContract(contract, true); // strict mode

// Create a mock context with tools and files
const context = new MockAgentContext({
  tools: [{
    name: 'search',
    handler: async (input) => ({
      success: true,
      output: { results: ['item1', 'item2'] },
    }),
  }],
  files: {
    'config.json': '{"api_key": "test"}',
    'data/input.txt': 'line1\nline2\nline3',
  },
  memory: { session: 'abc123' },
  env: { NODE_ENV: 'test' },
});

// Run tests
const executor: SkillExecutor = async (inputs, ctx) => {
  // Your skill logic here
  return { result: 'processed' };
};

const results = await runContract(executor, contract, {
  contextConfig: { /* same as MockAgentContext config */ },
  measureTokens: true,
  tagFilter: ['smoke'],
});

// Format output
console.log(formatResultsText(results));
console.log(formatResultsJson(results));
console.log(formatResultsMarkdown(results));

// Idempotency check
const idempotency = await checkIdempotency(executor, contract.scenarios[0], {
  runs: 5,
});
console.log(`Deterministic: ${idempotency.isDeterministic}`);

// Token measurement
const measurement = await measureTokens(executor, contract.scenarios[0]);
console.log(`Total tokens: ${measurement.usage.totalTokens}`);
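
The N-run idempotency check can be understood as collecting the outputs of repeated runs and comparing them. The sketch below shows one way to do that comparison; it is an assumption about the approach, not the library's code. `canonical`, `compareRuns`, and `distinctOutputs` are hypothetical names (only `isDeterministic` appears in the API above), and the function works on outputs that have already been collected, so it stays synchronous.

```typescript
// Serialize with sorted object keys so key order does not cause false diffs.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonical).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonical(obj[key]));
    return "{" + body.join(",") + "}";
  }
  // String() covers undefined, for which JSON.stringify returns undefined.
  return String(JSON.stringify(value));
}

interface DeterminismReport {
  isDeterministic: boolean;
  distinctOutputs: number; // hypothetical field, for illustration
}

// Outputs are deterministic when all runs serialize to the same string.
function compareRuns(outputs: unknown[]): DeterminismReport {
  const distinct = new Set(outputs.map(canonical));
  return { isDeterministic: distinct.size <= 1, distinctOutputs: distinct.size };
}

// Same content with different key order: still deterministic.
const stable = compareRuns([
  { a: 1, b: 2 },
  { b: 2, a: 1 },
  { a: 1, b: 2 },
]);

// A counter that changes on every run: not deterministic.
let counter = 0;
const drifting = compareRuns([1, 2, 3].map(() => ({ id: ++counter })));

console.log(stable, drifting);
```

Sorting keys before comparison avoids flagging a skill as non-deterministic merely because it builds the same object in a different order.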

Mock Agent Context

The MockAgentContext provides a complete simulated agent environment:

| Method | Description |
| --- | --- |
| callTool(name, input) | Call a registered tool |
| readFile(path) | Read from virtual filesystem |
| writeFile(path, content) | Write to virtual filesystem |
| deleteFile(path) | Delete from virtual filesystem |
| fileExists(path) | Check file existence |
| listFiles(dir) | List directory contents |
| getMemory(key) | Read from memory store |
| setMemory(key, value) | Write to memory store |
| getEnv(key) | Read environment variable |
| setEnv(key, value) | Set environment variable |
| deleteEnv(key) | Delete environment variable |
| fetch(url, options) | Make HTTP request (mocked) |
| spawn(cmd, args) | Spawn process (mocked) |

Every operation is tracked and available via getTrackedSideEffects().
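
To make the tracking idea concrete, here is a minimal sketch of a context that records each operation as it happens. It is not `MockAgentContext` itself: `TrackingContext` and the `SideEffect` record shape are hypothetical, only a handful of methods are shown, and they are synchronous for brevity.

```typescript
// Hypothetical record shape; not the library's actual type.
interface SideEffect {
  type: "file_read" | "file_write" | "file_delete" | "memory_write";
  target: string;
}

class TrackingContext {
  private files = new Map<string, string>();
  private memory = new Map<string, unknown>();
  private effects: SideEffect[] = [];

  constructor(initialFiles: Record<string, string> = {}) {
    for (const [path, content] of Object.entries(initialFiles)) {
      this.files.set(path, content);
    }
  }

  readFile(path: string): string {
    // Record the attempt before it can fail, so failed reads are visible too.
    this.effects.push({ type: "file_read", target: path });
    const content = this.files.get(path);
    if (content === undefined) throw new Error(`File not found: ${path}`);
    return content;
  }

  writeFile(path: string, content: string): void {
    this.effects.push({ type: "file_write", target: path });
    this.files.set(path, content);
  }

  deleteFile(path: string): void {
    this.effects.push({ type: "file_delete", target: path });
    this.files.delete(path);
  }

  setMemory(key: string, value: unknown): void {
    this.effects.push({ type: "memory_write", target: key });
    this.memory.set(key, value);
  }

  // Reads from memory are not mutations, so nothing is recorded here.
  getMemory(key: string): unknown {
    return this.memory.get(key);
  }

  // Return a copy so callers cannot alter the recorded history.
  getTrackedSideEffects(): SideEffect[] {
    return [...this.effects];
  }
}

const ctx = new TrackingContext({ "config.json": '{"api_key": "test"}' });
ctx.readFile("config.json");
ctx.writeFile("cache/out.json", "[]");
ctx.setMemory("lastQuery", "hello world");

const recorded = ctx.getTrackedSideEffects().map((e) => `${e.type}:${e.target}`);
console.log(recorded);
```

Because the filesystem and memory are in-memory maps, a skill run against such a context never touches the real environment, and the recorded list can be compared against the contract afterwards.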

Comparison

| Feature | skilltest | mcptest | Manual Testing |
| --- | --- | --- | --- |
| Focus | Skill business logic | MCP protocol compliance | Ad-hoc |
| Contract-driven | YAML contracts | Protocol spec | None |
| Mock environment | Full agent context | Transport layer | Real environment |
| Side effect tracking | Automatic | N/A | Manual inspection |
| Idempotency | Built-in N-run check | N/A | Manual |
| Token measurement | Built-in estimation | N/A | External tools |
| Schema validation | JSON Schema (Ajv) | MCP schema | Manual |
| Comparative testing | Multiple skills vs same contract | N/A | Spreadsheets |
| Output formats | Text, JSON, Markdown | JSON | None |
| CI-friendly | Exit codes, JSON output | Exit codes | Scripts |

Side Effect Types

| Type | Description | Tracked From |
| --- | --- | --- |
| file_write | File creation/modification | writeFile() |
| file_read | File reading | readFile() |
| file_delete | File deletion | deleteFile() |
| network_request | HTTP/API calls | fetch() |
| env_mutation | Environment variable changes | setEnv(), deleteEnv() |
| process_spawn | External process execution | spawn() |
| memory_write | Memory/state mutations | setMemory() |
| tool_call | Tool invocations | callTool() |

Constraint Types

| Type | Value | Description |
| --- | --- | --- |
| max_tokens | number | Maximum total token usage |
| max_duration_ms | number | Maximum execution time |
| max_file_writes | number | Maximum file write operations |
| max_network_calls | number | Maximum network requests |
| requires_tool | string | Must use this specific tool |
| no_side_effects | boolean | Disallow any mutations |
| deterministic | boolean | Same inputs must produce same outputs |
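
Most of these constraints reduce to simple checks over the tracked side effects and the measured duration. The sketch below illustrates that idea under stated assumptions: `checkConstraints`, `Constraint`, and `Effect` are hypothetical names, a tool call's `target` is taken to be the tool name, and the set of effect types treated as mutations for `no_side_effects` is a guess rather than the library's definition.

```typescript
// Hypothetical shapes for illustration.
interface Constraint {
  type: string;
  value: number | string | boolean;
}
interface Effect {
  type: string;
  target: string;
}

// Assumption: these effect types count as mutations for no_side_effects.
const MUTATING = new Set(["file_write", "file_delete", "env_mutation", "memory_write"]);

function checkConstraints(
  constraints: Constraint[],
  effects: Effect[],
  durationMs: number,
): string[] {
  const count = (type: string) => effects.filter((e) => e.type === type).length;
  const violations: string[] = [];

  for (const c of constraints) {
    switch (c.type) {
      case "max_duration_ms":
        if (typeof c.value === "number" && durationMs > c.value) {
          violations.push(`took ${durationMs}ms, limit ${c.value}ms`);
        }
        break;
      case "max_file_writes":
        if (typeof c.value === "number" && count("file_write") > c.value) {
          violations.push(`${count("file_write")} file writes, limit ${c.value}`);
        }
        break;
      case "max_network_calls":
        if (typeof c.value === "number" && count("network_request") > c.value) {
          violations.push(`${count("network_request")} network calls, limit ${c.value}`);
        }
        break;
      case "requires_tool":
        // Assumption: a tool_call effect's target is the tool name.
        if (!effects.some((e) => e.type === "tool_call" && e.target === c.value)) {
          violations.push(`required tool "${c.value}" was never called`);
        }
        break;
      case "no_side_effects":
        if (c.value === true && effects.some((e) => MUTATING.has(e.type))) {
          violations.push("mutating side effects occurred");
        }
        break;
    }
  }
  return violations;
}

const violations = checkConstraints(
  [
    { type: "max_duration_ms", value: 10000 },
    { type: "max_file_writes", value: 1 },
    { type: "requires_tool", value: "search" },
  ],
  [
    { type: "file_write", target: "cache/a.json" },
    { type: "file_write", target: "cache/b.json" },
    { type: "tool_call", target: "lookup" },
  ],
  120,
);
console.log(violations);
```

The sample run stays within its time budget but reports two violations: two file writes against a limit of one, and no call to the required `search` tool. The `max_tokens` and `deterministic` constraints are omitted here because they depend on token measurement and repeated runs rather than on a single effect log.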

Architecture

src/
  contract/       Contract parsing (YAML/JSON) and validation
  context/        Mock agent context with tracked side effects
  schema/         JSON Schema validation via Ajv
  effects/        Side effect tracking and constraint validation
  idempotency/    Determinism checking (N-run comparison)
  token/          Token usage estimation and measurement
  runner/         Core execution engine
  report/         Output formatting (text, JSON, markdown)
  cli/            CLI command implementations
  utils/          Path handling, deep equality, error types

Requirements

  • Node.js 18+
  • No heavy dependencies: only Ajv, Commander, and js-yaml

License

MIT
