collection.md: auto-generation, maturity safety, and custom header support #1256

@katriendg


Context

This issue was raised during review of PR #1207, triggered by a question about why collections/hve-core-all.collection.md was missing the newly added skill, and a secondary observation that the .collection.md files are not auto-generated despite the plugin README files being auto-generated.

Cited discussion comments: #1207 (comment)


Current Behavior (by design, but worth revisiting)

Four files are involved in producing the two published surfaces (plugin README and extension marketplace README). Two are hand-authored inputs and two are generated outputs:

| File | Authored by | Used for |
| --- | --- | --- |
| collections/{id}.collection.yml | Contributor (manual) | Structured metadata, item list with maturity, notice: field |
| collections/{id}.collection.md | Contributor (manual) | Prose body: overview paragraph, feature/agent/skill bullet lists |
| plugins/{id}/README.md | Auto-generated by npm run plugin:generate | Copilot CLI marketplace page |
| extension/README.{id}.md | Auto-generated by Prepare-Extension.ps1 | VS Code Marketplace page |

Generation flow:

  • plugins/{id}/README.md is fully auto-generated: it takes the yml name/description/notice fields for the header, inserts the entire collection.md content as an ## Overview section, then appends auto-built artifact tables (Agents, Commands, Instructions, Skills) from the yml items list.
  • extension/README.{id}.md follows the same pattern using README.template.md tokens: {{BODY}} = full collection.md, {{ARTIFACTS}} = auto-built artifact tables from yml items.
  • The hve-core-all.collection.yml is auto-populated by Update-HveCoreAllCollection (scanning all other collections), but hve-core-all.collection.md is not auto-updated.
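
To make the generation flow concrete, here is a minimal Python sketch of the plugin README assembly described above. It is not the actual New-PluginReadmeContent implementation (which is PowerShell). The function name, the kind values (agent, command, instruction, skill), the table columns, and the default maturity of stable are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from yml item kind to README table heading.
KIND_HEADINGS = {
    "agent": "Agents",
    "command": "Commands",
    "instruction": "Instructions",
    "skill": "Skills",
}


def build_plugin_readme(manifest: dict, collection_md: str) -> str:
    """Assemble a README: yml header, collection.md as Overview, artifact tables."""
    # Header comes from the yml name/description/notice fields.
    lines = [f"# {manifest['name']}", "", manifest["description"], ""]
    if manifest.get("notice"):
        lines += [manifest["notice"].strip(), ""]

    # The entire hand-authored collection.md is inserted verbatim.
    lines += ["## Overview", "", collection_md.strip(), ""]

    # Artifact tables are built only from the yml items list.
    by_kind: dict[str, list[dict]] = {}
    for item in manifest.get("items", []):
        by_kind.setdefault(item["kind"], []).append(item)

    for kind, heading in KIND_HEADINGS.items():
        items = by_kind.get(kind)
        if not items:
            continue
        lines += [f"## {heading}", "", "| Name | Maturity |", "| --- | --- |"]
        for item in items:
            name = PurePosixPath(item["path"]).name
            # Assumption: items without a maturity field count as stable.
            lines.append(f"| {name} | {item.get('maturity', 'stable')} |")
        lines.append("")

    return "\n".join(lines)
```

The sketch shows why the two halves drift apart: the tables are derived from the yml items on every run, while the Overview section is whatever text collection.md happened to contain.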

Root cause of the PR confusion: When a contributor adds a new artifact to collection.yml, the artifact tables in both generated READMEs update automatically. However, the named bullet lists inside collection.md (e.g., - **OWASP MCP Top 10** — ...) are hand-authored and silently fall out of sync. There is no linting or validation step that detects this gap.
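
As an illustration of the kind of check that is currently missing, the following Python sketch compares the bold names in collection.md bullets against a list of expected artifact names. How display names would be derived from yml items is not specified in this issue, so the sketch takes them as an input. The function name and the result shape are hypothetical.

```python
import re

# Matches bullets of the form "- **Name** ..." and captures the bold name.
BULLET_NAME = re.compile(r"^\s*[-*]\s+\*\*(.+?)\*\*", re.MULTILINE)


def find_drift(expected_names: list[str], collection_md: str) -> dict[str, list[str]]:
    """Report names expected from the yml but absent from the prose, and vice versa."""
    documented = set(BULLET_NAME.findall(collection_md))
    expected = set(expected_names)
    return {
        # In the manifest but not mentioned in collection.md.
        "missing": sorted(expected - documented),
        # Mentioned in collection.md but no longer in the manifest.
        "stale": sorted(documented - expected),
    }
```

A lint step could fail the build when either list is non-empty, which would have surfaced the gap seen in PR #1207.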


Problem Areas

1. collection.md prose body is not auto-generated

The bullet list sections in collection.md (listing agents, subagents, skills with short descriptions) largely duplicate data already present as YAML frontmatter description: fields on each artifact. When a contributor adds a new artifact to collection.yml and does not update collection.md, the prose body becomes stale.

The confusion is compounded because Generate-Plugins.ps1 does auto-update hve-core-all.collection.yml with newly discovered artifacts, creating a reasonable expectation that the companion .md would update as well. It does not.

2. Extension README artifact tables do not filter by maturity channel

New-CollectionReadme in Prepare-Extension.ps1 iterates all items from the collection manifest without applying the Channel maturity filter. The function call site (around line 320) also does not pass the active channel. This means the stable extension's marketplace README (extension/README.{id}.md) lists artifact names from experimental and preview items alongside stable ones.

The package.json artifact registration does filter by maturity (via Get-CollectionArtifacts with AllowedMaturities), so the agent picker in stable VS Code correctly omits experimental agents — but the README visible to users on the marketplace says otherwise.

3. collection.md body prose may name experimental artifacts

Because collection.md is hand-authored prose that is inserted verbatim into both the plugin README and extension README, any artifact names mentioned in the prose (e.g., - **Security Planner** — ...) appear in the stable extension's marketplace page. There is no mechanism to conditionally exclude those names from the stable surface.

4. No custom header/intro slot separate from the notice

The yml notice: field serves callout blocks (CAUTION, WARNING, etc.), but there is no supported way to add per-collection introductory text (e.g., a one-paragraph why/when of the collection) that appears before the prose body and is distinct from the warning-style notice.


Options to Evaluate

Option A — Auto-generate collection.md entirely from yml + artifact frontmatter

Generate the entire collection.md prose from data already in collection.yml (item paths, kinds) plus description: fields read from each artifact's YAML frontmatter. The script writes collection.md as a generated file.

  • Pro: Single source of truth; always in sync; removes contributor burden.
  • Con: Loses ability to write custom non-tabular prose (e.g., narrative context). All prose must come from artifact description: fields, which are intentionally terse one-liners. The generated bullet sections would closely mirror the artifact tables already present below them, adding little value.
  • Con: hve-core-all.collection.md has deliberate curated prose introducing the mega-collection that cannot be derived from yml alone.
  • Verdict: Suitable only if the body is intentionally reduced to structured data; doesn't map well to the current narrative style.

Option B — Add header: YAML field to collection.yml; keep collection.md for body prose

Add a multiline header: field to collection.yml (analogous to notice:). Scripts inject it before the collection.md body. The body prose sections (feature lists, agent descriptions) remain hand-authored in collection.md, but are validated.

  • Pro: Explicit, structured header with version control; easy to see in yml review.
  • Con: Introduces a second prose field in yml alongside the existing notice: (collections that don't need a header can simply omit it). More importantly, the core sync problem (body prose vs. yml items) is not addressed.

Option C — Template sections within collection.md (recommended for discussion)

Give collection.md an optional YAML frontmatter block that carries a header: string. Below the frontmatter, the body is partitioned into:

  1. A hand-authored <!-- hve:intro --> block (intro paragraph, preserved as-is).
  2. Auto-generated <!-- hve:agents -->, <!-- hve:skills --> etc. blocks that Generate-Plugins.ps1 regenerates from yml items + each artifact's description: frontmatter.

The script would regenerate only the tagged blocks, leaving the intro unchanged. The plugin and extension READMEs pull the full collection.md as before.

  • Pro: Keeps customizable intro prose; auto-syncs the artifact bullet lists; container tags make it machine-readable.
  • Con: More complex script logic. Contributors must not edit inside auto-generated blocks. Tag convention needs documentation.
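
A rough Python sketch of the Option C regeneration step follows. It assumes each tagged block is closed by a matching marker such as <!-- /hve:agents -->. That closing convention is not defined in this issue and is an assumption here, as are the function names.

```python
import re


def render_bullets(artifacts: list[tuple[str, str]]) -> str:
    """Render (name, description) pairs as bold-name bullets."""
    return "\n".join(f"- **{name}**: {description}" for name, description in artifacts)


def regenerate_block(markdown: str, tag: str, new_body: str) -> str:
    """Replace the content between <!-- hve:TAG --> and <!-- /hve:TAG -->."""
    pattern = re.compile(
        rf"(<!-- hve:{re.escape(tag)} -->\n)(.*?)(<!-- /hve:{re.escape(tag)} -->)",
        re.DOTALL,
    )
    body = new_body.rstrip("\n") + "\n"
    # A function replacement avoids backslash interpretation in the new body.
    result, count = pattern.subn(lambda m: m.group(1) + body + m.group(3), markdown)
    if count == 0:
        raise ValueError(f"no hve:{tag} block found")
    return result
```

Only the named block is rewritten, so a hand-authored hve:intro block in the same file is left untouched. Raising when the block is absent makes a missing or misspelled tag visible instead of silently skipping the sync.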

Option D — Move collection.md body to yml (body: multiline field); auto-generate the artifact lists appended into it

All prose lives in collection.yml as body: multiline text. Artifact bullet sections are appended automatically by the script at generation time. collection.md becomes a generated file and is removed from contributor ownership.

  • Pro: One file to edit (yml). The generator always produces a complete, correct collection.md.
  • Con: Multiline YAML prose is harder to write/review than a markdown file. Loss of markdown preview ergonomics.

Immediate Bug to Fix (independent of design choice)

Regardless of which option is chosen for the auto-generation design, New-CollectionReadme in Prepare-Extension.ps1 should receive and apply the active Channel maturity filter when building artifact tables. The $Channel parameter is already available in the calling context and Get-CollectionArtifacts already implements the correct filtering logic. Passing the filtered collection (or filtered artifact lists) into New-CollectionReadme would close the stable-extension-lists-experimental-agents gap.
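
The shape of the fix can be sketched in Python as a filter applied before the artifact tables are built. The channel names and the maturity sets each channel allows are assumptions for illustration. The real values are whatever Get-CollectionArtifacts already uses via AllowedMaturities.

```python
# Assumed channel-to-maturity mapping; the actual mapping lives in the PowerShell scripts.
ALLOWED_MATURITIES = {
    "Stable": {"stable"},
    "PreRelease": {"stable", "preview", "experimental"},
}


def filter_items_by_channel(items: list[dict], channel: str) -> list[dict]:
    """Keep only the items whose maturity is allowed on the given channel."""
    allowed = ALLOWED_MATURITIES[channel]
    # Assumption: an item without a maturity field is treated as stable.
    return [item for item in items if item.get("maturity", "stable") in allowed]
```

Feeding the filtered list into the README builder, rather than the raw manifest items, would make the README tables and the package.json registration agree for a given channel.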


Questions for Team Discussion

  1. Is auto-generation of collection.md body desirable, or is the hand-authored narrative a deliberate quality bar we want to maintain?
  2. If auto-generation is adopted (Option C or D), should -Refresh on Generate-Plugins.ps1 also regenerate collection.md blocks, or should there be a separate npm run collections:sync command?
  3. For the header: slot: should it live in yml (Option B) or in collection.md frontmatter (Option C)? The yml location is simpler and follows the precedent of notice:.
  4. Should the stable extension README for security/experimental collections explicitly state which items require the Pre-Release channel, rather than just omitting them from the listing?

Related

  • PR feat(skill): introduce owasp-mcp #1207 — context for this issue
  • scripts/plugins/Modules/PluginHelpers.psm1: New-PluginReadmeContent, plugin README generation
  • scripts/extension/Prepare-Extension.ps1: New-CollectionReadme, extension README generation + Get-CollectionArtifacts (maturity filtering already implemented for package.json but not README)
