fix: prevent zombie MCP processes via handshake timeout + deferred init#89
Conversation
Greptile SummaryThis PR fixes the recurring zombie MCP process bug by adding a 30-second handshake timeout that exits the process if no MCP Key changes:
Confidence Score: 5/5Safe to merge — the zombie guard logic is correct and well-tested; all findings are P2 style/maintenance suggestions The core handshake-timeout implementation is sound: unref() is used correctly, oninitialized defers all heavy work, and four integration tests exercise the real binary. The discovery harness is comprehensive and its gate logic is correct for the current frozen protocol. Three P2 findings (lowerIsBetter inconsistency, missing undefined-runner guard, and build-dependency test guard) are non-blocking quality improvements that do not affect correctness at the current protocol version. src/eval/discovery-harness.ts deserves a follow-up to align lowerIsBetter across compareMetric and compareMetricWithinTolerance, and to add a guard for unregistered surface runners
|
| Filename | Overview |
|---|---|
| src/index.ts | Adds handshake timeout (30 s default, env-configurable) and defers all project init to oninitialized callback to prevent zombie MCP processes |
| tests/zombie-guard.test.ts | Integration tests spawning real child processes to verify exit-code-1 behavior, timeout message, deferred init, and custom timeout env var |
| src/eval/discovery-harness.ts | New discovery harness with surface runners, task scoring, gate evaluation, and report formatting; minor lowerIsBetter inconsistency with compareMetric |
| src/eval/types.ts | New discovery eval types: DiscoveryTask, DiscoveryFixture, DiscoverySummary, gate types, and benchmark protocol interfaces |
| tests/discovery-harness.test.ts | Comprehensive unit tests for discovery harness scoring, fixture freezing, report formatting, and gate evaluation edge cases |
| tests/eval-harness.test.ts | Imports countUtf8Bytes and estimateTokenCountFromBytes as shared exports; existing harness tests unchanged |
| scripts/run-eval.mjs | Extends eval CLI with --mode flag for retrieval |
| src/eval/harness.ts | Extracts countUtf8Bytes and estimateTokenCountFromBytes as public exports for reuse by the discovery harness |
| docs/capabilities.md | Updates eval docs to reflect discovery mode, new fixtures, metrics, and gate semantics |
| tests/fixtures/discovery-benchmark-protocol.json | Frozen benchmark protocol with ship gate, 15% comparator tolerance, and named comparator list |
| tests/fixtures/discovery-angular-spotify.json | Frozen discovery fixture for angular-spotify: 12 tasks, balanced 4/4/4 map/find/search split, pinned to commit SHA |
| tests/fixtures/discovery-excalidraw.json | Frozen discovery fixture for excalidraw: 12 tasks, balanced 4/4/4 map/find/search split, pinned to commit SHA |
| src/resources/codebase-intelligence.ts | No functional changes; serves as the surface runner for codebase://context discovery tasks |
| tests/fixtures/README.md | Documentation update describing the new discovery fixtures and their frozen-eval semantics |
Sequence Diagram
sequenceDiagram
participant CLI as npx codebase-context
participant Main as main() in index.ts
participant Timer as HandshakeTimer (30s, unref'd)
participant MCP as MCP Client
participant Init as Project Init
CLI->>Main: start (STDIO mode)
Main->>Timer: setTimeout(exit(1), 30s).unref()
Main->>MCP: server.connect(StdioTransport)
alt Zombie scenario — no client connects
Timer-->>Main: fires after 30s
Main->>CLI: process.exit(1) + stderr message
else Real MCP client connects
MCP->>Main: initialize message
Main->>Timer: clearTimeout(handshakeTimer)
Main->>Init: oninitialized() — index + file-watcher
Init-->>MCP: ready
MCP->>Main: (normal lifecycle)
Main->>CLI: process.exit(0) on close/SIGTERM
end
Reviews (1): Last reviewed commit: "fix: prevent zombie MCP processes via ha..." | Re-trigger Greptile
src/eval/discovery-harness.ts
Outdated
| metric: DiscoveryMetricName, | ||
| tolerancePercent: number | ||
| ): DiscoveryMetricComparison { | ||
| const lowerIsBetter = metric === 'averageFirstRelevantHit'; |
There was a problem hiding this comment.
lowerIsBetter inconsistency with compareMetric
In compareMetric (line 243), both averageEstimatedTokens and averageFirstRelevantHit are treated as lower-is-better:
const lowerIsBetter = metric === 'averageEstimatedTokens' || metric === 'averageFirstRelevantHit';But in compareMetricWithinTolerance only averageFirstRelevantHit is:
const lowerIsBetter = metric === 'averageFirstRelevantHit';This is safe today because the comparator gate's usefulnessMetrics in the frozen protocol only contains averageUsefulness, averageFirstRelevantHit, and bestExampleUsefulnessRate — never averageEstimatedTokens. However, since usefulnessMetrics is typed as DiscoveryMetricName[], nothing prevents a future protocol update from adding averageEstimatedTokens to the comparator gate, which would silently evaluate it as higher-is-better and produce an incorrect gate result.
Consider aligning the two functions:
| const lowerIsBetter = metric === 'averageFirstRelevantHit'; | |
| const lowerIsBetter = metric === 'averageEstimatedTokens' || metric === 'averageFirstRelevantHit'; |
There was a problem hiding this comment.
Fixed. now includes and in its check, consistent with . This was already addressed in the rebase by taking the metric-alignment commit from master (21c3e43 — fix: align discovery protocol metrics), which added averagePayloadBytes to both functions and to DiscoveryMetricName.
| for (const task of fixture.tasks) { | ||
| const runner = runners[task.surface]; | ||
| const payload = await runner(task, rootPath); | ||
| results.push(evaluateDiscoveryTask(task, payload)); |
There was a problem hiding this comment.
No guard for undefined surface runner
If task.surface is a value not present in the runners map (e.g., a typo in a fixture file or a newly added DiscoverySurface value without a corresponding runner), runner is undefined and calling await runner(task, rootPath) throws a confusing TypeError: runner is not a function rather than a descriptive error.
Consider adding an explicit check:
for (const task of fixture.tasks) {
const runner = runners[task.surface];
if (!runner) {
throw new Error(`No runner registered for surface: ${task.surface}`);
}
const payload = await runner(task, rootPath);
results.push(evaluateDiscoveryTask(task, payload));
}There was a problem hiding this comment.
Fixed. Added an explicit guard before calling the runner:
const runner = runners[task.surface];
if (!runner) {
throw new Error(`No runner registered for surface: ${task.surface}`);
}This surfaces fixture typos or unregistered surfaces with a clear message instead of a cryptic TypeError.
| import { fileURLToPath } from 'node:url'; | ||
|
|
||
| const __dirname = path.dirname(fileURLToPath(import.meta.url)); | ||
| const ENTRY_POINT = path.resolve(__dirname, '..', 'dist', 'index.js'); |
There was a problem hiding this comment.
Tests depend on a pre-built
dist/index.js
The test resolves the entry point from dist/index.js, so it will fail with a confusing rejection (from child.on('error', reject)) if the project hasn't been built before running tests. A descriptive beforeAll guard would make the failure more actionable:
import { existsSync } from 'node:fs';
beforeAll(() => {
if (!existsSync(ENTRY_POINT)) {
throw new Error(
`dist/index.js not found — run \`npm run build\` before the zombie-guard tests.`
);
}
});This is especially relevant for contributors running vitest without a prior build step.
There was a problem hiding this comment.
Fixed. Added a beforeAll guard at the top of the describe block:
beforeAll(() => {
if (!existsSync(ENTRY_POINT)) {
throw new Error(
`dist/index.js not found - run \`npm run build\` before the zombie-guard tests.`
);
}
});Throws early with an actionable message rather than letting the test fail with a cryptic spawn rejection.
Adds a 30-second handshake timer that exits the process if no MCP client sends an initialize message within the timeout window. Defers CPU-heavy indexing and file-watcher setup to the oninitialized callback so a misfire consumes near-zero resources before exiting. Closes the recurring zombie process bug where Codex CLI spawns \ px codebase-context <path>\ without a subcommand, starting the MCP server with no client to connect. The 4 previous guards (ppid poller, stdin close, server.onclose, SIGHUP) all failed because cmd.exe holds the pipe open while waiting for the child to produce output. The fix is tool-agnostic: it does not rely on who is calling it or how. Any invocation that does not result in an MCP initialize handshake within 30 seconds exits cleanly with code 1.
0344cc1 to
dfcbbce
Compare
Adds a 30-second handshake timer that exits the process if no MCP client
sends an initialize message within the timeout window. Defers CPU-heavy
indexing and file-watcher setup to the oninitialized callback so a misfire
consumes near-zero resources before exiting.
Closes the recurring zombie process bug where Codex CLI spawns
px codebase-context \ without a subcommand, starting the MCP server
with no client to connect. The 4 previous guards (ppid poller, stdin close,
server.onclose, SIGHUP) all failed because cmd.exe holds the pipe open
while waiting for the child to produce output.
The fix is tool-agnostic: it does not rely on who is calling it or how.
Any invocation that does not result in an MCP initialize handshake within
30 seconds exits cleanly with code 1.
What / Why
Explain the change briefly.
Release notes
This repo uses an automated release flow.
feat: ...,fix: ...,docs: ...,chore: ...,refactor: ...