Skip to content

feat(graph): add graph check command for pre-rebuild conflict detection#999

Open
stmcgovern wants to merge 3 commits intopython-wheel-build:mainfrom
stmcgovern:graph-validate-v1
Open

feat(graph): add graph check command for pre-rebuild conflict detection#999
stmcgovern wants to merge 3 commits intopython-wheel-build:mainfrom
stmcgovern:graph-validate-v1

Conversation

@stmcgovern
Copy link
Copy Markdown

@stmcgovern stmcgovern commented Mar 31, 2026

Pull Request Description

What

Add a pre-rebuild check that answers two questions in seconds:

  1. Is this graph structurally sound? (well-formed, acyclic)
  2. How many wheels are extra? (collapsible vs required)

The conflict classification uses the same specifier.filter logic as show_explain_duplicates and write_constraints_file. It is conservative: collapsible here guarantees write_constraints_file succeeds.

Self-loops (e.g. safetensors[numpy]) are reported as warnings, not failures — they are common in extras and harmless for install deps. Real cycles (e.g. haystack-ai ↔ haystack-experimental) cause failure.

Supports --json for CI integration and --constraints for pin output.

Why

Closes: #998

…tion

Add a pre-rebuild check that answers two questions in seconds:

1. Is this graph structurally sound? (well-formed, acyclic)
2. How many wheels are extra? (collapsible vs required)

The conflict classification uses the same specifier.filter logic as
show_explain_duplicates and write_constraints_file. It is conservative:
collapsible here guarantees write_constraints_file succeeds.

Self-loops (e.g. safetensors[numpy]) are reported as warnings, not
failures — they are common in extras and harmless for install deps.
Real cycles (e.g. haystack-ai ↔ haystack-experimental) cause failure.

Supports --json for CI integration and --constraints for pin output.

Closes: python-wheel-build#998
@stmcgovern stmcgovern requested a review from a team as a code owner March 31, 2026 15:04
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

This change introduces a new graph check CLI subcommand under fromager graph that performs pre-rebuild validation of dependency graphs stored in graph.json. The implementation adds three structural checks: well-formedness (detects dangling edges missing from the graph), acyclicity (identifies cycles and self-loops via depth-first search), and version-uniqueness (flags multiple versions of packages). For packages with multiple versions, it classifies conflicts as "collapsible" (a single version satisfies all consumer specifiers) or "required" (no single version satisfies all consumers). The command supports human-readable output, JSON, and constraints-format output modes, and exits with code 1 on detected issues. Accompanying test coverage includes unit tests for each validation function and CLI integration tests validating output formats, exit codes, and edge cases.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main change: adding a graph check command for pre-rebuild conflict detection, which matches the PR's core objective.
Description check ✅ Passed The description clearly explains the purpose, functionality, and output modes of the new command, aligned with the changeset's objectives.
Linked Issues check ✅ Passed The code implements all required objectives from #998: well-formedness checks for dangling edges, acyclicity detection with self-loop/cycle distinction, version-uniqueness classification (collapsible vs required), multiple output modes (human-readable, --json, --constraints), and CI-friendly exit codes.
Out of Scope Changes check ✅ Passed All changes are directly related to the graph check command: new CLI subcommand implementation and comprehensive test coverage. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify mergify bot added the ci label Mar 31, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/fromager/commands/graph.py (2)

1075-1100: Consider making --json and --constraints mutually exclusive.

If both flags are specified, --constraints silently takes precedence. This could surprise users.

Option: Add a click mutex constraint
`@click.option`(
    "--json",
    "as_json",
    is_flag=True,
    default=False,
    help="Output results as JSON.",
)
`@click.option`(
    "--constraints",
    "as_constraints",
    is_flag=True,
    default=False,
    help="Output collapsible pins in constraints.txt format.",
)

You could add validation at the start of check():

if as_json and as_constraints:
    raise click.UsageError("--json and --constraints are mutually exclusive")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fromager/commands/graph.py` around lines 1075 - 1100, The check() command
accepts both --json (as_json) and --constraints (as_constraints) but currently
lets --constraints silently win; add an explicit mutual-exclusion check at the
start of the check() function to detect if both as_json and as_constraints are
True and raise a click.UsageError (with a clear message like "--json and
--constraints are mutually exclusive"). This change should be implemented inside
the check() function body that receives wkctx, graph_file, as_json,
as_constraints so any callers get an immediate, user-facing error instead of
surprising behavior.

1113-1129: Avoid reading the graph file twice.

The graph JSON is loaded at line 1115, then re-read at line 1127 via from_file. Use from_dict(graph_dict) instead.

Proposed fix
     if not wf_issues:
-        graph = DependencyGraph.from_file(graph_file)
+        graph = DependencyGraph.from_dict(graph_dict)
         entries = _classify_conflicts(graph)
         n_nodes = len(graph) - 1  # exclude root
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fromager/commands/graph.py` around lines 1113 - 1129, The code currently
loads graph_file into graph_dict and later re-reads the file via
DependencyGraph.from_file; replace that second read with
DependencyGraph.from_dict(graph_dict) to avoid double I/O: after verifying
wf_issues and acyclicity (using _check_well_formed and _check_acyclicity), call
DependencyGraph.from_dict(graph_dict) (not from_file) to build the graph used by
_classify_conflicts and to compute n_nodes, keeping the existing logic for
entries = _classify_conflicts(graph) and n_nodes = len(graph) - 1.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/fromager/commands/graph.py`:
- Around line 1075-1100: The check() command accepts both --json (as_json) and
--constraints (as_constraints) but currently lets --constraints silently win;
add an explicit mutual-exclusion check at the start of the check() function to
detect if both as_json and as_constraints are True and raise a click.UsageError
(with a clear message like "--json and --constraints are mutually exclusive").
This change should be implemented inside the check() function body that receives
wkctx, graph_file, as_json, as_constraints so any callers get an immediate,
user-facing error instead of surprising behavior.
- Around line 1113-1129: The code currently loads graph_file into graph_dict and
later re-reads the file via DependencyGraph.from_file; replace that second read
with DependencyGraph.from_dict(graph_dict) to avoid double I/O: after verifying
wf_issues and acyclicity (using _check_well_formed and _check_acyclicity), call
DependencyGraph.from_dict(graph_dict) (not from_file) to build the graph used by
_classify_conflicts and to compute n_nodes, keeping the existing logic for
entries = _classify_conflicts(graph) and n_nodes = len(graph) - 1.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 59054cbc-aa40-420f-b03f-7f71ab591faa

📥 Commits

Reviewing files that changed from the base of the PR and between aec9c9c and f5396e0.

📒 Files selected for processing (2)
  • src/fromager/commands/graph.py
  • tests/test_graph_validate.py

…e file read

Address review feedback: raise UsageError when both output flags are
passed, and use from_dict(graph_dict) instead of re-reading the file.
Align test filename with the command name per CONTRIBUTING.md convention.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_graph_check.py (1)

347-360: Consider adding assertion for build_efficiency field.

The JSON output includes build_efficiency with total_wheels and extra_builds per the implementation, but no test verifies this field. Minor gap.

assert "build_efficiency" in data
assert "total_wheels" in data["build_efficiency"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_graph_check.py` around lines 347 - 360, The test_clean_graph_json
test is missing assertions for the build_efficiency section of the output;
update the test (in test_clean_graph_json) to assert that "build_efficiency"
exists in the parsed data dict and that it contains the expected keys (e.g.,
"total_wheels" and "extra_builds") and optionally that those values are
present/not-None (use the local variable data to check keys and simple
truthiness or type). Ensure assertions reference
data["build_efficiency"]["total_wheels"] and
data["build_efficiency"]["extra_builds"] so the test will fail if those fields
are absent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test_graph_check.py`:
- Around line 347-360: The test_clean_graph_json test is missing assertions for
the build_efficiency section of the output; update the test (in
test_clean_graph_json) to assert that "build_efficiency" exists in the parsed
data dict and that it contains the expected keys (e.g., "total_wheels" and
"extra_builds") and optionally that those values are present/not-None (use the
local variable data to check keys and simple truthiness or type). Ensure
assertions reference data["build_efficiency"]["total_wheels"] and
data["build_efficiency"]["extra_builds"] so the test will fail if those fields
are absent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 923cb1b7-9918-481e-aa54-06d541398075

📥 Commits

Reviewing files that changed from the base of the PR and between 926e0f6 and 8679050.

📒 Files selected for processing (1)
  • tests/test_graph_check.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFE] Add graph check command for pre-rebuild conflict detection

1 participant