Graph node follow-ups: repr, containment, empty(), registry docs by Andy-Jost · Pull Request #1859 · NVIDIA/cuda-python

Andy-Jost · 2026-04-03T20:15:13Z

Summary

Follow-on changes addressing review feedback from #1850 and #1853.

Changes

__repr__ improvements (_subclasses.pyx): All GraphNode subclass __repr__ methods now start with handle=0x... followed by static, type-specific identity data, enhancing their usefulness for debugging.
Cheap containment test (_adjacency_set_proxy.pyx): Added _AdjacencySetCore.contains() that checks membership by comparing raw CUgraphNode handles at the C level, avoiding Python object construction. Uses a 16-element stack buffer for a single driver call in the common case.
Stack-buffer optimization in query() (_adjacency_set_proxy.pyx): Same 16-element stack buffer approach applied to query(), eliminating the count-then-fetch two-call pattern for small adjacency sets.
Single-call optimization in nodes() and edges() (_graph_def.pyx): Pre-allocate vectors to 128 entries and pass them on the first driver call. Only fall back to a second call if the graph exceeds 128 nodes/edges.
Early type check in update() (_adjacency_set_proxy.pyx): Type validation moved inline to fail fast on invalid input, per review feedback.
GraphDef.empty() (_graph_def.pyx): New method for creating entry-point empty nodes without the awkward g.join() spelling.
Registry cleanup test (test_graphdef.py): test_registry_cleanup exercises destroy(), graph deletion, and weak-reference cleanup of the node registry.
Registry design documentation (_cpp/REGISTRY_DESIGN.md): Documents the two-level identity map (C++ HandleRegistry and Cython _node_registry) with cross-references at each instantiation site.

Test Coverage

Existing graph tests validate empty() (replacing prior g.join() calls).
test_registry_cleanup covers explicit destroy, graph teardown, and GC-driven cleanup.
__repr__ patterns validated by test_object_protocols.py.

Rename test files to reflect what they actually test: - test_basic -> test_graph_builder (stream capture tests) - test_conditional -> test_graph_builder_conditional - test_advanced -> test_graph_update (moved child_graph and stream_lifetime tests into test_graph_builder) - test_capture_alloc -> test_graph_memory_resource - test_explicit* -> test_graphdef* Made-with: Cursor

- Extend Graph.update() to accept both GraphBuilder and GraphDef sources - Surface CUgraphExecUpdateResultInfo details on update failure instead of a generic CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE message - Release the GIL during cuGraphExecUpdate via nogil block - Add parametrized happy-path test covering both GraphBuilder and GraphDef - Add error-case tests: unfinished builder, topology mismatch, wrong type Made-with: Cursor

Replace cached tuple-based pred/succ with mutable AdjacencySet backed by direct CUDA driver calls. Add GraphNode.remove() wrapping cuGraphDestroyNode. Made-with: Cursor

…cencies Enable adding/removing edges between graph nodes via AdjacencySet (a MutableSet proxy on GraphNode.pred/succ), node removal via discard(), and property setters for bulk edge replacement. Includes comprehensive mutation and interface tests. Closes part of NVIDIA#1330 (step 2: edge mutation on GraphDef). Made-with: Cursor

Replace inline skipif version check with requires_module(np, "2.1") from the shared test helpers, consistent with other test files. Made-with: Cursor

Made-with: Cursor

Rename class and file to AdjacencySetProxy to clarify write-through semantics. Add bulk-efficient clear(), __isub__(), __ior__() overrides and remove_edges() on the Cython core. Guard GraphNode.discard() against double-destroy via membership check. Filter duplicates in update(). Add error-path tests for wrong types, cross-graph edges, and self-edges. Made-with: Cursor

…INEL Replace discard() with destroy() which calls cuGraphDestroyNode and then zeroes the CUgraphNode resource in the handle box via invalidate_graph_node_handle. This prevents stale memory access on destroyed nodes. Properties (type, pred, succ, handle) degrade gracefully to None/empty for destroyed nodes. Remove the GRAPH_NODE_SENTINEL (0x1) approach in favor of using NULL for both sentinels and destroyed nodes, which is simpler and avoids the risk of passing 0x1 to driver APIs that treat it as a valid pointer. Made-with: Cursor

Nodes retrieved via GraphDef.nodes(), edges(), or pred/succ traversal now return the same Python object that was originally created, enabling identity checks with `is`. A C++ HandleRegistry deduplicates CUgraphNode handles, and a Cython WeakValueDictionary caches the Python wrapper objects. Made-with: Cursor

Made-with: Cursor

…sion Sentinel (entry) nodes use NULL as their CUgraphNode, so caching them under a NULL key caused all sentinels across different graphs to share the same handle. This made nodes built from the wrong graph's entry point, causing CUDA_ERROR_INVALID_VALUE for conditional nodes and hash collisions in equality tests. Made-with: Cursor

When a node is destroyed, the driver may reuse its CUgraphNode pointer for a new node. Without unregistering the old entry, the registry returns a stale handle pointing to the wrong node type and graph. Made-with: Cursor

Made-with: Cursor

Every subclass repr now starts with handle=0x... (the CUgraphNode pointer) followed by type-specific identity/parameter data. Dynamic queries (pred counts, subnode counts) are removed in favor of deterministic, cheap fields. This makes set comparison failures in test output readable when debugging graph mutation tests. Made-with: Cursor

copy-pr-bot · 2026-04-03T20:15:16Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Andy-Jost · 2026-04-03T20:18:13Z

/ok to test

Resolve conflicts: keep _cached() wrappers and _node_cache.pop() from graph-node-identity, keep identity comparison (is not) in test. Made-with: Cursor

Aligns Python-side terminology with the C++ graph_node_registry. Made-with: Cursor

github-actions · 2026-04-03T20:34:56Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1859/
https://nvidia.github.io/cuda-python/pr-preview/pr-1859/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1859/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1859/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

unregister_handle: remove the expired() guard that prevented erasure when the shared_ptr was still alive. This caused stale registry entries after destroy(), leading to CUDA_ERROR_INVALID_VALUE when the driver reused CUgraphNode pointer values. Rename invalidate_graph_node_handle -> invalidate_graph_node for consistency with the rest of the graph node API. Made-with: Cursor

Add _AdjacencySetCore.contains() that checks membership by comparing raw CUgraphNode handles at the C level, avoiding Python object construction. Uses a 16-element stack buffer for a single driver call in the common case. Move the type check in update() inline next to the extend loop so invalid input is rejected immediately. Made-with: Cursor

- Add GraphDef.empty() for creating entry-point empty nodes; replace all no-arg join() calls on GraphDef with empty() in tests. - Optimize _AdjacencySetCore.query() to use a 16-element stack buffer, matching the contains() optimization. - Add test_registry_cleanup exercising destroy(), graph deletion, and weak-reference cleanup of the node registry. Made-with: Cursor

Add REGISTRY_DESIGN.md explaining how the C++ HandleRegistry (Level 1) and Cython _node_registry (Level 2) work together to preserve Python object identity through driver round-trips. Add cross-references at each registry instantiation site. Made-with: Cursor

Made-with: Cursor

Andy-Jost · 2026-04-03T22:33:43Z

/ok to test

Andy-Jost · 2026-04-03T22:36:44Z

cuda_core/cuda/core/_graph/_graph_def/_adjacency_set_proxy.pyx

    cdef list query(self):
        cdef cydriver.CUgraphNode c_node = as_cu(self._h_node)
        if c_node == NULL:
            return []
-        cdef size_t count = 0
+        cdef cydriver.CUgraphNode buf[16]
+        cdef size_t count = 16
+        cdef size_t i
        with nogil:
-            HANDLE_RETURN(self._query_fn(c_node, NULL, &count))
-        if count == 0:
-            return []
+            HANDLE_RETURN(self._query_fn(c_node, buf, &count))
+        if count <= 16:
+            return [GraphNode._create(self._h_graph, buf[i])
+                    for i in range(count)]
        cdef vector[cydriver.CUgraphNode] nodes_vec
        nodes_vec.resize(count)
        with nogil:
            HANDLE_RETURN(self._query_fn(
                c_node, nodes_vec.data(), &count))
        return [GraphNode._create(self._h_graph, nodes_vec[i])
                for i in range(count)]


I wasn't able to return a generator as suggested here because nodes_vec is a stack object and it would lead to a use-after-free unless we define a cdef class for the iterator. I added an optimized contains method that avoids reconstructing Python objects for containment checks.

Andy-Jost · 2026-04-03T22:52:54Z

/ok to test

Pre-allocate vectors to 128 entries and pass them on the first call. Only fall back to a second call if the graph exceeds 128 nodes/edges. Made-with: Cursor

Andy-Jost added 15 commits April 2, 2026 09:43

Add AdjacencySet proxy for pred/succ and GraphNode.remove()

7854b76

Replace cached tuple-based pred/succ with mutable AdjacencySet backed by direct CUDA driver calls. Add GraphNode.remove() wrapping cuGraphDestroyNode. Made-with: Cursor

Use requires_module mark for numpy version checks in mutation tests

aa84e26

Replace inline skipif version check with requires_module(np, "2.1") from the shared test helpers, consistent with other test files. Made-with: Cursor

Fix empty-graph return type: return set() instead of () for nodes/edges

b27dd93

Made-with: Cursor

Purge node cache on destroy to prevent stale identity lookups

7a3dbb4

Made-with: Cursor

Unregister destroyed nodes from C++ graph_node_registry

1b7743d

When a node is destroyed, the driver may reuse its CUgraphNode pointer for a new node. Without unregistering the old entry, the registry returns a stale handle pointing to the wrong node type and graph. Made-with: Cursor

Add dedicated test for node identity preservation through round-trips

84f0b30

Made-with: Cursor

Merge branch 'main' into graph-node-identity

64d6c2d

Andy-Jost added this to the cuda.core v0.7.0 milestone Apr 3, 2026

Andy-Jost added the cuda.core Everything related to the cuda.core module label Apr 3, 2026

Andy-Jost self-assigned this Apr 3, 2026

Andy-Jost added 2 commits April 3, 2026 13:18

Merge branch 'main' into graph-node-identity

a40be9a

Resolve conflicts: keep _cached() wrappers and _node_cache.pop() from graph-node-identity, keep identity comparison (is not) in test. Made-with: Cursor

Rename _node_cache/_cached to _node_registry/_registered

729af49

Aligns Python-side terminology with the C++ graph_node_registry. Made-with: Cursor

Andy-Jost added 7 commits April 3, 2026 13:45

Merge branch 'graph-node-identity' into graph-node-repr

9766e54

Fix import formatting in test_registry_cleanup

8370687

Made-with: Cursor

Merge origin/main into graph-node-repr

36527da

Made-with: Cursor

Andy-Jost changed the title ~~Add handle= to all GraphNode subclass __repr__~~ Graph node follow-ups: repr, containment, empty(), registry docs Apr 3, 2026

Andy-Jost commented Apr 3, 2026

View reviewed changes

Andy-Jost marked this pull request as ready for review April 3, 2026 22:53

Optimize GraphDef.nodes() and edges() to try a single driver call

f779f30

Pre-allocate vectors to 128 entries and pass them on the first call. Only fall back to a second call if the graph exceeds 128 nodes/edges. Made-with: Cursor

Andy-Jost force-pushed the graph-node-repr branch from db481d3 to f779f30 Compare April 3, 2026 23:04

Andy-Jost added enhancement Any code-related improvements P0 High priority - Must do! labels Apr 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Graph node follow-ups: repr, containment, empty(), registry docs#1859

Graph node follow-ups: repr, containment, empty(), registry docs#1859
Andy-Jost wants to merge 25 commits intoNVIDIA:mainfrom
Andy-Jost:graph-node-repr

Andy-Jost commented Apr 3, 2026 •

edited

Loading

Uh oh!

copy-pr-bot bot commented Apr 3, 2026

Uh oh!

Andy-Jost commented Apr 3, 2026

Uh oh!

github-actions bot commented Apr 3, 2026

Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

Andy-Jost commented Apr 3, 2026

Uh oh!

Andy-Jost Apr 3, 2026 •

edited

Loading

Uh oh!

Andy-Jost commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Andy-Jost commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test Coverage

Uh oh!

copy-pr-bot bot commented Apr 3, 2026

Uh oh!

Andy-Jost commented Apr 3, 2026

Uh oh!

github-actions bot commented Apr 3, 2026

Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

Andy-Jost commented Apr 3, 2026

Uh oh!

Andy-Jost Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Andy-Jost commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Andy-Jost commented Apr 3, 2026 •

edited

Loading

Andy-Jost Apr 3, 2026 •

edited

Loading