Skip to content

Graph node follow-ups: repr, containment, empty(), registry docs#1859

Open
Andy-Jost wants to merge 25 commits intoNVIDIA:mainfrom
Andy-Jost:graph-node-repr
Open

Graph node follow-ups: repr, containment, empty(), registry docs#1859
Andy-Jost wants to merge 25 commits intoNVIDIA:mainfrom
Andy-Jost:graph-node-repr

Conversation

@Andy-Jost
Copy link
Copy Markdown
Contributor

@Andy-Jost Andy-Jost commented Apr 3, 2026

Summary

Follow-on changes addressing review feedback from #1850 and #1853.

Changes

  • __repr__ improvements (_subclasses.pyx): All GraphNode subclass __repr__ methods now start with handle=0x... followed by static, type-specific identity data, enhancing their usefulness for debugging.
  • Cheap containment test (_adjacency_set_proxy.pyx): Added _AdjacencySetCore.contains() that checks membership by comparing raw CUgraphNode handles at the C level, avoiding Python object construction. Uses a 16-element stack buffer for a single driver call in the common case.
  • Stack-buffer optimization in query() (_adjacency_set_proxy.pyx): Same 16-element stack buffer approach applied to query(), eliminating the count-then-fetch two-call pattern for small adjacency sets.
  • Single-call optimization in nodes() and edges() (_graph_def.pyx): Pre-allocate vectors to 128 entries and pass them on the first driver call. Only fall back to a second call if the graph exceeds 128 nodes/edges.
  • Early type check in update() (_adjacency_set_proxy.pyx): Type validation moved inline to fail fast on invalid input, per review feedback.
  • GraphDef.empty() (_graph_def.pyx): New method for creating entry-point empty nodes without the awkward g.join() spelling.
  • Registry cleanup test (test_graphdef.py): test_registry_cleanup exercises destroy(), graph deletion, and weak-reference cleanup of the node registry.
  • Registry design documentation (_cpp/REGISTRY_DESIGN.md): Documents the two-level identity map (C++ HandleRegistry and Cython _node_registry) with cross-references at each instantiation site.

Test Coverage

  • Existing graph tests validate empty() (replacing prior g.join() calls).
  • test_registry_cleanup covers explicit destroy, graph teardown, and GC-driven cleanup.
  • __repr__ patterns validated by test_object_protocols.py.

Andy-Jost added 15 commits April 2, 2026 09:43
Rename test files to reflect what they actually test:
- test_basic -> test_graph_builder (stream capture tests)
- test_conditional -> test_graph_builder_conditional
- test_advanced -> test_graph_update (moved child_graph and
  stream_lifetime tests into test_graph_builder)
- test_capture_alloc -> test_graph_memory_resource
- test_explicit* -> test_graphdef*

Made-with: Cursor
- Extend Graph.update() to accept both GraphBuilder and GraphDef sources
- Surface CUgraphExecUpdateResultInfo details on update failure instead
  of a generic CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE message
- Release the GIL during cuGraphExecUpdate via nogil block
- Add parametrized happy-path test covering both GraphBuilder and GraphDef
- Add error-case tests: unfinished builder, topology mismatch, wrong type

Made-with: Cursor
Replace cached tuple-based pred/succ with mutable AdjacencySet backed
by direct CUDA driver calls. Add GraphNode.remove() wrapping
cuGraphDestroyNode.

Made-with: Cursor
…cencies

Enable adding/removing edges between graph nodes via AdjacencySet (a
MutableSet proxy on GraphNode.pred/succ), node removal via discard(),
and property setters for bulk edge replacement. Includes comprehensive
mutation and interface tests.

Closes part of NVIDIA#1330 (step 2: edge mutation on GraphDef).

Made-with: Cursor
Replace inline skipif version check with requires_module(np, "2.1")
from the shared test helpers, consistent with other test files.

Made-with: Cursor
Rename class and file to AdjacencySetProxy to clarify write-through
semantics. Add bulk-efficient clear(), __isub__(), __ior__() overrides
and remove_edges() on the Cython core. Guard GraphNode.discard() against
double-destroy via membership check. Filter duplicates in update(). Add
error-path tests for wrong types, cross-graph edges, and self-edges.

Made-with: Cursor
…INEL

Replace discard() with destroy() which calls cuGraphDestroyNode and then
zeroes the CUgraphNode resource in the handle box via
invalidate_graph_node_handle. This prevents stale memory access on
destroyed nodes. Properties (type, pred, succ, handle) degrade gracefully
to None/empty for destroyed nodes.

Remove the GRAPH_NODE_SENTINEL (0x1) approach in favor of using NULL for
both sentinels and destroyed nodes, which is simpler and avoids the risk
of passing 0x1 to driver APIs that treat it as a valid pointer.

Made-with: Cursor
Nodes retrieved via GraphDef.nodes(), edges(), or pred/succ traversal
now return the same Python object that was originally created, enabling
identity checks with `is`. A C++ HandleRegistry deduplicates
CUgraphNode handles, and a Cython WeakValueDictionary caches the
Python wrapper objects.

Made-with: Cursor
…sion

Sentinel (entry) nodes use NULL as their CUgraphNode, so caching them
under a NULL key caused all sentinels across different graphs to share
the same handle. This made nodes built from the wrong graph's entry
point, causing CUDA_ERROR_INVALID_VALUE for conditional nodes and hash
collisions in equality tests.

Made-with: Cursor
When a node is destroyed, the driver may reuse its CUgraphNode pointer
for a new node. Without unregistering the old entry, the registry
returns a stale handle pointing to the wrong node type and graph.

Made-with: Cursor
Every subclass repr now starts with handle=0x... (the CUgraphNode
pointer) followed by type-specific identity/parameter data. Dynamic
queries (pred counts, subnode counts) are removed in favor of
deterministic, cheap fields. This makes set comparison failures in
test output readable when debugging graph mutation tests.

Made-with: Cursor
@Andy-Jost Andy-Jost added this to the cuda.core v0.7.0 milestone Apr 3, 2026
@Andy-Jost Andy-Jost added the cuda.core Everything related to the cuda.core module label Apr 3, 2026
@Andy-Jost Andy-Jost self-assigned this Apr 3, 2026
@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot bot commented Apr 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@Andy-Jost
Copy link
Copy Markdown
Contributor Author

/ok to test

Resolve conflicts: keep _cached() wrappers and _node_cache.pop() from
graph-node-identity, keep identity comparison (is not) in test.

Made-with: Cursor
Aligns Python-side terminology with the C++ graph_node_registry.

Made-with: Cursor
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 3, 2026

unregister_handle: remove the expired() guard that prevented erasure
when the shared_ptr was still alive. This caused stale registry
entries after destroy(), leading to CUDA_ERROR_INVALID_VALUE when
the driver reused CUgraphNode pointer values.

Rename invalidate_graph_node_handle -> invalidate_graph_node for
consistency with the rest of the graph node API.

Made-with: Cursor
Add _AdjacencySetCore.contains() that checks membership by comparing
raw CUgraphNode handles at the C level, avoiding Python object
construction. Uses a 16-element stack buffer for a single driver call
in the common case.

Move the type check in update() inline next to the extend loop so
invalid input is rejected immediately.

Made-with: Cursor
- Add GraphDef.empty() for creating entry-point empty nodes; replace
  all no-arg join() calls on GraphDef with empty() in tests.
- Optimize _AdjacencySetCore.query() to use a 16-element stack buffer,
  matching the contains() optimization.
- Add test_registry_cleanup exercising destroy(), graph deletion, and
  weak-reference cleanup of the node registry.

Made-with: Cursor
Add REGISTRY_DESIGN.md explaining how the C++ HandleRegistry (Level 1)
and Cython _node_registry (Level 2) work together to preserve Python
object identity through driver round-trips. Add cross-references at
each registry instantiation site.

Made-with: Cursor
@Andy-Jost Andy-Jost changed the title Add handle= to all GraphNode subclass __repr__ Graph node follow-ups: repr, containment, empty(), registry docs Apr 3, 2026
@Andy-Jost
Copy link
Copy Markdown
Contributor Author

/ok to test

Comment on lines 142 to 160
cdef list query(self):
cdef cydriver.CUgraphNode c_node = as_cu(self._h_node)
if c_node == NULL:
return []
cdef size_t count = 0
cdef cydriver.CUgraphNode buf[16]
cdef size_t count = 16
cdef size_t i
with nogil:
HANDLE_RETURN(self._query_fn(c_node, NULL, &count))
if count == 0:
return []
HANDLE_RETURN(self._query_fn(c_node, buf, &count))
if count <= 16:
return [GraphNode._create(self._h_graph, buf[i])
for i in range(count)]
cdef vector[cydriver.CUgraphNode] nodes_vec
nodes_vec.resize(count)
with nogil:
HANDLE_RETURN(self._query_fn(
c_node, nodes_vec.data(), &count))
return [GraphNode._create(self._h_graph, nodes_vec[i])
for i in range(count)]
Copy link
Copy Markdown
Contributor Author

@Andy-Jost Andy-Jost Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to return a generator as suggested here because nodes_vec is a stack object and it would lead to a use-after-free unless we define a cdef class for the iterator. I added an optimized contains method that avoids reconstructing Python objects for containment checks.

@Andy-Jost
Copy link
Copy Markdown
Contributor Author

/ok to test

@Andy-Jost Andy-Jost marked this pull request as ready for review April 3, 2026 22:53
Pre-allocate vectors to 128 entries and pass them on the first call.
Only fall back to a second call if the graph exceeds 128 nodes/edges.

Made-with: Cursor
@Andy-Jost Andy-Jost added enhancement Any code-related improvements P0 High priority - Must do! labels Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant