[DOCS] Add annotated partitioning documentation #27972

Open: yuslepukhin wants to merge 6 commits into main from yuslepukhin/layering_docs
yuslepukhin (Member) commented Apr 3, 2026

This pull request introduces a new documentation page, PartitioningWithAnnotationsAndMemoryConstraints.md, which explains advanced ONNX Runtime features for partitioning model graphs across devices with explicit control. The doc covers how to annotate model layers for device assignment, collect per-node memory statistics, and enforce GPU memory budgets during partitioning. These features enable precise control over device placement and memory usage for large models.

The most important changes are:

New Documentation: Advanced Partitioning Features

  • Adds a comprehensive guide (PartitioningWithAnnotationsAndMemoryConstraints.md) describing how to use ONNX Runtime’s layer annotation and memory constraint features for graph partitioning.

Layer Assignment via Annotations

  • Explains how to annotate ONNX model nodes with layer_ann metadata, including manual annotation and automated annotation using Olive’s CaptureLayerAnnotations pass.
  • Provides configuration examples for mapping annotation patterns to devices at runtime using the session.layer_assignment_settings session option.

Capacity-Aware Partitioning

This is a follow-up to #27595.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new documentation page describing advanced ONNX Runtime graph partitioning features—layer annotation–based device assignment and CUDA capacity-aware partitioning—intended to guide users through annotation, profiling, and constrained partitioning workflows.

Changes:

  • Documented layer_ann node annotations and runtime mapping via session.layer_assignment_settings.
  • Documented memory-stat collection (session.collect_node_memory_stats_to_file) and CUDA memory-budget partitioning (session.resource_cuda_partitioning_settings).
  • Included end-to-end examples for combining annotation-based assignment with memory-constrained partitioning.
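The session option keys listed above could be applied roughly as follows. This is a minimal sketch, not the documented usage: the keys are the ones named in this PR, but every value is a hypothetical placeholder (the real value formats are defined in the new doc page), and the split into a profiling run followed by a constrained run is an assumption about the workflow. `RecordingOptions` is a stand-in so the sketch runs without ONNX Runtime installed; in practice `onnxruntime.SessionOptions()` would be passed instead, since it exposes `add_session_config_entry(key, value)`.

```python
from typing import Dict


class RecordingOptions:
    """Hypothetical stand-in for onnxruntime.SessionOptions (records entries only)."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def add_session_config_entry(self, key: str, value: str) -> None:
        self.entries[key] = value


# Keys come from this PR; values are placeholders, NOT the real formats.
PROFILING_RUN = {
    # Assumed to take an output file path for per-node memory statistics.
    "session.collect_node_memory_stats_to_file": "node_memory_stats.out",
}

CONSTRAINED_RUN = {
    # Placeholder for the annotation-pattern-to-device mapping.
    "session.layer_assignment_settings": "<layer assignment settings>",
    # Placeholder for the CUDA memory budget settings.
    "session.resource_cuda_partitioning_settings": "<cuda partitioning settings>",
}


def apply_session_config(options, entries: Dict[str, str]):
    """Copy each key/value pair onto an object with add_session_config_entry."""
    for key, value in entries.items():
        options.add_session_config_entry(key, value)
    return options


# With ONNX Runtime installed, the same helper would be used like this:
#   import onnxruntime as ort
#   so = apply_session_config(ort.SessionOptions(), CONSTRAINED_RUN)
#   session = ort.InferenceSession("model.onnx", so)
profiling_options = apply_session_config(RecordingOptions(), PROFILING_RUN)
constrained_options = apply_session_config(RecordingOptions(), CONSTRAINED_RUN)
```

Keeping the entries in plain dictionaries makes it easy to diff the profiling and constrained configurations before creating a session.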

yuslepukhin requested review from Copilot and tianleiwu Apr 3, 2026 18:23
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.



Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.



yuslepukhin requested a review from tianleiwu Apr 3, 2026 20:24
tianleiwu previously approved these changes Apr 3, 2026
yuslepukhin enabled auto-merge (squash) Apr 3, 2026 23:56
yuslepukhin (Member, Author) commented

/azp run web_Release / build_onnxruntime_web

azure-pipelines commented

No pipelines are associated with this pull request.
