
[FEA]: Add pathfinder cudla support (.so, .h)#1855

Open
rwgk wants to merge 9 commits into NVIDIA:main from rwgk:pathfinder_cudla


rwgk (Collaborator) commented Apr 3, 2026

Closes #1857

  • Add cuda.pathfinder support for the cuDLA shared libraries cudla and nvcudla.
  • Teach load_nvidia_dynamic_lib("cudla") to locate libcudla.so.1 from either CUDA Toolkit installs or the nvidia-cudla wheel, and add nvcudla as the system-search canary/runtime dependency.
  • Add locate_nvidia_header_directory("cudla") support for cudla.h from either CUDA Toolkit installs or the standard nvidia/cu13/include wheel layout.
  • Keep "all_must_work" tests green on hosts without the platform runtime by skipping the cudla/nvcudla checks when libnvcudla.so is not loadable, while still exercising the real success path on Tegra systems.
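
As a rough illustration of the two entry points named in the bullets above, the sketch below wraps them in a small helper. `load_nvidia_dynamic_lib` and `locate_nvidia_header_directory` are the names used in this PR; the helper name `describe_cudla`, its injectable parameters, and the broad exception handling are assumptions made so that the snippet also runs on hosts where cuda.pathfinder or the cuDLA runtime is absent.

```python
def describe_cudla(load_lib=None, locate_headers=None):
    """Return a dict describing where cudla was found, or why it was not.

    Both parameters are hypothetical injection points for this sketch;
    by default the cuda.pathfinder functions are imported lazily.
    """
    if load_lib is None or locate_headers is None:
        try:
            from cuda.pathfinder import (
                load_nvidia_dynamic_lib,
                locate_nvidia_header_directory,
            )
        except ImportError as exc:
            return {"available": False, "reason": f"cuda.pathfinder not importable: {exc}"}
        load_lib = load_lib or load_nvidia_dynamic_lib
        locate_headers = locate_headers or locate_nvidia_header_directory

    try:
        loaded = load_lib("cudla")
    except Exception as exc:  # deliberately broad: this is only a diagnostic helper
        return {"available": False, "reason": f"cudla not loadable: {exc}"}

    return {
        "available": True,
        # abs_path is the attribute shown in the test logs of this PR
        "abs_path": getattr(loaded, "abs_path", None),
        "header_dir": locate_headers("cudla"),
    }


if __name__ == "__main__":
    print(describe_cudla())
```

On a host without libnvcudla.so the helper is expected to report the library as unavailable rather than raise, which mirrors the skip-instead-of-fail behavior the tests in this PR aim for.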

Manual Testing (in addition to CI testing) passed on Tegra Thor (CTK 13.2) and Tegra Orin (unreleased CTK):

WithOUT nvidia-cudla wheel:

INFO test_real_load_driver_lib[nvcudla]: abs_path=/lib/aarch64-linux-gnu/libnvcudla.so
INFO test_locate_ctk_headers[cudla]: hdr_dir='/usr/local/cuda/include'
INFO test_load_nvidia_dynamic_lib[nvcudla]: abs_path=/lib/aarch64-linux-gnu/libnvcudla.so
INFO test_load_nvidia_dynamic_lib[cudla]: abs_path=/usr/local/cuda/targets/sbsa-linux/lib/libcudla.so.1

With nvidia-cudla wheel:

INFO test_load_nvidia_dynamic_lib[cudla]: abs_path=/home/rgrossekunst/wrk/forked/cuda-python/Pathfinder13Venv/lib/python3.12/site-packages/nvidia/cu13/lib/libcudla.so.1
INFO test_load_nvidia_dynamic_lib[nvcudla]: abs_path=/lib/aarch64-linux-gnu/libnvcudla.so
INFO test_real_load_driver_lib[nvcudla]: abs_path=/lib/aarch64-linux-gnu/libnvcudla.so
INFO test_locate_ctk_headers[cudla]: hdr_dir='/home/rgrossekunst/wrk/forked/cuda-python/Pathfinder13Venv/lib/python3.12/site-packages/nvidia/cu13/include'

rwgk added 2 commits April 2, 2026 14:35
Add pathfinder support for loading ``libcudla.so.1`` from the ``nvidia-cudla``
package and probing ``libnvcudla.so`` through the existing canary subprocess
path. Use that probe in the cudla load test so hosts without the platform
runtime are skipped, while real ``libcudla.so.1`` load failures still surface
when ``libnvcudla.so`` is available.

Made-with: Cursor
Mark cudla and nvcudla as aarch64-only descriptors and derive the supported
library tables from the current machine as well as the current OS. This keeps
those libraries known to pathfinder while reporting them as unavailable on
linux-64, and updates the descriptor-registry tests to match the new
current-platform filtering model.

Made-with: Cursor
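
The current-platform filtering that this commit message describes might look roughly like the following. The `LibDescriptor` dataclass, its `machines` field, and `supported_libnames` are hypothetical names chosen for illustration; they are not pathfinder's actual descriptor registry.

```python
import platform
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class LibDescriptor:
    # Hypothetical descriptor shape, for illustration only.
    name: str
    os_names: tuple = ("linux",)
    machines: tuple = ()  # empty tuple means "any machine"


# Every library stays known here, even when unsupported on the current host.
DESCRIPTORS = (
    LibDescriptor("cudart", os_names=("linux", "windows")),
    LibDescriptor("cudla", machines=("aarch64",)),
    LibDescriptor("nvcudla", machines=("aarch64",)),
)


def _current_os():
    if sys.platform.startswith("linux"):
        return "linux"
    if sys.platform == "win32":
        return "windows"
    return sys.platform


def supported_libnames(os_name=None, machine=None):
    """Filter the known descriptors by OS and machine architecture."""
    os_name = _current_os() if os_name is None else os_name
    machine = platform.machine() if machine is None else machine
    return tuple(
        d.name
        for d in DESCRIPTORS
        if os_name in d.os_names and (not d.machines or machine in d.machines)
    )
```

Under these assumptions, `supported_libnames("linux", "x86_64")` omits cudla and nvcudla while `DESCRIPTORS` still lists them, which is the "known but unavailable" distinction the message draws.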
copy-pr-bot bot (Contributor) commented Apr 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk added 5 commits April 3, 2026 14:34
Skip the cudla and nvcudla load tests on aarch64 hosts when the nvcudla canary
probe cannot resolve libnvcudla.so. This keeps non-Tegra linux-aarch64 systems
from failing strict test runs while still exercising the real success path on
Tegra platforms where the platform runtime is installed.

Made-with: Cursor
Remove the machine-architecture gating for cudla and nvcudla so they remain
part of the normal Linux descriptor tables. Let the nvcudla canary probe decide
whether cudla and nvcudla tests should run, which keeps strict test runs green
on hosts without the platform runtime while still exercising real load behavior
where libnvcudla.so is available.

Made-with: Cursor
Move the libnvcudla.so skip logic into conftest so cudla and nvcudla tests use
one shared rule. Keeping the helper in the pytest support layer avoids duplicate
test code while still deferring the pathfinder import until the helper runs.

Made-with: Cursor
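
A shared skip rule of the kind described here could be sketched as below. This is not the helper added in the PR: the function names are hypothetical, and a plain `ctypes.CDLL` call in a child interpreter stands in for pathfinder's canary subprocess probe.

```python
import subprocess
import sys

# Runs in a child interpreter: try to dlopen the library named in argv[1].
_PROBE = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"


def shared_lib_is_loadable(soname, timeout=30):
    """Try to load soname in a child process so the parent stays untouched."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _PROBE, soname],
            capture_output=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def cudla_skip_reason(libname, probe=shared_lib_is_loadable):
    """Return a skip message for cudla/nvcudla tests, or None to run them."""
    if libname in ("cudla", "nvcudla") and not probe("libnvcudla.so"):
        return f"{libname}: libnvcudla.so is not loadable on this host"
    return None
```

In a test, the result would be passed to `pytest.skip(reason)` when it is not `None`, so that both the cudla and nvcudla tests follow a single rule and real load failures still surface wherever the probe succeeds.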
Register cudla as a CTK header so locate_nvidia_header_directory() can find
cudla.h in the standard cu13 wheel include directory. In strict header tests,
skip cudla on hosts where libnvcudla.so is not available so Tegra setups still
exercise the real path without making unsupported hosts fail.

Made-with: Cursor
Move cudla into the CTK descriptor block so its packaging classification matches
how it is shipped in toolkit installs and the optional nvidia-cudla wheel.
This keeps the catalog organization consistent with the current understanding of
cudla as a CUDA Toolkit component rather than a third-party add-on.

Made-with: Cursor
rwgk (Collaborator, Author) commented Apr 3, 2026

/ok to test

github-actions bot commented Apr 4, 2026

rwgk changed the title from 'Add support for pathfinder.load_nvidia_dynamic_lib("cudla")' to 'Add pathfinder cudla support (.so, .h)' on Apr 4, 2026
rwgk changed the title from 'Add pathfinder cudla support (.so, .h)' to '[FEA]: Add pathfinder cudla support (.so, .h)' on Apr 4, 2026
rwgk self-assigned this on Apr 4, 2026
rwgk added the labels P0 (High priority - Must do!), cuda.pathfinder (Everything related to the cuda.pathfinder module), and feature (New feature or request) on Apr 4, 2026
rwgk added this to the cuda.pathfinder next milestone on Apr 4, 2026
rwgk marked this pull request as ready for review on April 4, 2026 15:40
