ci: add SynapseML-Internal compatibility check to OSS pipeline by BrendanWalsh · Pull Request #2542 · microsoft/SynapseML

BrendanWalsh · 2026-04-04T01:09:50Z

Summary

Adds a non-blocking CI job (InternalCompat) to the OSS pipeline that validates that SynapseML-Internal compiles and passes unit tests against the current OSS build. This catches breaking changes before they reach Internal.

What it does

publishM2: builds the OSS JARs and publishes them to the local Maven repo (~/.m2)
Retarget: uses sed to point Internal's build.sbt at the OSS version from this build and adds Resolver.mavenLocal
Compile: runs sbt compile Test/compile against the retargeted Internal build
Unit tests: creates Internal's conda env (with Synapse-Conda feed auth), fetches AI service secrets from mmlspark-keys, and runs the spark.aifunc tests (128 tests)
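A minimal sketch of the Retarget step, assuming Internal's build.sbt declares the OSS dependency version on one line of the form val synapseMLVersion = "x.y.z" (the real file may be laid out differently). The helper name retarget_build_sbt is hypothetical; it writes through a temp file rather than sed -i so it behaves the same with GNU and BSD sed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: point Internal's build.sbt at a locally published OSS version.
# ASSUMPTION: the OSS version is declared on one line, e.g.  val synapseMLVersion = "x.y.z"
retarget_build_sbt() {
  local build_file="$1" oss_version="$2"

  # Replace only the quoted value on the synapseMLVersion line.
  # Writing to a temp file avoids the GNU/BSD difference in `sed -i`.
  sed -E "s/^([[:space:]]*val synapseMLVersion[[:space:]]*=[[:space:]]*)\"[^\"]*\"/\1\"${oss_version}\"/" \
    "$build_file" > "${build_file}.tmp"
  mv "${build_file}.tmp" "$build_file"

  # Let sbt resolve the artifacts that `sbt publishM2` wrote to ~/.m2/repository.
  if ! grep -q 'Resolver.mavenLocal' "$build_file"; then
    printf '\nThisBuild / resolvers += Resolver.mavenLocal\n' >> "$build_file"
  fi
}
```

Since the resolver line is appended only when it is missing, rerunning the step on a reused checkout leaves a single Resolver.mavenLocal entry.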

Design decisions

Always runs, never blocks: the job sets continueOnError: true, so failures surface as warnings rather than build failures
spark.aifunc only: the other test packages (powerbi, ebm, predict) extend HasSparkSession, which eagerly initializes FabricTestConstants and therefore needs Fabric credentials from fabrictest-cert-admin-kv (not available in the OSS pipeline)
No Java pin: Internal builds with the agent's default Java 11 (OSS uses Java 8), so the job keeps the default
Disk cleanup: Internal's conda env pulls in PyTorch/CUDA (~15GB), so we free ~30GB by removing the Android SDK, .NET, GHC, Boost, and Docker images
CREATE_SEMPY_WRITER=false: the SemPy parquet writer's dotnet codegen step is not needed for compat testing
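Because no Java version is pinned, the job inherits whatever JDK the agent image ships. As a sketch (not something this PR adds), a hypothetical java_major_version helper could log or assert the major version reported by java -version, which uses 1.8.x numbering for Java 8 and 11.x for Java 11:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read `java -version` output on stdin and print the major version.
# Legacy JDKs report "1.8.0_402" (major = 8); JDK 9+ report "11.0.22" (major = 11).
java_major_version() {
  local raw
  # Take the first quoted version string, e.g. version "11.0.22" -> 11.0.22
  raw=$(grep -m1 -oE 'version "[^"]+"' | sed -E 's/^version "([^"]+)"$/\1/') || return 1
  case "$raw" in
    1.*) printf '%s\n' "$raw" | cut -d. -f2 ;;  # 1.8.0_402 -> 8
    *)   printf '%s\n' "$raw" | cut -d. -f1 ;;  # 11.0.22 -> 11, 17 -> 17
  esac
}
```

For example, [ "$(java -version 2>&1 | java_major_version)" = "11" ] would fail the step if the image's default JDK changed; java -version writes to stderr, hence the 2>&1.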

CI validation

Build #213700535 — ✅ all green, 128/128 spark.aifunc tests passed
Prior runs validated compile-only, disk space, feed auth, and Java version fixes

Changes

All changes are in pipeline.yaml:

Added ADO repo resource for SynapseML-Internal
Added InternalCompat job (~100 lines)

Add an InternalCompat job that validates that SynapseML-Internal compiles against the current OSS build. Triggered via the testInternalCompat pipeline parameter (default: false). The job:

1. Checks out both the OSS and Internal repos
2. Publishes OSS JARs to local Maven (~/.m2) via sbt publishM2
3. Retargets Internal's build.sbt to use the just-built OSS version (sed-replaces synapseMLVersion and adds Resolver.mavenLocal)
4. Runs sbt compile + Test/compile on Internal

This catches API-breaking changes (removed classes, changed signatures, renamed packages) before they land in a release and break Internal.

Locally validated:
- sed patterns correctly modify Internal's build.sbt
- sbt version extraction works (core/version -> [info] line parsing)
- Internal compile + Test/compile succeeds against OSS artifacts in M2
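The version-extraction step (core/version -> [info] line parsing) can be sketched as a stdin filter. This assumes sbt runs with colors disabled (for example via -Dsbt.log.noformat=true) and prints the setting's value as its last [info] line; extract_sbt_version is a hypothetical name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read sbt output on stdin and print the value from the last
# "[info] <value>" line, e.g. from:  sbt -Dsbt.log.noformat=true core/version
# ASSUMPTION: colors are disabled and the setting's value is the final [info] line.
extract_sbt_version() {
  grep '^\[info\] ' \
    | tail -n 1 \
    | sed -E 's/^\[info\][[:space:]]+//' \
    | tr -d '\r'   # drop a trailing carriage return if the log has CRLF line endings
}
```

A call would then look like OSS_VERSION=$(sbt -Dsbt.log.noformat=true core/version | extract_sbt_version), with the result fed to the retargeting sed.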

The conda env creation was failing because synapseml-utils is in the private A365/Synapse-Conda ADO feed. Added PipAuthenticate@1 before conda env create, matching SynapseML-Internal's templates/conda.yml. Also split conda setup into discrete steps (PATH, permissions, auth, TOS, create) for clearer logs.

Internal's environment.yaml pulls in PyTorch, CUDA libraries, and more (~15GB), and the agent disk fills up before pip finishes. Remove the Android SDK, .NET, Boost, GHC, and Docker images to reclaim ~30GB. Also add pip cache purge after env creation and bump the job timeout to 90 minutes.
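A sketch of the directory cleanup, assuming the usual install locations of these toolchains on Microsoft-hosted Ubuntu agents (check the agent image before relying on the exact paths). The optional root prefix and the SUDO override are hypothetical, added only so the function can be tried in a scratch directory; on an agent it would be called with no arguments.

```bash
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: typical locations of these preinstalled toolchains on Microsoft-hosted
# Ubuntu agents; check the agent image documentation before relying on them.
CLEANUP_DIRS=(
  /usr/local/lib/android   # Android SDK
  /usr/share/dotnet        # .NET SDK
  /opt/ghc                 # GHC
  /usr/local/share/boost   # Boost
)

# Hypothetical helper. The optional first argument prefixes every path so the function
# can be tried in a scratch directory; SUDO defaults to "sudo" and can be set to "".
free_agent_disk() {
  local root="${1:-}" dir
  for dir in "${CLEANUP_DIRS[@]}"; do
    ${SUDO-sudo} rm -rf "${root}${dir}"
  done
}
```

Docker images and the pip cache need their own commands, such as docker image prune --all --force before the conda env is created and pip cache purge afterwards; running df -h / before and after shows how much space was actually reclaimed.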

The generateSemPyParquetWriterToolTask runs dotnet publish during Compile/managedResources when CREATE_SEMPY_WRITER=true. We removed the .NET SDK in the disk cleanup step, and the SemPy writer isn't needed for compat testing. Set CREATE_SEMPY_WRITER=false.

1. Fix false-green: the || echo pattern swallowed exit codes, so the step always reported success even when all tests failed. Now tracks failures and exits non-zero.
2. Only run spark.aifunc tests. The other packages (powerbi, ebm, predict) extend HasSparkSession, which eagerly initializes FabricTestConstants.INTEGRATION_WORKSPACE_ID; this requires INTEGRATION_ACCOUNT from fabrictest-cert-admin-kv, a Fabric-only Key Vault we don't have access to in the OSS pipeline.
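The false-green comes from how || works: in run_tests || echo "tests failed", the list's exit status is echo's (0), so the step passes no matter what. A minimal sketch of the tracking alternative, using a hypothetical run_and_track helper that runs each argument as a separate command and fails at the end if any of them failed:

```bash
#!/usr/bin/env bash
set -euo pipefail

# The bug: the `||` list takes echo's exit status, so this line "succeeds" even though
# the command before it failed (run_tests is a placeholder):
#   run_tests || echo "tests failed"

# Hypothetical helper: run each argument as its own command, remember which ones failed,
# and return non-zero at the end if any did (instead of stopping at the first failure).
run_and_track() {
  local -a failed=()
  local cmd
  for cmd in "$@"; do
    if ! bash -c "$cmd"; then
      # Azure Pipelines logging command: surfaces a warning on the run summary.
      echo "##vso[task.logissue type=warning]Command failed: ${cmd}"
      failed+=("$cmd")
    fi
  done
  if (( ${#failed[@]} > 0 )); then
    printf 'FAILED: %s\n' "${failed[@]}" >&2
    return 1
  fi
  return 0
}
```

In the job, a call might look like run_and_track "sbt 'testOnly *.aifunc.*'" (the test filter is illustrative). With continueOnError: true, the resulting non-zero exit is reported as a warning instead of failing the build.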

github-actions · 2026-04-04T01:10:02Z

Hey @BrendanWalsh 👋!
Thank you so much for contributing to our repository 🙌.
Someone from the SynapseML team will review this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

fix: Fix LightGBM crashes with empty partitions
feat: Make HTTP on Spark back-offs configurable
docs: Update Spark Serving usage
build: Add codecov support
perf: improve LightGBM memory usage
refactor: make python code generation rely on classes
style: Remove nulls from CNTKModel
test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

github-actions · 2026-04-04T01:10:05Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 3143041.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copilot

Pull request overview

Adds an Azure DevOps pipeline job to continuously validate that the closed-source SynapseML-Internal repo still compiles and passes a targeted unit-test subset when built against the current OSS SynapseML artifacts, helping detect breaking changes earlier.

Changes:

Adds a SynapseML-Internal repository resource to the pipeline.
Introduces a new non-blocking InternalCompat job that publishes OSS artifacts to local Maven, retargets Internal to that version, compiles, creates the Internal conda environment, and runs spark.aifunc tests.
Publishes Internal test results while keeping the job non-gating (continueOnError: true).


File	Description
`pipeline.yaml`	Adds a repo resource and a new non-blocking CI job to compile/test SynapseML-Internal against locally published OSS artifacts.

Copilot's findings

Files reviewed: 1/1 changed files
Comments generated: 1

Copilot · 2026-04-04T01:15:02Z

pipeline.yaml

+  - repository: self
+    type: self


In the resources.repositories block, declaring the pipeline repo (repository: self) is typically unnecessary (the pipeline already has an implicit self repo for checkout: self). Consider removing this entry to reduce confusion and keep only the external SynapseML-Internal repository resource.

Suggested change

-  - repository: self
-    type: self

BrendanWalsh · Apr 4, 2026

The explicit self declaration is needed here. The original pipeline had - repo: self (shorthand syntax). When restructuring to the full resources.repositories block to add the SynapseML-Internal external repo, the self entry must be included: ADO requires it in the list format for multi-repo checkout (checkout: self + checkout: SynapseML-Internal) to work correctly.

BrendanWalsh added 5 commits April 3, 2026 04:18

Copilot AI review requested due to automatic review settings April 4, 2026 01:09

Copilot started reviewing on behalf of BrendanWalsh April 4, 2026 01:10

Copilot AI reviewed Apr 4, 2026

BrendanWalsh wants to merge 5 commits into master from brwals/internal-compat-check
