ci: add SynapseML-Internal compatibility check to OSS pipeline#2542
BrendanWalsh wants to merge 5 commits into master
Add an InternalCompat job that validates that SynapseML-Internal compiles against the current OSS build. Triggered via the testInternalCompat pipeline parameter (default: false).

The job:
1. Checks out both OSS and Internal repos
2. Publishes OSS JARs to local Maven (~/.m2) via sbt publishM2
3. Retargets Internal's build.sbt to use the just-built OSS version (sed-replaces synapseMLVersion and adds Resolver.mavenLocal)
4. Runs sbt compile + Test/compile on Internal

This catches API-breaking changes (removed classes, changed signatures, renamed packages) before they land in a release and break Internal.

Locally validated:
- sed patterns correctly modify Internal's build.sbt
- sbt version extraction works (core/version -> [info] line parsing)
- Internal compile + Test/compile succeeds against OSS artifacts in M2
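The version extraction and retargeting described in this commit could look roughly like the sketch below. It is illustrative only: the sample sbt output, the version string, and the shape of the `val synapseMLVersion = "..."` line in build.sbt are assumptions, not copied from the pipeline or from SynapseML-Internal.

```shell
#!/bin/sh
# Hypothetical sample of `sbt core/version` output; the real output may differ.
sbt_output='[info] welcome to sbt
[info] loading settings for project root
[info] 1.0.9-42-abc1234-SNAPSHOT'

# Take the last [info] line and strip the prefix to obtain the version.
oss_version=$(printf '%s\n' "$sbt_output" | grep '^\[info\]' | tail -n 1 | sed 's/^\[info\] *//')

# Hypothetical stand-in for Internal's build.sbt (assumed layout).
build_file=$(mktemp)
cat > "$build_file" <<'EOF'
val synapseMLVersion = "1.0.8"
resolvers += "Some Feed" at "https://example.invalid/feed"
EOF

# Retarget the version to the just-built OSS version.
sed "s/^val synapseMLVersion = \".*\"/val synapseMLVersion = \"$oss_version\"/" \
  "$build_file" > "$build_file.new"
mv "$build_file.new" "$build_file"

# Add the local Maven repository so artifacts in ~/.m2 can be resolved.
printf '%s\n' 'resolvers += Resolver.mavenLocal' >> "$build_file"

retargeted=$(cat "$build_file")
rm -f "$build_file"
printf '%s\n' "$retargeted"
```

Writing to a temporary file and moving it into place avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed.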
The conda env creation was failing because synapseml-utils is in the private A365/Synapse-Conda ADO feed. Added PipAuthenticate@1 before conda env create, matching SynapseML-Internal's templates/conda.yml. Also split conda setup into discrete steps (PATH, permissions, auth, TOS, create) for clearer logs.
Internal's environment.yaml pulls PyTorch, CUDA libs, etc. (~15GB). Agent disk fills up before pip finishes. Remove Android SDK, .NET, Boost, GHC, and docker images to reclaim ~30GB. Also add pip cache purge after env creation and bump job timeout to 90min.
The generateSemPyParquetWriterToolTask runs dotnet publish during Compile/managedResources when CREATE_SEMPY_WRITER=true. We removed the .NET SDK in the disk cleanup step, and the SemPy writer isn't needed for compat testing. Set CREATE_SEMPY_WRITER=false.
1. Fix false-green: the || echo pattern swallowed exit codes, so the step always reported success even when all tests failed. Now tracks failures and exits non-zero.
2. Only run spark.aifunc tests. The other packages (powerbi, ebm, predict) extend HasSparkSession, which eagerly initializes FabricTestConstants.INTEGRATION_WORKSPACE_ID; this requires INTEGRATION_ACCOUNT from fabrictest-cert-admin-kv, a Fabric-only Key Vault we don't have access to in the OSS pipeline.
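The false-green problem and its fix can be reproduced in a few lines of shell. This is a minimal sketch, not the pipeline's actual script: `run_suite` and the suite names are hypothetical stand-ins for the real test invocations.

```shell
#!/bin/sh
# Hypothetical stand-in for one test run; "bad" simulates a failing suite.
run_suite() {
  [ "$1" != "bad" ]
}

# Anti-pattern: the trailing echo succeeds, so the overall status becomes 0.
run_suite bad || echo "suite failed (ignored)"
swallowed_status=$?

# Fix: count failures explicitly and derive the final status from the count.
failures=0
for suite in good bad good; do
  if ! run_suite "$suite"; then
    echo "FAILED: $suite"
    failures=$((failures + 1))
  fi
done

if [ "$failures" -gt 0 ]; then
  final_status=1
else
  final_status=0
fi
echo "swallowed=$swallowed_status failures=$failures final=$final_status"
# A real CI step would end with: exit "$final_status"
```

Counting failures lets every suite run to completion before the step fails, which keeps the full set of test results available for publishing.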
Hey @BrendanWalsh 👋! We use semantic commit messages to streamline the release process. Examples of commit messages with semantic prefixes:
To test your commit locally, please follow our guide on building from source.
Dependency Review
✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.
Snapshot Warnings
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.
Scanned Files
None
Pull request overview
Adds an Azure DevOps pipeline job to continuously validate that the closed-source SynapseML-Internal repo still compiles and passes a targeted unit-test subset when built against the current OSS SynapseML artifacts, helping detect breaking changes earlier.
Changes:
- Adds a `SynapseML-Internal` repository resource to the pipeline.
- Introduces a new non-blocking `InternalCompat` job that publishes OSS artifacts to local Maven, retargets Internal to that version, compiles, creates the Internal conda environment, and runs `spark.aifunc` tests.
- Publishes Internal test results while keeping the job non-gating (`continueOnError: true`).
| File | Description |
|---|---|
| pipeline.yaml | Adds a repo resource and a new non-blocking CI job to compile/test SynapseML-Internal against locally published OSS artifacts. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 1
    - repository: self
      type: self
In the resources.repositories block, declaring the pipeline repo (repository: self) is typically unnecessary (the pipeline already has an implicit self repo for checkout: self). Consider removing this entry to reduce confusion and keep only the external SynapseML-Internal repository resource.
The explicit self declaration is needed here. The original pipeline had - repo: self (shorthand syntax). When restructuring to the full resources.repositories block to add the SynapseML-Internal external repo, the self entry must be included — ADO requires it in the list format for multi-repo checkout (checkout: self + checkout: SynapseML-Internal) to work correctly.
Summary
Adds a non-blocking CI job (`InternalCompat`) to the OSS pipeline that validates that SynapseML-Internal compiles and passes unit tests against the current OSS build. This catches breaking changes before they reach Internal.

What it does
- Retargets Internal's `build.sbt` to use the OSS version from this build and adds `Resolver.mavenLocal`
- Runs `sbt compile Test/compile` against the retargeted Internal
- […] `mmlspark-keys`, and runs `spark.aifunc` tests (128 tests)

Design decisions
- `continueOnError: true` so failures surface as warnings, not build failures
- Only `spark.aifunc` tests are run: the other packages (`powerbi`, `ebm`, `predict`) extend `HasSparkSession`, which eagerly initializes `FabricTestConstants`, requiring Fabric credentials from `fabrictest-cert-admin-kv` (not available in the OSS pipeline)

CI validation

Changes
All changes are in `pipeline.yaml`:
- `SynapseML-Internal` repository resource
- `InternalCompat` job (~100 lines)