[CSM-1857] Stabilize Azure detector hash_v2 with deterministic iteration#4846
Open
dipto-truffle wants to merge 2 commits intomainfrom
Open
[CSM-1857] Stabilize Azure detector hash_v2 with deterministic iteration#4846dipto-truffle wants to merge 2 commits intomainfrom
dipto-truffle wants to merge 2 commits intomainfrom
Conversation
ProcessData in the Azure Entra v2 detector iterated Go maps non-deterministically, causing the same credential to produce different (clientId, tenantId) pairings across scanner runs. This yielded different RawV2 values, different hash_v2 hashes, and duplicate secret rows in the database (CSM-1857 secondary issue). Iterate map keys in sorted order via slices.Sorted(maps.Keys(...)) and clone caller maps with maps.Clone before verification-driven mutations. Same inputs now always produce the same RawV2/hash_v2. Made-with: Cursor
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
pkg/detectors/azure_entra/serviceprincipal/v2/spv2_determinism_test.go
Outdated
Show resolved
Hide resolved
The test previously passed verify=false, but all delete calls that mutate maps are inside the `if verify` block, making the test a no-op. Use verify=true with a mock HTTP client that triggers tenant-not-found deletions so the test actually validates the maps.Clone protection. Made-with: Cursor
| if r == nil { | ||
| // Only include the clientId and tenantId if we're confident which one it is. | ||
| if len(clientIds) != 1 { | ||
| if len(activeClients) != 1 { |
Contributor
There was a problem hiding this comment.
I might be mistaken, but removing the comments feels a bit aggressive; the PR strips out almost all comments in ProcessData. What do you think?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
ProcessDatain the Azure Entra v2 detector iterated Go maps (clientSecrets,clientIds,tenantIds) non-deterministically, causing the same credential to produce different(clientId, tenantId)pairings across scanner runs. This yielded differentRawV2values, differenthash_v2hashes, and duplicate secret rows in the database (the secondary issue in CSM-1857).slices.Sorted(maps.Keys(...))and clone caller maps withmaps.Clonebefore verification-driven mutations. Same inputs now always produce the sameRawV2/hash_v2.Corresponds to trufflesecurity/thog#5936.
Test plan
go test ./pkg/detectors/azure_entra/serviceprincipal/v2/ -v— all 8 tests passTestProcessData_DeterministicRawV2confirms identical RawV2 across 50 repeated calls with the same inputsTestProcessData_DoesNotMutateCallerMapsconfirms caller maps are not modifiedTestProcessData_RawV2DependsOnIDCountconfirms RawV2 is populated only with unambiguous IDsTestProcessData_SameSecretDifferentRawV2documents the RawV2 divergence when chunk context differsgo test ./pkg/detectors/azure_entra/...— all passMade with Cursor
Note
Medium Risk
Changes the Azure Entra v2
ProcessDatacredential pairing/verification loops to be deterministic and non-mutating, which affects secret identification and deduplication behavior. Risk is moderate because it alters how IDs are selected/cleared and may changeRawV2/hash_v2outputs for some scan contexts.Overview
Stabilizes Azure Entra service principal v2
RawV2/hash_v2generation by makingProcessDataiterateclientSecrets,clientIds, andtenantIdsin sorted key order instead of Go map iteration order.Prevents verification paths from mutating caller-provided
clientIds/tenantIdsby cloning them intoactiveClients/activeTenantsbefore doing anydeleteoperations.Adds
spv2_determinism_test.goto assertRawV2determinism across repeated runs, document whenRawV2is omitted due to ambiguous IDs, and ensure caller maps are not modified during verification.Written by Cursor Bugbot for commit 6fc2ccb. This will update automatically on new commits. Configure here.