Skip to content

[CSM-1857] Stabilize Azure detector hash_v2 with deterministic iteration#4846

Open
dipto-truffle wants to merge 2 commits intomainfrom
test/csm-1857-hashv2-instability
Open

[CSM-1857] Stabilize Azure detector hash_v2 with deterministic iteration#4846
dipto-truffle wants to merge 2 commits intomainfrom
test/csm-1857-hashv2-instability

Conversation

@dipto-truffle
Copy link
Copy Markdown
Contributor

@dipto-truffle dipto-truffle commented Mar 27, 2026

Summary

  • Root cause: ProcessData in the Azure Entra v2 detector iterated Go maps (clientSecrets, clientIds, tenantIds) non-deterministically, causing the same credential to produce different (clientId, tenantId) pairings across scanner runs. This yielded different RawV2 values, different hash_v2 hashes, and duplicate secret rows in the database (the secondary issue in CSM-1857).
  • Fix: Iterate map keys in sorted order via slices.Sorted(maps.Keys(...)) and clone caller maps with maps.Clone before verification-driven mutations. Same inputs now always produce the same RawV2/hash_v2.
  • Tests: Added unit tests for RawV2 stability (ID count sensitivity, determinism over 50 runs, caller map immutability, hash divergence chain).

Corresponds to trufflesecurity/thog#5936.

Test plan

  • go test ./pkg/detectors/azure_entra/serviceprincipal/v2/ -v — all 8 tests pass
  • TestProcessData_DeterministicRawV2 confirms identical RawV2 across 50 repeated calls with the same inputs
  • TestProcessData_DoesNotMutateCallerMaps confirms caller maps are not modified
  • TestProcessData_RawV2DependsOnIDCount confirms RawV2 is populated only with unambiguous IDs
  • TestProcessData_SameSecretDifferentRawV2 documents the RawV2 divergence when chunk context differs
  • go test ./pkg/detectors/azure_entra/... — all pass

Made with Cursor


Note

Medium Risk
Changes the Azure Entra v2 ProcessData credential pairing/verification loops to be deterministic and non-mutating, which affects secret identification and deduplication behavior. Risk is moderate because it alters how IDs are selected/cleared and may change RawV2/hash_v2 outputs for some scan contexts.

Overview
Stabilizes Azure Entra service principal v2 RawV2/hash_v2 generation by making ProcessData iterate clientSecrets, clientIds, and tenantIds in sorted key order instead of Go map iteration order.

Prevents verification paths from mutating caller-provided clientIds/tenantIds by cloning them into activeClients/activeTenants before doing any delete operations.

Adds spv2_determinism_test.go to assert RawV2 determinism across repeated runs, document when RawV2 is omitted due to ambiguous IDs, and ensure caller maps are not modified during verification.

Written by Cursor Bugbot for commit 6fc2ccb. This will update automatically on new commits. Configure here.

ProcessData in the Azure Entra v2 detector iterated Go maps
non-deterministically, causing the same credential to produce different
(clientId, tenantId) pairings across scanner runs. This yielded
different RawV2 values, different hash_v2 hashes, and duplicate secret
rows in the database (CSM-1857 secondary issue).

Iterate map keys in sorted order via slices.Sorted(maps.Keys(...)) and
clone caller maps with maps.Clone before verification-driven mutations.
Same inputs now always produce the same RawV2/hash_v2.

Made-with: Cursor
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 27, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

The test previously passed verify=false, but all delete calls that
mutate maps are inside the `if verify` block, making the test a no-op.
Use verify=true with a mock HTTP client that triggers tenant-not-found
deletions so the test actually validates the maps.Clone protection.

Made-with: Cursor
@dipto-truffle dipto-truffle marked this pull request as ready for review April 1, 2026 21:11
@dipto-truffle dipto-truffle requested a review from a team April 1, 2026 21:11
@dipto-truffle dipto-truffle requested a review from a team as a code owner April 1, 2026 21:11
if r == nil {
// Only include the clientId and tenantId if we're confident which one it is.
if len(clientIds) != 1 {
if len(activeClients) != 1 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be mistaken, but removing the comments feels a bit aggressive; the PR strips out almost all comments in ProcessData. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants