Skip to content

Improve live test playback running#2333

Open
alzimmermsft wants to merge 4 commits intomicrosoft:mainfrom
alzimmermsft:ImproveLiveTestPlayback
Open

Improve live test playback running#2333
alzimmermsft wants to merge 4 commits intomicrosoft:mainfrom
alzimmermsft:ImproveLiveTestPlayback

Conversation

@alzimmermsft
Copy link
Copy Markdown
Contributor

@alzimmermsft alzimmermsft commented Apr 3, 2026

What does this PR do?

Improves running live tests in playback with the following changes:

  • Retry after headers are now stripped by RecordingRedirectHandler when the test mode is known to be Playback. Reducing playback time spent just waiting on retries that should be instantaneous as we aren't waiting for service throttling / metering.
  • Fixed a pain point where AzurePowerShellCredential is not available could be seen in playback testing due to how LiveTestSettings initialized. It has a default of Live test mode, but if we have .testsettings.json configured to playback it would still try to set up the principal settings, that is now guarded on test mode changing to playback.
  • Added a helper method to override default polling delay to 1 millisecond when running in playback as playback tests don't wait for a server-side operation to complete.

GitHub issue number?

[Link to the GitHub issue this PR addresses]

Pre-merge Checklist

  • Required for All PRs
    • Read contribution guidelines
    • PR title clearly describes the change
    • Commit history is clean with descriptive messages (cleanup guide)
    • Added comprehensive tests for new/modified functionality
    • Created a changelog entry if the change falls among the following: new feature, bug fix, UI/UX update, breaking change, or updated dependencies. Follow the changelog entry guide
  • For MCP tool changes:
    • One tool per PR: This PR adds or modifies only one MCP tool for faster review cycles
    • Updated servers/Azure.Mcp.Server/README.md and/or servers/Fabric.Mcp.Server/README.md documentation
    • Validate README.md changes running the script ./eng/scripts/Process-PackageReadMe.ps1. See Package README
    • For new or modified tool descriptions, ran ToolDescriptionEvaluator and obtained a score of 0.4 or more and a top 3 ranking for all related test prompts
    • For tools with new names, including new tools or renamed tools, update consolidated-tools.json
    • For renamed tools, follow the Tool Rename Checklist and tag the PR with the breaking-change label
    • For new tools associated with Azure services or publicly available tools/APIs/products, add URL to documentation in the PR description
  • Extra steps for Azure MCP Server tool changes:
    • Updated command list in servers/Azure.Mcp.Server/docs/azmcp-commands.md
    • Ran ./eng/scripts/Update-AzCommandsMetadata.ps1 to update tool metadata in azmcp-commands.md (required for CI)
    • Updated test prompts in servers/Azure.Mcp.Server/docs/e2eTestPrompts.md
    • 👉 For Community (non-Microsoft team member) PRs:
      • Security review: Reviewed code for security vulnerabilities, malicious code, or suspicious activities before running tests (crypto mining, spam, data exfiltration, etc.)
      • Manual tests run: added comment /azp run mcp - pullrequest - live to run Live Test Pipeline

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR focuses on speeding up recorded test playback by avoiding unnecessary waits/retries and by standardizing ARM long-running-operation (LRO) handling across multiple Azure tool services.

Changes:

  • Switch many ARM LRO calls from WaitUntil.Completed to WaitUntil.Started + a shared WaitForLroCompletionAsync helper (with a shorter polling interval in playback).
  • In playback, strip/override Retry-After-style response headers coming back through RecordingRedirectHandler to prevent “real-world” backoff delays.
  • Improve playback-mode initialization by guarding principal setup and propagating playback signaling to the server process.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tools/Azure.Mcp.Tools.Workbooks/src/Services/WorkbooksService.cs Uses Start+wait helper for workbook create/delete LROs.
tools/Azure.Mcp.Tools.StorageSync/src/Services/StorageSyncService.cs Uses Start+wait helper for multiple Storage Sync LROs; minor modern C# cleanups.
tools/Azure.Mcp.Tools.Sql/src/Services/SqlService.cs Uses Start+wait helper for SQL ARM LROs; simplifies null-check throw.
tools/Azure.Mcp.Tools.Postgres/src/Services/PostgresService.cs Uses Start+wait helper for server parameter update LRO.
tools/Azure.Mcp.Tools.MySql/src/Services/MySqlService.cs Uses Start+wait helper for configuration LRO; minor token request cleanup.
tools/Azure.Mcp.Tools.Monitor/src/Services/MonitorWebTestService.cs Uses Start+wait helper for web test create/update LROs; minor Uri construction cleanup.
tools/Azure.Mcp.Tools.ManagedLustre/src/Services/ManagedLustreService.cs Uses Start+wait helper for multiple Lustre ARM LROs; minor object init refactors.
tools/Azure.Mcp.Tools.LoadTesting/src/Services/LoadTestingService.cs Refactors some null checks and uses Start+wait helper for resource create; adds several ?? throw guards.
tools/Azure.Mcp.Tools.FileShares/src/Services/FileSharesService.cs Uses Start+wait helper across file share/snapshot/PE connection LROs; minor object init cleanups.
tools/Azure.Mcp.Tools.EventHubs/src/Services/EventHubsService.cs Uses Start+wait helper for namespace/event hub/consumer group LROs.
tools/Azure.Mcp.Tools.ConfidentialLedger/src/Services/ConfidentialLedgerService.cs Uses Start+wait helper for ledger append-entry operation.
tools/Azure.Mcp.Tools.Compute/src/Services/ComputeService.cs Uses Start+wait helper for VM/VMSS/network/disk LROs; minor modern C# updates.
tools/Azure.Mcp.Tools.Communication/src/Services/CommunicationService.cs Uses Start+wait helper for email send LRO; minor object init cleanups.
tools/Azure.Mcp.Tools.AppService/src/Services/AppServiceService.cs Uses Start+wait helper for config update LRO; minor modern C# header/value construction changes.
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Helpers/TestEnvironment.cs Namespace/style cleanup (file-scoped namespace).
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/RecordedCommandTestsBase.cs Uses collection expression for sanitizer list; formatting adjustments.
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/Helpers/TestProxyFixture.cs File-scoped namespace + cleanup; adds path resolver configurability.
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/Helpers/LiveTestSettingsFixture.cs Guards principal setup for playback; refactor/formatting.
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/Helpers/LiveTestSettings.cs Reuses shared JsonSerializerOptions for settings deserialization.
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/CommandTestsBase.cs Propagates playback env vars (including TEST_MODE) to server process; minor cleanup.
core/Microsoft.Mcp.Core/src/Services/Http/RecordingRedirectHandler.cs In playback, overrides retry-after headers to avoid backoff delays.
core/Microsoft.Mcp.Core/src/Services/Azure/Authentication/CustomChainedCredential.cs Playback detection now uses environment-based IsPlaybackTesting() (debug-only).
core/Microsoft.Mcp.Core/src/Helpers/EnvironmentVariableHelpers.cs Adds IsPlaybackTesting() helper (debug-only, based on TEST_MODE).
core/Azure.Mcp.Core/tests/Azure.Mcp.Core.LiveTests/RecordingFramework/RecordedCommandTestsBaseTests.cs Removes unused usings.
core/Azure.Mcp.Core/tests/Azure.Mcp.Core.LiveTests/RecordingFramework/RecordedCommandTestHarness.cs Minor expression-bodied method cleanups and formatting.
core/Azure.Mcp.Core/src/Services/Azure/BaseAzureService.cs Adds LRO completion helper and playback-specific poll interval shortening.
core/Azure.Mcp.Core/src/Services/Azure/BaseAzureResourceService.cs Makes generic ARM helpers static; uses Start+wait helper for generic resource LROs.

Copy link
Copy Markdown
Contributor

@jongio jongio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean approach to the playback perf problem. The three-pronged fix (strip retry-after headers + 1ms LRO poll + credential guard) hits the right targets. The #if DEBUG guard on poll interval is a nice safety measure.

Three items inline - two minor nits and one missed LRO conversion [Codex].

{
#if DEBUG
// In debug builds, check for the presence of an environment variable to determine if we're in playback testing mode.
var tokenCredentialEnv = Environment.GetEnvironmentVariable("TEST_MODE");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: tokenCredentialEnv reads TEST_MODE, not a credential variable - looks like a copy-paste leftover from the old AZURE_TOKEN_CREDENTIALS check. testMode would be clearer.

Suggested change
var tokenCredentialEnv = Environment.GetEnvironmentVariable("TEST_MODE");
var testMode = Environment.GetEnvironmentVariable("TEST_MODE");

{
if (response.Headers.Remove("Retry-After"))
{
response.Headers.RetryAfter = new RetryConditionHeaderValue(TimeSpan.FromMilliseconds(0));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: codebase uses TimeSpan.Zero ~26 times vs TimeSpan.FromMilliseconds ~8 times. TimeSpan.Zero is more idiomatic here.

Suggested change
response.Headers.RetryAfter = new RetryConditionHeaderValue(TimeSpan.FromMilliseconds(0));
response.Headers.RetryAfter = new RetryConditionHeaderValue(TimeSpan.Zero);

context: new() { CancellationToken = cancellationToken })
?? throw new Exception($"Failed to retrieve Azure Load Test Run.");

var loadTestRunOperation = await loadTestRunResponse.WaitForCompletionAsync(cancellationToken);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Codex] This LRO path still calls WaitForCompletionAsync directly instead of going through WaitForLroCompletionAsync. That means the 1ms playback poll interval override doesn't apply here. The other LRO in this file (line 106) was correctly converted.

Suggested change
var loadTestRunOperation = await loadTestRunResponse.WaitForCompletionAsync(cancellationToken);
await WaitForLroCompletionAsync(loadTestRunResponse, cancellationToken);
var loadTestRun = loadTestRunResponse.Value.ToString();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Untriaged

Development

Successfully merging this pull request may close these issues.

3 participants