Remove dead `recoverable_exceptions` and `is_recoverable_fetch_e` by saitcakmak · Pull Request #5120 · facebook/Ax

saitcakmak · 2026-04-01T17:40:04Z

Summary:
Follow-up to D98924467, which decoupled metric fetch errors from trial
status in the Orchestrator. The orchestrator no longer uses
`recoverable_exceptions` or `is_recoverable_fetch_e` to decide trial
fate, making them dead code.

Remove the `Metric.recoverable_exceptions` class attribute and the
`Metric.is_recoverable_fetch_e` classmethod from `ax/core/metric.py`.

Differential Revision: D98932195
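A heavily simplified sketch may help show why nothing reads these two members any more. Everything in it is hypothetical: `SketchMetric`, `SketchFetchError`, the two-member `TrialStatus`, the choice of `TimeoutError` as the only recoverable type, and the assumption that the old path abandoned a trial whenever the wrapped exception was not recoverable are illustrations, not Ax's actual code. The "new" function follows the behavior described for D98924467 (log the error, leave the trial status alone), using `logger.error(..., exc_info=...)` in place of `logger.exception` so it works outside an `except` block.

```python
import logging
from dataclasses import dataclass
from enum import Enum

logger = logging.getLogger(__name__)


class TrialStatus(Enum):
    # Hypothetical stand-in: only the two statuses this sketch needs.
    COMPLETED = "COMPLETED"
    ABANDONED = "ABANDONED"


class SketchMetric:
    # Hypothetical stand-ins for the two members this PR removes.
    # TimeoutError as the only "recoverable" type is an arbitrary example.
    recoverable_exceptions: tuple[type[Exception], ...] = (TimeoutError,)

    @classmethod
    def is_recoverable_fetch_e(cls, exception: Exception) -> bool:
        # Recoverable if the exception is an instance of a listed type.
        return isinstance(exception, cls.recoverable_exceptions)


@dataclass
class SketchFetchError:
    # Hypothetical stand-in for a metric fetch error wrapping an exception.
    message: str
    exception: Exception


def old_status_after_error(
    status: TrialStatus, err: SketchFetchError
) -> TrialStatus:
    # Assumed old behavior: keep the status for recoverable errors,
    # otherwise abandon the trial.
    if SketchMetric.is_recoverable_fetch_e(err.exception):
        return status
    return TrialStatus.ABANDONED


def new_status_after_error(
    status: TrialStatus, err: SketchFetchError
) -> TrialStatus:
    # New behavior: log the error with its exception info and return the
    # status unchanged. Per the D98924467 summary, the real orchestrator
    # also still calls the `_report_metric_fetch_e` hook for fetch errors.
    logger.error("Metric fetch failed: %s", err.message, exc_info=err.exception)
    return status
```

For example, a `ValueError` wrapped in `SketchFetchError` turns a COMPLETED trial into ABANDONED under `old_status_after_error`, but leaves it COMPLETED under `new_status_after_error`, which never consults `is_recoverable_fetch_e`.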

…ook#5119)

Summary:
Design doc: D98741656

When `fetch_trials_data_results` returned a `MetricFetchE` for an optimization config metric, the orchestrator marked the trial as ABANDONED. This discarded good data, inflated the failure rate, and was inconsistent with the Client layer, which keeps trials COMPLETED with incomplete metrics via `MetricAvailability` (D93924193).

This diff removes the trial abandonment behavior. Metric fetch errors are now logged (with traceback via `logger.exception`), but trial status is unchanged. `MetricAvailability` tracks data completeness, and the failure rate check uses it to detect persistent metric issues.

Changes:
- `_fetch_and_process_trials_data_results`: Removed the branch that marked trials ABANDONED for metric fetch errors, as well as the separate `is_available_while_running` branch. All metric fetch errors are now simply logged and the method continues. The `_report_metric_fetch_e` hook is still called so that subclasses (e.g. `AxSweepOrchestrator`) can react to errors (create pastes, build error tables, etc.).
- `error_if_failure_rate_exceeded`: Merged `_check_if_failure_rate_exceeded` into this method to avoid duplicate computation. It now counts both runner failures (FAILED/ABANDONED) and metric-incomplete trials (via `compute_metric_availability`) toward the failure rate.
- `_get_failure_rate_exceeded_error`: Rewritten with an actionable error message listing runner failures, metric-incomplete trials, missing metrics, and affected trial indices.
- Removed dead code: `_mark_err_trial_status`, `_num_trials_bad_due_to_err`, `_num_metric_fetch_e_encountered`, `_check_if_failure_rate_exceeded`, `METRIC_FETCH_ERR_MESSAGE`.
- Kept `_report_metric_fetch_e` as a no-op hook so subclasses like `AxSweepOrchestrator` can still react to metric fetch errors.
- Updated telemetry (`OrchestratorCompletedRecord`) to use `_count_metric_incomplete_trials` (via `compute_metric_availability`) for both `num_metric_fetch_e_encountered` and `num_trials_bad_due_to_err`.
- Updated `AxSweepOrchestrator` test assertions: trials now stay COMPLETED (not ABANDONED) after metric fetch errors.
- `Metric.recoverable_exceptions` and `Metric.is_recoverable_fetch_e` are kept for now since `pts/` metrics still reference them; cleanup will follow in a separate diff.

Differential Revision: D98924467
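The combined failure-rate check described in that commit can be sketched roughly as follows. This is a minimal illustration, not the code in D98924467: `SketchTrial`, its `metric_incomplete` flag (standing in for what `compute_metric_availability` reports), `failure_rate`, and `check_failure_rate` are hypothetical names, and the sketch assumes the denominator is all trials and that each trial is counted at most once.

```python
from dataclasses import dataclass
from enum import Enum


class TrialStatus(Enum):
    # Hypothetical stand-in; Ax defines more statuses than these.
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    ABANDONED = "ABANDONED"


@dataclass
class SketchTrial:
    index: int
    status: TrialStatus
    # Assumption: True when some optimization-config metric has no data,
    # the kind of signal compute_metric_availability is described as giving.
    metric_incomplete: bool = False


# Statuses counted as runner failures in this sketch.
RUNNER_FAILURES = (TrialStatus.FAILED, TrialStatus.ABANDONED)


def failure_rate(trials: list[SketchTrial]) -> tuple[float, list[int]]:
    """Return (rate, bad trial indices); each trial counts at most once."""
    bad = [
        t.index
        for t in trials
        # Runner failures...
        if t.status in RUNNER_FAILURES
        # ...plus completed trials whose metrics are incomplete.
        or (t.status is TrialStatus.COMPLETED and t.metric_incomplete)
    ]
    rate = len(bad) / len(trials) if trials else 0.0
    return rate, bad


def check_failure_rate(trials: list[SketchTrial], tolerated_rate: float) -> None:
    """Raise if the combined failure rate exceeds the tolerated rate."""
    rate, bad = failure_rate(trials)
    if rate > tolerated_rate:
        raise RuntimeError(
            f"Failure rate {rate:.0%} exceeds tolerated {tolerated_rate:.0%}; "
            f"runner-failed or metric-incomplete trials: {bad}"
        )
```

With four trials where trial 1 FAILED and trial 2 COMPLETED with a missing metric, `failure_rate` returns `(0.5, [1, 2])`, so a tolerated rate of 0.25 raises while 0.5 does not. Counting metric-incomplete trials here, rather than abandoning them, is what lets persistent metric problems still surface without discarding data.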


meta-codesync · 2026-04-01T17:40:17Z

@saitcakmak has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98932195.

codecov-commenter · 2026-04-01T18:18:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.39%. Comparing base (6cebd1c) to head (982d09e).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5120      +/-   ##
==========================================
- Coverage   96.40%   96.39%   -0.02%     
==========================================
  Files         613      613              
  Lines       68142    68140       -2     
==========================================
- Hits        65694    65683      -11     
- Misses       2448     2457       +9

saitcakmak added 2 commits April 1, 2026 10:39

meta-cla bot added the CLA Signed label Apr 1, 2026

meta-codesync bot added fb-exported meta-exported labels Apr 1, 2026

saitcakmak wants to merge 2 commits into facebook:main from saitcakmak:export-D98932195