Search-space boundary management helpers for hyperparameter tuning.
After a SpotOptim (or any other optimizer) run, the tuned optimum may press against a search-space boundary, which suggests the optimizer would have moved further had the bound allowed it. These helpers make that visible and actionable.
Motivation: KB entry 2026-06-08-hyperparameter-boundary-management documents an operational case where reg_alpha pinned at its old ceiling (98.9 % of the linear range), inflating L1 regularization and flattening the live forecast. Widening that bound and re-running resolved the issue. The helpers below systematize that diagnostic loop.
Three functions are provided:
report_boundary_positions — logs each dimension’s position and returns a list of flagged names. Decoupled from MultiTask: the caller extracts params (e.g. estimator.get_params()). Primary operational diagnostic.
boundary_report — returns a pd.DataFrame with one row per numeric dimension: position, scale, flag. Companion to suggest_bounds.
suggest_bounds — returns a copy of the search space with flagged bounds widened on the pressed side; pass it straight back to run_task_spotoptim(search_space=...).
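The idea of widening only the pressed side can be pictured with a minimal sketch. This is not the implementation of suggest_bounds: the function name widen_bounds_sketch, the shape of the flagged mapping, and the factor parameter are all hypothetical, and the widening rule (extend linear dimensions by a multiple of the range, scale log10 dimensions multiplicatively) is an assumption chosen for illustration.

```python
import copy


def widen_bounds_sketch(search_space, flagged, factor=2.0):
    """Return a copy of search_space with flagged bounds widened (illustrative only).

    flagged maps a search-space key to "> upper" or "< lower" (hypothetical shape).
    """
    widened = copy.deepcopy(search_space)
    for key, side in flagged.items():
        dim = widened.get(key)
        # Only numeric 2- or 3-tuples are widened; list-valued (categorical) dims are left alone.
        if not isinstance(dim, tuple) or len(dim) not in (2, 3):
            continue
        low, high, *rest = dim
        is_log = bool(rest) and rest[0] == "log10"
        span = high - low
        if side == "> upper":
            # Assumption: log dims scale multiplicatively, linear dims grow by (factor - 1) spans.
            high = high * factor if is_log else high + (factor - 1.0) * span
        elif side == "< lower":
            low = low / factor if is_log else low - (factor - 1.0) * span
        widened[key] = (low, high, *rest)
    return widened


space = {
    "estimator__reg_alpha": (0.0, 10.0),
    "estimator__learning_rate": (1e-3, 1.0, "log10"),
    "estimator__booster": ["gbtree", "dart"],
}
new_space = widen_bounds_sketch(space, {"estimator__reg_alpha": "> upper"})
```

Here only the upper bound of estimator__reg_alpha moves; the original space is left untouched, so the old and new spaces can be compared before re-running the tuner. A real widening rule would also need to respect parameter constraints (for example, a lower bound that must stay non-negative), which this sketch does not attempt.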
Key convention difference — prefix handling:
report_boundary_positions strips the "estimator__" prefix from each search-space key before looking up the value in params. The params dict is expected to come from estimator.get_params() (scikit-learn style), which returns UN-prefixed keys such as "reg_alpha" — not "estimator__reg_alpha".
boundary_report and suggest_bounds look up best_params using the search-space key as-is, including any "estimator__" prefix. The best_params dict is expected to come from a SpotOptim result, which stores FULL search-space keys (e.g. "estimator__reg_alpha").
Mixing conventions (passing get_params()-style keys to boundary_report, or SpotOptim result keys to report_boundary_positions) will silently produce an empty result because no key matches. If boundary_report returns an empty DataFrame for a space with numeric dimensions, a key-convention mismatch is the most likely cause.
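The mismatch can be reproduced in a few lines. The sketch below only mimics the two lookup conventions described above; lookup_stripped and lookup_as_is are hypothetical stand-ins, not the helpers themselves, and the parameter values are made up for illustration. It assumes Python 3.9+ for str.removeprefix.

```python
PREFIX = "estimator__"

search_space = {"estimator__reg_alpha": (0.0, 10.0)}

# Convention 1: scikit-learn get_params() style, un-prefixed keys.
sklearn_params = {"reg_alpha": 9.89}
# Convention 2: SpotOptim-result style, full search-space keys.
spot_params = {"estimator__reg_alpha": 9.89}


def lookup_stripped(space, params):
    """Mimic report_boundary_positions: strip the prefix from each space key, then look up."""
    found = {}
    for key in space:
        bare = key.removeprefix(PREFIX)
        if bare in params:
            found[key] = params[bare]
    return found


def lookup_as_is(space, params):
    """Mimic boundary_report / suggest_bounds: look up each space key unchanged."""
    return {key: params[key] for key in space if key in params}


matched_a = lookup_stripped(search_space, sklearn_params)  # one match
matched_b = lookup_as_is(search_space, spot_params)        # one match
missed_a = lookup_stripped(search_space, spot_params)      # no match: keys still prefixed
missed_b = lookup_as_is(search_space, sklearn_params)      # no match: keys lack the prefix
```

Both mismatched calls return an empty dict without raising, which is why a wrong key convention shows up only as a missing result.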
Tabulate each tuned value’s position inside its search-space bound.
Returns a DataFrame sorted by descending position, with one row per numeric dimension. Categorical and boolean-valued dimensions are skipped. flag is one of "> upper", "< lower", or "" (interior).
This function uses the search_space keys as-is (including any "estimator__" prefix) to look up matching entries in best_params. The returned param column strips the "estimator__" prefix for readability.
Ported from bart26k-lecture/14_team_4_submission.qmd, cell team4-boundary-helpers.
Log where each tuned value sits inside its numeric search-space interval.
For each entry in search_space that is a numeric dimension, given as a 2-tuple (low, high) or a 3-tuple (low, high, "log10"), the function:
strips the "estimator__" prefix from the key when looking up the corresponding entry in params;
skips non-numeric or boolean values;
computes the normalized position, (val - low) / (high - low), in the dimension’s own scale (log10 of the values for log dims, guarding against val <= 0 or low <= 0);
flags "> upper" when pos > 1 - warn_frac and "< lower" when pos < warn_frac;
logs each dimension at INFO level in a columnar format;
returns the list of flagged strings (e.g. ["reg_alpha > upper"]).
Categorical dimensions (list-valued entries) and unreadable entries are skipped. The function never raises — it is a diagnostic and returns an empty list on any unexpected error.
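The position and flag rules listed above can be sketched for a single dimension as follows. This is an illustration under stated assumptions, not the function's actual code: position_and_flag is a hypothetical name, the normalization (val - low) / (high - low) is inferred from the "98.9 % of the linear range" wording, the default warn_frac=0.05 is made up, and the extra guards on high are additions of the sketch.

```python
import math


def position_and_flag(val, low, high, scale="linear", warn_frac=0.05):
    """Return (pos, flag) for one numeric dimension, or None if it must be skipped."""
    # Booleans are ints in Python, so exclude them explicitly before the numeric check.
    if isinstance(val, bool) or not isinstance(val, (int, float)):
        return None
    if scale == "log10":
        # Guard against non-positive inputs, which have no log10 (the high check is an extra).
        if val <= 0 or low <= 0 or high <= 0:
            return None
        val, low, high = math.log10(val), math.log10(low), math.log10(high)
    if high == low:
        return None  # degenerate interval: position is undefined
    pos = (val - low) / (high - low)
    if pos > 1 - warn_frac:
        flag = "> upper"
    elif pos < warn_frac:
        flag = "< lower"
    else:
        flag = ""
    return pos, flag


pressed = position_and_flag(9.89, 0.0, 10.0)             # near the ceiling of a linear dim
interior = position_and_flag(1e-3, 1e-4, 1.0, "log10")   # a quarter of the way up in log space
skipped = position_and_flag(True, 0, 1)                  # boolean values are skipped
```

Note that "> upper" and "< lower" are raised for values close to a bound, not only for values outside it, so a tuned value at 98.9 % of a linear range is flagged even though it is still inside the interval.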
Flat dict of parameter names to numeric values, as returned by estimator.get_params() or equivalent. Keys should NOT carry the "estimator__" prefix (it is stripped from search_space keys, not from params keys).
List of flagged dimension strings (list[str]), e.g. ["reg_alpha > upper", "learning_rate < lower"]. Empty if all dimensions are interior.