model_selection.boundary

Search-space boundary management helpers for hyperparameter tuning.

After a SpotOptim (or any other optimizer) run, the tuned optimum may press against a search-space boundary. This suggests the optimizer would have moved further had the bound not constrained it. These helpers make that situation visible and actionable.

Motivation: KB entry 2026-06-08-hyperparameter-boundary-management documents an operational case in which reg_alpha was pinned at its old ceiling (98.9% of the linear range). This inflated L1 regularization and flattened the live forecast. Widening that bound and re-running resolved the issue. The helpers below systematize that diagnostic loop.

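Three functions are provided: boundary_report, report_boundary_positions, and suggest_bounds. They are summarized in the Functions table below.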

Key convention difference — prefix handling:

report_boundary_positions strips the "estimator__" prefix from each search-space key before looking up the value in params. The params dict is expected to come from estimator.get_params() (scikit-learn style), which returns unprefixed keys such as "reg_alpha", not "estimator__reg_alpha".

boundary_report and suggest_bounds look up best_params using the search-space key as-is, including any "estimator__" prefix. The best_params dict is expected to come from a SpotOptim result, which stores FULL search-space keys (e.g. "estimator__reg_alpha").

Mixing conventions (passing get_params()-style keys to boundary_report, or SpotOptim result keys to report_boundary_positions) will silently produce an empty result because no key matches. If boundary_report returns an empty DataFrame for a space with numeric dimensions, a key-convention mismatch is the most likely cause.
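A quick way to catch a mismatch before calling either function is to list the search-space keys that have no counterpart in the parameter dict, and to convert keys when needed. The helpers below (strip_prefix, add_prefix, unmatched_keys) are hypothetical and not part of spotforecast2; the sketch assumes "estimator__" is the only prefix in play.

```python
PREFIX = "estimator__"


def strip_prefix(params, prefix=PREFIX):
    """SpotOptim-style keys -> get_params()-style keys (hypothetical helper)."""
    return {(k[len(prefix):] if k.startswith(prefix) else k): v
            for k, v in params.items()}


def add_prefix(params, prefix=PREFIX):
    """get_params()-style keys -> SpotOptim-style keys (hypothetical helper)."""
    return {(k if k.startswith(prefix) else prefix + k): v
            for k, v in params.items()}


def unmatched_keys(best_params, search_space):
    """Search-space keys with no entry in best_params (a mismatch symptom)."""
    return sorted(k for k in search_space if k not in best_params)


space = {"estimator__reg_alpha": (0.001, 10.0)}
fitted = {"reg_alpha": 9.89}  # get_params()-style key, wrong for boundary_report
print(unmatched_keys(fitted, space))              # ['estimator__reg_alpha']
print(unmatched_keys(add_prefix(fitted), space))  # []
```

A non-empty unmatched_keys result for boundary_report or suggest_bounds is the same condition that produces the silent empty DataFrame described above.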

Ported from:

Functions

Name Description
boundary_report Tabulate each tuned value’s position inside its search-space bound.
report_boundary_positions Log where each tuned value sits inside its numeric search-space interval.
suggest_bounds Return a copy of search_space with flagged bounds widened.

boundary_report

model_selection.boundary.boundary_report(
    best_params,
    search_space,
    *,
    warn_frac=0.1,
)

Tabulate each tuned value’s position inside its search-space bound.

Returns a DataFrame sorted by descending position, with one row per numeric dimension. Categorical and boolean-valued dimensions are skipped. flag is one of "> upper", "< lower", or "" (interior).

This function uses the search_space keys as-is (including any "estimator__" prefix) to look up matching entries in best_params. The returned param column strips the "estimator__" prefix for readability.
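The position and flag columns can be reproduced by hand. The sketch below assumes boundary_report applies the same rule documented for report_boundary_positions: position = (value - low) / (high - low), computed on log10 values for log dims, with "> upper" when the position exceeds 1 - warn_frac and "< lower" when it falls below warn_frac. The helpers position and flag are hypothetical. On the inputs of the example further down, it yields the same 0.989 and 0.641.

```python
import math


def position(value, low, high, scale="linear"):
    """Normalized position of value in [low, high]: 0 = low bound, 1 = high bound."""
    if scale == "log10":
        if value <= 0 or low <= 0:
            return None  # log position is undefined for non-positive numbers
        value, low, high = math.log10(value), math.log10(low), math.log10(high)
    return (value - low) / (high - low)


def flag(pos, warn_frac=0.1):
    """Map a position to the flag strings used in the report."""
    if pos is None:
        return ""
    if pos > 1 - warn_frac:
        return "> upper"
    if pos < warn_frac:
        return "< lower"
    return ""


p_alpha = position(9.89, 0.001, 10.0)
p_lr = position(0.069, 0.005, 0.3, "log10")
print(round(p_alpha, 3), repr(flag(p_alpha)))  # 0.989 '> upper'
print(round(p_lr, 3), repr(flag(p_lr)))        # 0.641 ''
```

Measuring log dims in log10 space means a learning rate of 0.069 sits well inside (0.005, 0.3), even though it is in the lower quarter of the linear range.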

Ported from bart26k-lecture/14_team_4_submission.qmd, cell team4-boundary-helpers.

Parameters

Name Type Description Default
best_params Mapping[str, float | int] Flat dict of parameter names to values, keyed with the same names as search_space (including any "estimator__" prefix). required
search_space Mapping[str, Any] Dict mapping parameter names to dimension specs: (low, high), (low, high, "log10"), or a list of categories (skipped). required
warn_frac float Fraction of the range (in the dimension’s own scale) defining the “near-boundary” zone. Default is 0.10. 0.1

Returns

Name Type Description
pd.DataFrame DataFrame with columns param, low, high, value, scale, position, flag, sorted by position descending.

Examples

Report on a near-upper-boundary value:

from spotforecast2.model_selection.boundary import boundary_report

best = {
    "estimator__reg_alpha": 9.89,
    "estimator__learning_rate": 0.069,
}
space = {
    "estimator__reg_alpha": (0.001, 10.0),
    "estimator__learning_rate": (0.005, 0.3, "log10"),
}
df = boundary_report(best, space)
print(df.to_string(index=False))
assert "reg_alpha" in df["param"].values
flagged = df[df["flag"] == "> upper"]["param"].tolist()
assert "reg_alpha" in flagged
        param   low  high  value  scale  position    flag
    reg_alpha 0.001  10.0  9.890 linear     0.989 > upper
learning_rate 0.005   0.3  0.069  log10     0.641        

report_boundary_positions

model_selection.boundary.report_boundary_positions(
    params,
    search_space,
    *,
    warn_frac=0.1,
    logger=None,
)

Log where each tuned value sits inside its numeric search-space interval.

For each entry in search_space that is a 2- or 3-tuple numeric dimension (low, high) or (low, high, "log10"), the function:

  • strips the "estimator__" prefix from the key when looking up the corresponding entry in params;
  • skips non-numeric or boolean values;
  • computes the position in the dimension’s own scale (log10 for log dims, guarding against val <= 0 or low <= 0);
  • flags "> upper" when pos > 1 - warn_frac and "< lower" when pos < warn_frac;
  • logs each dimension at INFO level in a columnar format;
  • returns the list of flagged strings (e.g. ["reg_alpha > upper"]).

Categorical dimensions (list-valued entries) and unreadable entries are skipped. The function never raises — it is a diagnostic and returns an empty list on any unexpected error.
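The steps above can be sketched in a few dozen lines. This is a hypothetical re-implementation for illustration only: the name report_positions_sketch and the log message format are assumptions, and it does not reproduce the library's columnar INFO layout or the summary message shown in the second example below.

```python
import logging
import math

PREFIX = "estimator__"


def report_positions_sketch(params, search_space, warn_frac=0.1, logger=None):
    """Hypothetical re-implementation of the documented steps."""
    logger = logger or logging.getLogger(__name__)
    flagged = []
    try:
        for key, spec in search_space.items():
            # Only (low, high) or (low, high, "log10") tuples are numeric dims.
            if not isinstance(spec, tuple) or len(spec) not in (2, 3):
                continue
            low, high = spec[0], spec[1]
            if not all(isinstance(b, (int, float)) and not isinstance(b, bool)
                       for b in (low, high)):
                continue  # unreadable bounds
            # The prefix is stripped from the search-space key, not from params.
            name = key[len(PREFIX):] if key.startswith(PREFIX) else key
            val = params.get(name)
            if isinstance(val, bool) or not isinstance(val, (int, float)):
                continue  # missing, boolean, or non-numeric value
            if len(spec) == 3 and spec[2] == "log10":
                if val <= 0 or low <= 0:
                    continue  # log position undefined
                val, low, high = math.log10(val), math.log10(low), math.log10(high)
            pos = (val - low) / (high - low)
            logger.info("%-20s position=%.3f", name, pos)
            if pos > 1 - warn_frac:
                flagged.append(f"{name} > upper")
            elif pos < warn_frac:
                flagged.append(f"{name} < lower")
    except Exception:  # diagnostic only: swallow any unexpected error
        return []
    return flagged


print(report_positions_sketch({"reg_alpha": 9.9},
                              {"estimator__reg_alpha": (0.001, 10.0)}))
# ['reg_alpha > upper']
```

Passing SpotOptim-style keys in params (for example {"estimator__reg_alpha": 9.9}) makes every lookup miss, so the sketch returns an empty list, which mirrors the convention pitfall described at the top of this page.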

Parameters

Name Type Description Default
params Mapping[str, float | int] Flat dict of parameter names to numeric values, as returned by estimator.get_params() or equivalent. Keys should NOT carry the "estimator__" prefix (it is stripped from search_space keys, not from params keys). required
search_space Mapping[str, Any] Dict mapping search-space keys (potentially with "estimator__" prefix) to dimension specs: (low, high), (low, high, "log10"), or a list of categories. required
warn_frac float Fraction of the range (in the dimension’s own scale) that defines the “near-boundary” zone at each end. Default is 0.10. 0.1
logger logging.Logger | None Logger to use for INFO/WARNING messages. Defaults to the module-level logging.getLogger(__name__) logger. None

Returns

Name Type Description
list[str] List of flagged dimension strings, e.g. ["reg_alpha > upper", "learning_rate < lower"]. Empty if all dimensions are interior.

Examples

Interior optimum — no flags returned:

from spotforecast2.model_selection.boundary import report_boundary_positions

params = {"num_leaves": 300, "learning_rate": 0.05}
space = {
    "estimator__num_leaves": (8, 1024),
    "estimator__learning_rate": (0.005, 0.3, "log10"),
}
flagged = report_boundary_positions(params, space)
print("flagged:", flagged)
assert flagged == []
flagged: []

Near-upper-boundary — flag is returned:

from spotforecast2.model_selection.boundary import report_boundary_positions

params = {"reg_alpha": 9.9}
space = {"estimator__reg_alpha": (0.001, 10.0)}
flagged = report_boundary_positions(params, space)
print("flagged:", flagged)
assert flagged == ["reg_alpha > upper"]
boundary check: 1 tuned dim(s) near a bound (reg_alpha > upper) -- consider widening that side and re-running.
flagged: ['reg_alpha > upper']

suggest_bounds

model_selection.boundary.suggest_bounds(
    best_params,
    search_space,
    *,
    warn_frac=0.1,
    widen_factor=10.0,
)

Return a copy of search_space with flagged bounds widened.

Upper-pinned dimensions grow upward (high * widen_factor for float/log, high + (high - low) for integer); lower-pinned dimensions grow downward (low / widen_factor for float/log, max(1, low - (high - low)) for integer). Interior and categorical dimensions are copied unchanged.

Pass the result straight back to run_task_spotoptim(search_space=suggest_bounds(...)).
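The widen-and-rerun loop from the Motivation section can be automated. The sketch below is illustrative only: toy_tune is a hypothetical stand-in for run_task_spotoptim (whose real signature is not reproduced here), it simply clips a known optimum to the current bounds, the boundary check uses linear positions only, and the integer widening rule is omitted.

```python
def toy_tune(space, true_optimum=37.0):
    """Hypothetical optimizer stand-in: the best reachable value is the
    true optimum clipped to the current bounds."""
    return {k: min(max(true_optimum, lo), hi) for k, (lo, hi, *_) in space.items()}


def pinned_side(value, low, high, warn_frac=0.1):
    """Linear position check only (log dims are not handled in this sketch)."""
    pos = (value - low) / (high - low)
    if pos > 1 - warn_frac:
        return "upper"
    if pos < warn_frac:
        return "lower"
    return None


def tune_until_interior(space, widen_factor=10.0, max_rounds=5):
    for round_no in range(1, max_rounds + 1):
        best = toy_tune(space)
        new_space = dict(space)
        for key, (low, high, *rest) in space.items():
            side = pinned_side(best[key], low, high)
            if side == "upper":
                new_space[key] = (low, high * widen_factor, *rest)
            elif side == "lower":
                new_space[key] = (low / widen_factor, high, *rest)
        if new_space == space:  # nothing pinned: the optimum is interior
            return best, space, round_no
        space = new_space
    return best, space, max_rounds


best, space, rounds = tune_until_interior({"estimator__reg_alpha": (0.001, 10.0)})
print(best, space, rounds)
# {'estimator__reg_alpha': 37.0} {'estimator__reg_alpha': (0.001, 100.0)} 2
```

The max_rounds cap matters in practice: if the objective keeps improving toward a bound, the loop would otherwise widen that side indefinitely.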

Ported from bart26k-lecture/14_team_4_submission.qmd, cell team4-boundary-helpers (parameter widen renamed to widen_factor for clarity).

Parameters

Name Type Description Default
best_params Mapping[str, float | int] Flat dict of parameter names to values, keyed with the same names as search_space (including any "estimator__" prefix). required
search_space Mapping[str, Any] Dict mapping parameter names to dimension specs: (low, high), (low, high, "log10"), or a list of categories (returned unchanged). required
warn_frac float Fraction of the range (in the dimension’s own scale) defining the “near-boundary” zone. Default is 0.10. 0.1
widen_factor float Multiplicative factor for widening float/log bounds. Integer bounds use an additive span instead. Default is 10.0. 10.0

Returns

Name Type Description
dict[str, Any] New search-space dict with the same keys as search_space but with boundary-pressed bounds extended on the pressed side.

Examples

Upper-pinned float bound is multiplied by widen_factor:

from spotforecast2.model_selection.boundary import suggest_bounds

best = {"estimator__reg_alpha": 9.89}
space = {"estimator__reg_alpha": (0.001, 10.0)}
new_space = suggest_bounds(best, space, widen_factor=10.0)
print(new_space)
assert new_space["estimator__reg_alpha"][1] == 100.0
{'estimator__reg_alpha': (0.001, 100.0)}

Log-scale upper-pinned bound is also multiplied:

from spotforecast2.model_selection.boundary import suggest_bounds

best = {"estimator__reg_alpha": 9.89}
space = {"estimator__reg_alpha": (0.001, 10.0, "log10")}
new_space = suggest_bounds(best, space, widen_factor=10.0)
print(new_space)
assert new_space["estimator__reg_alpha"][1] == 100.0
{'estimator__reg_alpha': (0.001, 100.0, 'log10')}

Integer bound grows additively:

from spotforecast2.model_selection.boundary import suggest_bounds

best = {"estimator__n_estimators": 4950}
space = {"estimator__n_estimators": (100, 5000)}
new_space = suggest_bounds(best, space, widen_factor=10.0)
print(new_space)
assert new_space["estimator__n_estimators"][1] == 5000 + (5000 - 100)
{'estimator__n_estimators': (100, 9900)}

Interior dimension is returned unchanged:

from spotforecast2.model_selection.boundary import suggest_bounds

best = {"estimator__num_leaves": 300}
space = {"estimator__num_leaves": (8, 1024)}
new_space = suggest_bounds(best, space)
assert new_space["estimator__num_leaves"] == (8, 1024)