model_selection.boundary

Search-space boundary management helpers for hyperparameter tuning.

After a SpotOptim (or any other optimizer) run, the tuned optimum may sit against a search-space boundary, which suggests the optimizer would have moved further had the bound allowed it. These helpers make that visible and actionable.

Motivation: KB entry 2026-06-08-hyperparameter-boundary-management documents an operational case in which reg_alpha was pinned near its old ceiling (at 98.9% of the linear range), which inflated L1 regularization and flattened the live forecast. Widening that bound and re-running resolved the issue. The helpers below systematize that diagnostic loop.

Three functions are provided:

report_boundary_positions: logs each dimension’s position and returns a list of flag strings (e.g. "reg_alpha > upper"). It is decoupled from MultiTask, so the caller extracts params (e.g. via estimator.get_params()). Primary operational diagnostic.
boundary_report: returns a pd.DataFrame with one row per numeric dimension, giving position, scale, and flag. Companion to suggest_bounds.
suggest_bounds: returns a copy of the search space with flagged bounds widened on the pressed side; the result can be passed straight back to run_task_spotoptim(search_space=...).

Key convention difference — prefix handling:

report_boundary_positions strips the "estimator__" prefix from each search-space key before looking up the value in params. The params dict is expected to come from estimator.get_params() (scikit-learn style), which returns UN-prefixed keys such as "reg_alpha" — not "estimator__reg_alpha".

boundary_report and suggest_bounds look up best_params using the search-space key as-is, including any "estimator__" prefix. The best_params dict is expected to come from a SpotOptim result, which stores FULL search-space keys (e.g. "estimator__reg_alpha").

Mixing conventions (passing get_params()-style keys to boundary_report, or SpotOptim result keys to report_boundary_positions) will silently produce an empty result because no key matches. If boundary_report returns an empty DataFrame for a space with numeric dimensions, a key-convention mismatch is the most likely cause.
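
As a minimal sketch of how the two key conventions relate, the snippet below converts between them in plain Python. The helper names to_search_space_keys and to_bare_keys are hypothetical and are not part of this module; only the "estimator__" prefix comes from the text above.

```python
PREFIX = "estimator__"  # prefix described in the text above


def to_search_space_keys(params, search_space, prefix=PREFIX):
    """Re-key get_params()-style values so they match the search-space keys.

    Hypothetical helper for illustration; entries of `params` that have no
    counterpart in `search_space` (e.g. n_jobs) are dropped.
    """
    rekeyed = {}
    for key in search_space:
        bare = key[len(prefix):] if key.startswith(prefix) else key
        if bare in params:
            rekeyed[key] = params[bare]
    return rekeyed


def to_bare_keys(best_params, prefix=PREFIX):
    """Strip the prefix from full search-space keys (hypothetical helper)."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in best_params.items()
    }


space = {
    "estimator__reg_alpha": (0.001, 10.0),
    "estimator__learning_rate": (0.005, 0.3, "log10"),
}
get_params_style = {"reg_alpha": 9.89, "learning_rate": 0.069, "n_jobs": -1}

# No key overlaps with the search space: this is the silent-mismatch case.
assert not set(space) & set(get_params_style)

best_style = to_search_space_keys(get_params_style, space)
print(best_style)
# {'estimator__reg_alpha': 9.89, 'estimator__learning_rate': 0.069}
```

Checking that the two key sets overlap before calling either family of helpers is a cheap way to catch the mismatch described above.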

Ported from:

report_boundary_positions: bart26k-lecture/scripts/team4_4zones_submit.py (report_boundary_positions function, section @sec-team4-boundary-diagnostics).
boundary_report / suggest_bounds: bart26k-lecture/14_team_4_submission.qmd (cell team4-boundary-helpers, section @sec-team4-boundary-management).

Functions

Name	Description
boundary_report	Tabulate each tuned value’s position inside its search-space bound.
report_boundary_positions	Log where each tuned value sits inside its numeric search-space interval.
suggest_bounds	Return a copy of `search_space` with flagged bounds widened.

boundary_report

model_selection.boundary.boundary_report(
    best_params,
    search_space,
    *,
    warn_frac=0.1,
)

Tabulate each tuned value’s position inside its search-space bound.

Returns a DataFrame sorted by descending position, with one row per numeric dimension. Categorical and boolean-valued dimensions are skipped. flag is "> upper" or "< lower" when the value lies within warn_frac of that end of the range, and "" for an interior value.

This function uses the search_space keys as-is (including any "estimator__" prefix) to look up matching entries in best_params. The returned param column strips the "estimator__" prefix for readability.

Ported from bart26k-lecture/14_team_4_submission.qmd, cell team4-boundary-helpers.

Parameters

Name	Type	Description	Default
best_params	Mapping[str, float \| int]	Flat dict of parameter names to values, keyed with the same names as `search_space` (including any `"estimator__"` prefix).	required
search_space	Mapping[str, Any]	Dict mapping parameter names to dimension specs: `(low, high)`, `(low, high, "log10")`, or a list of categories (skipped).	required
warn_frac	float	Fraction of the range (in the dimension’s own scale) defining the “near-boundary” zone. Default is 0.10.	`0.1`

Returns

Name	Type	Description
	pd.DataFrame	DataFrame with columns `param`, `low`, `high`, `value`, `scale`, `position`, `flag`, sorted by `position` descending.

Examples

Report on a near-upper-boundary value:

from spotforecast2.model_selection.boundary import boundary_report

best = {
    "estimator__reg_alpha": 9.89,
    "estimator__learning_rate": 0.069,
}
space = {
    "estimator__reg_alpha": (0.001, 10.0),
    "estimator__learning_rate": (0.005, 0.3, "log10"),
}
df = boundary_report(best, space)
print(df.to_string(index=False))
assert "reg_alpha" in df["param"].values
flagged = df[df["flag"] == "> upper"]["param"].tolist()
assert "reg_alpha" in flagged

        param   low  high  value  scale  position    flag
    reg_alpha 0.001  10.0  9.890 linear     0.989 > upper
learning_rate 0.005   0.3  0.069  log10     0.641
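
The position column in this output can be checked by hand. The formulas below are an assumption inferred from the description "position in the dimension’s own scale" and from the printed values, not a quotation of the implementation: position is taken to be the normalized distance of the value v from low, computed after a log10 transform for log dimensions.

```latex
\[
p_{\text{linear}} = \frac{v - \text{low}}{\text{high} - \text{low}},
\qquad
p_{\log_{10}} = \frac{\log_{10} v - \log_{10}\text{low}}
                     {\log_{10}\text{high} - \log_{10}\text{low}}
\]

\[
p_{\texttt{reg\_alpha}}
  = \frac{9.89 - 0.001}{10.0 - 0.001}
  = \frac{9.889}{9.999}
  \approx 0.989
\]

\[
p_{\texttt{learning\_rate}}
  = \frac{\log_{10} 0.069 - \log_{10} 0.005}{\log_{10} 0.3 - \log_{10} 0.005}
  \approx \frac{-1.1612 + 2.3010}{-0.5229 + 2.3010}
  = \frac{1.1398}{1.7781}
  \approx 0.641
\]
```

With warn_frac = 0.1, the near-boundary zones are p < 0.1 and p > 0.9, so reg_alpha (0.989) is flagged on the upper side and learning_rate (0.641) is interior, matching the table.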

report_boundary_positions

model_selection.boundary.report_boundary_positions(
    params,
    search_space,
    *,
    warn_frac=0.1,
    logger=None,
)

Log where each tuned value sits inside its numeric search-space interval.

For each entry in search_space that is a 2- or 3-tuple numeric dimension (low, high) or (low, high, "log10"), the function:

strips the "estimator__" prefix from the key when looking up the corresponding entry in params;
skips non-numeric or boolean values;
computes the position in the dimension’s own scale (log10 for log dims, guarding against val <= 0 or low <= 0);
flags "> upper" when pos > 1 - warn_frac and "< lower" when pos < warn_frac;
logs each dimension at INFO level in a columnar format;
returns the list of flagged strings (e.g. ["reg_alpha > upper"]).

Categorical dimensions (list-valued entries) and unreadable entries are skipped. The function never raises — it is a diagnostic and returns an empty list on any unexpected error.
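
The steps above can be sketched in a few lines of plain Python. This is an illustrative re-implementation under stated assumptions, not the library code: the name sketch_flags is hypothetical, logging is omitted, and skipping a log dimension whose value or lower bound is non-positive is an assumed reading of "guarding against val <= 0 or low <= 0".

```python
import math
from numbers import Real

PREFIX = "estimator__"


def sketch_flags(params, search_space, warn_frac=0.1):
    """Return flag strings such as 'reg_alpha > upper' (hypothetical sketch)."""
    flagged = []
    for key, spec in search_space.items():
        # Only 2- or 3-tuples are numeric dimensions; lists are categorical.
        if not isinstance(spec, tuple) or len(spec) not in (2, 3):
            continue
        # Strip the prefix from the search-space key before the lookup.
        name = key[len(PREFIX):] if key.startswith(PREFIX) else key
        val = params.get(name)
        # bool is a subclass of int, so it has to be excluded explicitly.
        if isinstance(val, bool) or not isinstance(val, Real):
            continue
        low, high = spec[0], spec[1]
        if len(spec) == 3 and spec[2] == "log10":
            if val <= 0 or low <= 0:
                continue  # assumption: log10 undefined, skip the dimension
            val, low, high = math.log10(val), math.log10(low), math.log10(high)
        if high == low:
            continue  # assumption: a zero-width range has no position
        pos = (val - low) / (high - low)
        if pos > 1 - warn_frac:
            flagged.append(f"{name} > upper")
        elif pos < warn_frac:
            flagged.append(f"{name} < lower")
    return flagged


space = {
    "estimator__reg_alpha": (0.001, 10.0),
    "estimator__learning_rate": (0.005, 0.3, "log10"),
    "estimator__booster": ["gbtree", "dart"],
}
print(sketch_flags({"reg_alpha": 9.9, "learning_rate": 0.05}, space))
# ['reg_alpha > upper']
```

The explicit bool check matters because isinstance(True, int) is true in Python, so a boolean hyperparameter would otherwise be treated as the number 1.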

Parameters

Name	Type	Description	Default
params	Mapping[str, float \| int]	Flat dict of parameter names to numeric values, as returned by `estimator.get_params()` or equivalent. Keys should NOT carry the `"estimator__"` prefix (it is stripped from `search_space` keys, not from `params` keys).	required
search_space	Mapping[str, Any]	Dict mapping search-space keys (potentially with `"estimator__"` prefix) to dimension specs: `(low, high)`, `(low, high, "log10")`, or a list of categories.	required
warn_frac	float	Fraction of the range (in the dimension’s own scale) that defines the “near-boundary” zone at each end. Default is 0.10.	`0.1`
logger	logging.Logger \| None	Logger to use for INFO/WARNING messages. Defaults to the module-level `logging.getLogger(__name__)` logger.	`None`

Returns

Name	Type	Description
	list[str]	List of flagged dimension strings, e.g. ["reg_alpha > upper", "learning_rate < lower"]. Empty if all dimensions are interior.

Examples

Interior optimum — no flags returned:

from spotforecast2.model_selection.boundary import report_boundary_positions

params = {"num_leaves": 300, "learning_rate": 0.05}
space = {
    "estimator__num_leaves": (8, 1024),
    "estimator__learning_rate": (0.005, 0.3, "log10"),
}
flagged = report_boundary_positions(params, space)
print("flagged:", flagged)
assert flagged == []

flagged: []

Near-upper-boundary — flag is returned:

from spotforecast2.model_selection.boundary import report_boundary_positions

params = {"reg_alpha": 9.9}
space = {"estimator__reg_alpha": (0.001, 10.0)}
flagged = report_boundary_positions(params, space)
print("flagged:", flagged)
assert flagged == ["reg_alpha > upper"]

boundary check: 1 tuned dim(s) near a bound (reg_alpha > upper) -- consider widening that side and re-running.

flagged: ['reg_alpha > upper']

suggest_bounds

model_selection.boundary.suggest_bounds(
    best_params,
    search_space,
    *,
    warn_frac=0.1,
    widen_factor=10.0,
)

Return a copy of search_space with flagged bounds widened.

Upper-pinned dimensions grow upward (high * widen_factor for float/log, high + (high - low) for integer); lower-pinned dimensions grow downward (low / widen_factor for float/log, max(1, low - (high - low)) for integer). Interior and categorical dimensions are copied unchanged.
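
The widening rule can be written out for a single dimension as follows. This is a sketch under assumptions: the helper name widen and its side argument are hypothetical, and because the text does not say how an "integer" dimension is recognized, the sketch treats a dimension as integer only when both bounds are Python ints and no scale tag is present.

```python
def widen(spec, side, widen_factor=10.0):
    """Widen one numeric dimension on the given side ('upper' or 'lower').

    Hypothetical helper illustrating the rule described above.
    """
    low, high, *rest = spec  # rest holds an optional scale tag such as "log10"
    # Assumption: integer dimension = two int bounds and no scale tag.
    is_int = isinstance(low, int) and isinstance(high, int) and not rest
    if side == "upper":
        high = high + (high - low) if is_int else high * widen_factor
    elif side == "lower":
        low = max(1, low - (high - low)) if is_int else low / widen_factor
    return (low, high, *rest)


print(widen((0.001, 10.0), "upper"))           # (0.001, 100.0)
print(widen((0.001, 10.0, "log10"), "upper"))  # (0.001, 100.0, 'log10')
print(widen((100, 5000), "upper"))             # (100, 9900)
print(widen((8, 1024), "lower"))               # (1, 1024)
```

The last call shows the max(1, ...) floor at work: 8 - (1024 - 8) is negative, so the lower bound is clamped to 1 instead.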

The result can be passed straight back to the tuner, e.g. run_task_spotoptim(search_space=suggest_bounds(...)).

Ported from bart26k-lecture/14_team_4_submission.qmd, cell team4-boundary-helpers (parameter widen renamed to widen_factor for clarity).

Parameters

Name	Type	Description	Default
best_params	Mapping[str, float \| int]	Flat dict of parameter names to values, keyed with the same names as `search_space` (including any `"estimator__"` prefix).	required
search_space	Mapping[str, Any]	Dict mapping parameter names to dimension specs: `(low, high)`, `(low, high, "log10")`, or a list of categories (returned unchanged).	required
warn_frac	float	Fraction of the range (in the dimension’s own scale) defining the “near-boundary” zone. Default is 0.10.	`0.1`
widen_factor	float	Multiplicative factor for widening float/log bounds. Integer bounds use an additive span instead. Default is 10.0.	`10.0`

Returns

Name	Type	Description
	dict[str, Any]	New search-space dict with the same keys as `search_space` but with boundary-pressed bounds extended on the pressed side.

Examples

Upper-pinned float bound is multiplied by widen_factor:

from spotforecast2.model_selection.boundary import suggest_bounds

best = {"estimator__reg_alpha": 9.89}
space = {"estimator__reg_alpha": (0.001, 10.0)}
new_space = suggest_bounds(best, space, widen_factor=10.0)
print(new_space)
assert new_space["estimator__reg_alpha"][1] == 100.0

{'estimator__reg_alpha': (0.001, 100.0)}

Log-scale upper-pinned bound is also multiplied:

from spotforecast2.model_selection.boundary import suggest_bounds

best = {"estimator__reg_alpha": 9.89}
space = {"estimator__reg_alpha": (0.001, 10.0, "log10")}
new_space = suggest_bounds(best, space, widen_factor=10.0)
print(new_space)
assert new_space["estimator__reg_alpha"][1] == 100.0

{'estimator__reg_alpha': (0.001, 100.0, 'log10')}

Integer bound grows additively:

from spotforecast2.model_selection.boundary import suggest_bounds

best = {"estimator__n_estimators": 4950}
space = {"estimator__n_estimators": (100, 5000)}
new_space = suggest_bounds(best, space, widen_factor=10.0)
print(new_space)
assert new_space["estimator__n_estimators"][1] == 5000 + (5000 - 100)

{'estimator__n_estimators': (100, 9900)}

Interior dimension is returned unchanged:

from spotforecast2.model_selection.boundary import suggest_bounds

best = {"estimator__num_leaves": 300}
space = {"estimator__num_leaves": (8, 1024)}
new_space = suggest_bounds(best, space)
assert new_space["estimator__num_leaves"] == (8, 1024)