model_selection.random_search

Random search hyperparameter optimization for forecasters.

Functions

Name	Description
random_search_forecaster	Random search over parameter distributions for a Forecaster.

random_search_forecaster

model_selection.random_search.random_search_forecaster(
    forecaster,
    y,
    cv,
    param_distributions,
    metric,
    exog=None,
    lags_grid=None,
    n_iter=10,
    random_state=123,
    return_best=True,
    n_jobs='auto',
    verbose=False,
    show_progress=True,
    suppress_warnings=False,
    output_file=None,
)

Random search over parameter distributions for a Forecaster.

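Samples parameter settings at random from the given distributions and evaluates each one on a Forecaster object. Each candidate is validated by time series backtesting with the provided cross-validation strategy. Because only n_iter settings are evaluated per lags configuration, the cost does not grow with the size of the search space, which makes random search more efficient than an exhaustive grid search when that space is large.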
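The core idea can be sketched in a few lines of NumPy and SciPy: draw n_iter settings from the distributions, score each one, and keep the best. This is a minimal illustration under assumptions, not spotforecast2's implementation: sample_settings and toy_score are hypothetical helpers, toy_score stands in for the backtesting metric, and the library's own sampler may draw values differently, so the same seed need not reproduce these draws.

```python
import numpy as np
from scipy.stats import uniform


def sample_settings(param_distributions, n_iter, seed):
    """Hypothetical helper: draw n_iter settings, one value per parameter.

    Objects with an ``rvs`` method (e.g. scipy.stats distributions) are
    sampled directly; plain lists are sampled uniformly by position.
    """
    rng = np.random.default_rng(seed)
    settings = []
    for _ in range(n_iter):
        setting = {}
        for name, dist in param_distributions.items():
            if hasattr(dist, "rvs"):
                setting[name] = np.asarray(dist.rvs(random_state=rng)).item()
            else:
                setting[name] = dist[rng.integers(len(dist))]
        settings.append(setting)
    return settings


def toy_score(setting):
    """Hypothetical stand-in for a backtesting error (lower is better)."""
    penalty = 0.0 if setting["estimator__fit_intercept"] else 1.0
    return (setting["estimator__alpha"] - 3.0) ** 2 + penalty


param_distributions = {
    "estimator__alpha": uniform(0.1, 10.0),      # continuous, on [0.1, 10.1]
    "estimator__fit_intercept": [True, False],   # discrete options
}

n_iter = 10
candidates = sample_settings(param_distributions, n_iter=n_iter, seed=123)
scores = [toy_score(c) for c in candidates]     # exactly n_iter evaluations
best = candidates[int(np.argmin(scores))]
print(best)
```

However finely the distributions are specified, this search costs exactly n_iter evaluations, whereas a grid needs one evaluation per combination of grid points.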

Parameters

Name	Type	Description	Default
forecaster	object	Forecaster model (ForecasterRecursive or ForecasterDirect).	required
y	pd.Series	Training time series.	required
cv	TimeSeriesFold | OneStepAheadFold	Cross-validation strategy holding the information needed to split the data into folds.	required
param_distributions	dict	Dictionary with parameter names (str) as keys and distributions or lists of parameter values to try as values. Use scipy.stats distributions for continuous parameters.	required
metric	str | Callable | list[str | Callable]	Metric(s) used to quantify the model's goodness of fit. If str, one of: ‘mean_squared_error’, ‘mean_absolute_error’, ‘mean_absolute_percentage_error’, ‘mean_squared_log_error’, ‘mean_absolute_scaled_error’, ‘root_mean_squared_scaled_error’. If Callable, a function with arguments (y_true, y_pred, y_train) that returns a float. If list, any combination of these strings and Callables.	required
exog	pd.Series | pd.DataFrame | None	Exogenous variable(s) included as predictors. Must have the same number of observations as y and be aligned so that y[i] is regressed on exog[i]. Default is None.	`None`
lags_grid	list[int | list[int] | np.ndarray[int] | range[int]] | dict[str, list[int | list[int] | np.ndarray[int] | range[int]]] | None	Lags configurations to try. Each entry can be an int, a list, a NumPy ndarray, or a range object. If dict, the keys are used as labels in the results DataFrame. Default is None.	`None`
n_iter	int	Number of parameter settings sampled per lags configuration, so the total number of models evaluated is n_iter times the number of lags configurations. Higher values explore more of the space at the cost of longer runtime. Default is 10.	`10`
random_state	int	Seed for random sampling for reproducible output. Default is 123.	`123`
return_best	bool	If True, refit the forecaster on the whole dataset using the best parameters found. Default is True.	`True`
n_jobs	int | str	Number of jobs to run in parallel. If -1, all cores are used. If ‘auto’, the number of jobs is chosen by select_n_jobs_backtesting. Default is ‘auto’.	`'auto'`
verbose	bool	If True, print the number of folds used for cross-validation. Default is False.	`False`
show_progress	bool	Whether to show a progress bar. Default is True.	`True`
suppress_warnings	bool	If True, suppress spotforecast warnings during hyperparameter search. Default is False.	`False`
output_file	str | None	Filename or full path where the results are saved as a TSV file. If None, results are not saved. Default is None.	`None`
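The inputs in the table above are ordinary Python objects. The sketch below assumes a scikit-learn Ridge estimator, as in the Examples section, so parameter names use the estimator__ prefix shown there. max_absolute_error is a hypothetical custom metric, the lags_grid labels are arbitrary, and each dict value is assumed to be one lags configuration.

```python
import numpy as np
from scipy.stats import loguniform, randint, uniform

# scipy.stats frozen distributions for continuous or integer ranges,
# plain lists for a fixed set of options.
param_distributions = {
    "estimator__alpha": loguniform(1e-3, 1e2),   # spans several orders of magnitude
    "estimator__max_iter": randint(100, 1000),   # integers in [100, 999]
    "estimator__fit_intercept": [True, False],
}


def max_absolute_error(y_true, y_pred, y_train=None):
    """Hypothetical custom metric: worst absolute error over the forecast.

    Accepts the documented (y_true, y_pred, y_train) arguments; y_train is
    not needed for this metric and is ignored.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.max(np.abs(y_true - y_pred)))


# Built-in metric names and callables can be mixed in one list.
metric = ["mean_squared_error", max_absolute_error]

# Dict form: keys become the labels in the results DataFrame.
lags_grid = {
    "short": 3,                     # int form
    "short_and_daily": [1, 2, 24],  # explicit list of lags
    "first_week": range(1, 8),      # range form
}

n_iter = 10
# n_iter settings are sampled for each lags configuration.
n_models = n_iter * len(lags_grid)
print(n_models)  # 30
```

These objects would be passed to random_search_forecaster as param_distributions, metric, lags_grid, and n_iter. With three lags configurations and n_iter=10, 30 candidate models are evaluated; the Examples section (n_iter=2, no lags_grid) correspondingly reports 2.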

Returns

Name	Type	Description
	pd.DataFrame	Results for each parameter combination, with columns: lags (lags configuration), lags_label (descriptive label), params (parameter configuration), one column per metric holding its value (e.g. mean_squared_error), and one additional column per sampled parameter holding its value.
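Once results are available, the best configuration can be read off the metric column, and a file written via output_file can be loaded back with pandas. A sketch under assumptions: results is a placeholder DataFrame with the column layout shown in the Examples output and made-up values, and the saved file is assumed to be a tab-separated table with a header row (if it also contains an index column, pass index_col=0 to read_csv).

```python
import os
import tempfile

import pandas as pd

# Placeholder frame with the column layout of the Examples output;
# the values are invented for illustration, not produced by spotforecast2.
results = pd.DataFrame(
    {
        "lags": [[1, 2], [1, 2], [1, 2, 3]],
        "lags_label": ["[1, 2]", "[1, 2]", "[1, 2, 3]"],
        "params": [
            {"estimator__alpha": 0.5},
            {"estimator__alpha": 4.0},
            {"estimator__alpha": 2.0},
        ],
        "mean_squared_error": [1.30, 1.10, 1.25],
        "estimator__alpha": [0.5, 4.0, 2.0],
    }
)

# Lower error is better: sort explicitly and take the first row.
best = results.sort_values("mean_squared_error").iloc[0]
print(best["params"])  # {'estimator__alpha': 4.0}

# Round-trip through a TSV file, standing in for a file written via output_file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "random_search_results.tsv")
    results.to_csv(path, sep="\t", index=False)
    reloaded = pd.read_csv(path, sep="\t")
```

Sorting explicitly avoids relying on the row order of the returned frame. Columns holding lists or dicts, such as lags and params, come back from the TSV file as plain strings.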

Examples

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from scipy.stats import uniform
from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
from spotforecast2_safe.splitter import TimeSeriesFold
from spotforecast2.model_selection.random_search import random_search_forecaster

rng = np.random.default_rng(1234)
y = pd.Series(rng.standard_normal(30), name="y")

forecaster = ForecasterRecursive(estimator=Ridge(), lags=2)
cv = TimeSeriesFold(steps=2, initial_train_size=20, refit=False)

param_distributions = {
    "estimator__alpha": uniform(0.1, 10.0),
}

results = random_search_forecaster(
    forecaster=forecaster,
    y=y,
    cv=cv,
    param_distributions=param_distributions,
    metric="mean_squared_error",
    n_iter=2,
    random_state=1234,
    return_best=False,
    verbose=False,
    show_progress=False,
)

print(results.shape)
print(results.columns.tolist())
assert results.shape[0] == 2
assert "estimator__alpha" in results.columns
assert "mean_squared_error" in results.columns

Output:

Number of models compared: 2. Training models...
(2, 5)
['lags', 'lags_label', 'params', 'mean_squared_error', 'estimator__alpha']