model_selection.random_search

Random search hyperparameter optimization for forecasters.

Functions

Name Description
random_search_forecaster Random search over parameter distributions for a Forecaster.

random_search_forecaster

model_selection.random_search.random_search_forecaster(
    forecaster,
    y,
    cv,
    param_distributions,
    metric,
    exog=None,
    lags_grid=None,
    n_iter=10,
    random_state=123,
    return_best=True,
    n_jobs='auto',
    verbose=False,
    show_progress=True,
    suppress_warnings=False,
    output_file=None,
)

Random search over parameter distributions for a Forecaster.

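Samples n_iter parameter settings at random from the given distributions (per lags configuration) and evaluates each one with a Forecaster object. Each candidate is validated by time series backtesting with the provided cross-validation strategy. Because the number of evaluated candidates is fixed by n_iter rather than by the size of a grid, this is typically cheaper than an exhaustive grid search when the parameter space is large.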
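To make the sampling step concrete, the following minimal sketch draws candidate settings from a param_distributions dictionary using scikit-learn's ParameterSampler. Whether random_search_forecaster uses this exact class internally is an assumption; the sketch only illustrates what "sampling n_iter settings from distributions with a fixed random_state" looks like. The key "estimator__fit_intercept" is a hypothetical second parameter added for illustration.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Continuous parameters use scipy.stats distributions, discrete ones use lists.
param_distributions = {
    # scipy's uniform(loc, scale) covers [loc, loc + scale], here [0.1, 10.1].
    "estimator__alpha": uniform(0.1, 10.0),
    # Hypothetical extra parameter, included only to show a list of choices.
    "estimator__fit_intercept": [True, False],
}

# Draw 5 candidate settings; the same seed yields the same candidates.
sampled = list(ParameterSampler(param_distributions, n_iter=5, random_state=123))
resampled = list(ParameterSampler(param_distributions, n_iter=5, random_state=123))

for params in sampled:
    print(params)
```

Each printed dictionary corresponds to one candidate that would be backtested. The runtime therefore grows with n_iter, not with the number of values a grid would enumerate.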

Parameters

Name Type Description Default
forecaster object Forecaster model (ForecasterRecursive or ForecasterDirect). required
y pd.Series Training time series. required
cv TimeSeriesFold | OneStepAheadFold Cross-validation strategy (TimeSeriesFold or OneStepAheadFold) with information needed to split the data into folds. required
param_distributions dict Dictionary with parameter names (str) as keys and distributions or lists of parameters to try as values. Use scipy.stats distributions for continuous parameters. required
metric str | Callable | list[str | Callable] Metric(s) used to quantify model goodness of fit. If str, one of: 'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error', 'mean_absolute_scaled_error', 'root_mean_squared_scaled_error'. If Callable: function with arguments (y_true, y_pred, y_train) that returns a float. If list: any combination of strings and Callables. required
exog pd.Series | pd.DataFrame | None Exogenous variable(s) included as predictors. Must have the same number of observations as y and aligned so that y[i] is regressed on exog[i]. Default is None. None
lags_grid list[int | list[int] | np.ndarray[int] | range[int]] | dict[str, list[int | list[int] | np.ndarray[int] | range[int]]] | None Lists of lags to try. Can be int, lists, numpy ndarray, or range objects. If dict, keys are used as labels in results DataFrame. Default is None. None
n_iter int Number of parameter settings sampled per lags configuration. Trades off runtime vs solution quality. Default is 10. 10
random_state int Seed for random sampling for reproducible output. Default is 123. 123
return_best bool If True, refit the forecaster using best parameters on the whole dataset. Default is True. True
n_jobs int | str Number of jobs to run in parallel. If -1, uses all cores. If 'auto', the number of jobs is chosen by select_n_jobs_backtesting. Default is 'auto'. 'auto'
verbose bool If True, print number of folds used for cv. Default is False. False
show_progress bool Whether to show a progress bar. Default is True. True
suppress_warnings bool If True, suppress spotforecast warnings during hyperparameter search. Default is False. False
output_file str | None Filename or full path to save results as TSV. If None, results are not saved to file. Default is None. None
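The metric parameter accepts a callable with arguments (y_true, y_pred, y_train) that returns a float. The sketch below shows one possible custom metric of that shape. The function name scaled_mae and the toy inputs are hypothetical, and how the library passes y_train (by position or by keyword) is not stated here, so the function is written to work either way.

```python
import numpy as np


def scaled_mae(y_true, y_pred, y_train):
    """Forecast MAE divided by the in-sample MAE of a naive one-step forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Naive forecast error on the training data: |y[t] - y[t-1]|.
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)


# Toy call with made-up numbers, only to show the signature and return type.
score = scaled_mae(
    y_true=[3.0, 5.0],
    y_pred=[2.0, 7.0],
    y_train=[1.0, 2.0, 4.0, 3.0],
)
print(score)
```

A function like this could then be passed as metric=scaled_mae, or inside a list together with string metric names.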

Returns

Name Type Description
pd.DataFrame Results for each parameter combination with columns: lags (lags configuration), lags_label (descriptive label), params (parameters configuration), metric (metric value), and additional columns with param=value pairs.
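As a rough illustration of working with the returned DataFrame, the sketch below selects the best candidate from a frame that has the documented column layout. The rows are invented placeholders, not real search output, and it assumes the metric column is named after the metric and that lower values are better (as for mean_squared_error).

```python
import pandas as pd

# Placeholder rows that only mimic the documented columns; the numbers are invented.
results = pd.DataFrame(
    {
        "lags": [[1, 2], [1, 2]],
        "lags_label": [[1, 2], [1, 2]],
        "params": [{"estimator__alpha": 4.2}, {"estimator__alpha": 0.7}],
        "mean_squared_error": [1.30, 1.10],
        "estimator__alpha": [4.2, 0.7],
    }
)

# Sort ascending on the metric column and take the first row as the best candidate.
best = results.sort_values("mean_squared_error").iloc[0]
best_params = best["params"]
print(best_params)
```

The params column holds the full parameter dictionary, which is convenient for re-creating the estimator. The per-parameter columns are easier to filter and plot.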

Examples

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from scipy.stats import uniform
from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
from spotforecast2_safe.splitter import TimeSeriesFold
from spotforecast2.model_selection.random_search import random_search_forecaster

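# Synthetic series: 30 draws from a standard normal, seeded for reproducibility.
rng = np.random.default_rng(1234)
y = pd.Series(rng.standard_normal(30), name="y")

# Ridge-based recursive forecaster with lags=2, backtested in 2-step folds
# after an initial training window of 20 observations, without refitting.
forecaster = ForecasterRecursive(estimator=Ridge(), lags=2)
cv = TimeSeriesFold(steps=2, initial_train_size=20, refit=False)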

param_distributions = {
    # scipy's uniform(loc, scale) samples from [loc, loc + scale], i.e. [0.1, 10.1] here.
    "estimator__alpha": uniform(0.1, 10.0),
}

results = random_search_forecaster(
    forecaster=forecaster,
    y=y,
    cv=cv,
    param_distributions=param_distributions,
    metric="mean_squared_error",
    n_iter=2,
    random_state=1234,
    return_best=False,
    verbose=False,
    show_progress=False,
)

print(results.shape)
print(results.columns.tolist())
assert results.shape[0] == 2
assert "estimator__alpha" in results.columns
assert "mean_squared_error" in results.columns
Number of models compared: 2. Training models...
(2, 5)
['lags', 'lags_label', 'params', 'mean_squared_error', 'estimator__alpha']