model_selection.random_search
Random search hyperparameter optimization for forecasters.
Functions
random_search_forecaster
model_selection.random_search.random_search_forecaster(
forecaster,
y,
cv,
param_distributions,
metric,
exog=None,
lags_grid=None,
n_iter=10,
random_state=123,
return_best=True,
n_jobs='auto',
verbose=False,
show_progress=True,
suppress_warnings=False,
output_file=None,
)
Random search over parameter distributions for a Forecaster.
Samples parameter settings at random from the given distributions and evaluates each one on a Forecaster object. Validation uses time series backtesting with the provided cross-validation strategy. Because only n_iter settings are evaluated per lags configuration, this is usually cheaper than an exhaustive grid search when the parameter space is large.
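The sampling step can be sketched with scikit-learn's ParameterSampler. This is a stand-in for illustration only: whether random_search_forecaster uses ParameterSampler internally is an assumption, and the estimator__fit_intercept entry is added here just to show a list-valued parameter next to a scipy.stats distribution.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Continuous values come from a scipy.stats distribution, discrete ones from a list.
param_distributions = {
    "estimator__alpha": uniform(0.1, 10.0),      # uniform(loc, scale): values in [0.1, 10.1]
    "estimator__fit_intercept": [True, False],   # illustrative list-valued parameter
}

# Draw 5 candidate settings; a fixed random_state makes the draw reproducible.
candidates = list(ParameterSampler(param_distributions, n_iter=5, random_state=123))
for params in candidates:
    print(params)
```

Each printed dict is one candidate setting. Rerunning with the same random_state yields the same candidates, which is the role the random_state argument plays in the search.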
Parameters
forecaster
object
Forecaster model (ForecasterRecursive or ForecasterDirect).
required
y
pd.Series
Training time series.
required
cv
TimeSeriesFold | OneStepAheadFold
Cross-validation strategy (TimeSeriesFold or OneStepAheadFold) with information needed to split the data into folds.
required
param_distributions
dict
Dictionary with parameter names (str) as keys and distributions or lists of parameters to try as values. Use scipy.stats distributions for continuous parameters.
required
metric
str | Callable | list[str | Callable]
Metric(s) to quantify model goodness of fit. If str: 'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error', 'mean_absolute_scaled_error', 'root_mean_squared_scaled_error'. If Callable: Function with arguments (y_true, y_pred, y_train) that returns a float. If list: Multiple strings and/or Callables.
required
exog
pd.Series | pd.DataFrame | None
Exogenous variable(s) included as predictors. Must have the same number of observations as y and aligned so that y[i] is regressed on exog[i]. Default is None.
None
lags_grid
list[int | list[int] | np.ndarray[int] | range[int]] | dict[str, list[int | list[int] | np.ndarray[int] | range[int]]] | None
Lags configurations to try. Each entry can be an int, a list, a numpy ndarray, or a range object. If a dict is given, its keys are used as labels in the results DataFrame. Default is None.
None
n_iter
int
Number of parameter settings sampled per lags configuration. Trades off runtime vs solution quality. Default is 10.
10
random_state
int
Seed for random sampling for reproducible output. Default is 123.
123
return_best
bool
If True, refit the forecaster using best parameters on the whole dataset. Default is True.
True
n_jobs
int | str
Number of jobs to run in parallel. If -1, uses all cores. If 'auto', uses select_n_jobs_backtesting. Default is 'auto'.
'auto'
verbose
bool
If True, print number of folds used for cv. Default is False.
False
show_progress
bool
Whether to show a progress bar. Default is True.
True
suppress_warnings
bool
If True, suppress spotforecast warnings during hyperparameter search. Default is False.
False
output_file
str | None
Filename or full path to save results as TSV. If None, results are not saved to file. Default is None.
None
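A custom metric only needs to accept (y_true, y_pred, y_train) and return a float, as described for the metric parameter above. The function below is a hypothetical example, not part of the library: it divides the forecast MAE by the in-sample MAE of a naive one-step forecast, and it assumes all three arguments arrive as one-dimensional array-likes.

```python
import numpy as np


def scaled_mae(y_true, y_pred, y_train):
    """Forecast MAE divided by the in-sample MAE of a naive one-step forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Naive forecast error on the training data: |y[t] - y[t-1]|
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)


# Stand-alone check on made-up numbers (not library output)
score = scaled_mae(y_true=[3.0, 5.0], y_pred=[2.0, 7.0], y_train=[1.0, 2.0, 4.0])
print(score)  # 1.0
```

Such a function could then be passed as metric=scaled_mae, or inside a list alongside string metric names.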
Returns
pd.DataFrame
Results for each parameter combination with columns: lags (lags configuration), lags_label (descriptive label), params (parameters configuration), metric (metric value, in a column named after the metric, e.g. mean_squared_error), and additional columns with param=value pairs.
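One way to work with the returned table is to sort by the metric column and read the winning settings from params. The DataFrame below is a hand-built stand-in with made-up values that only mimics the documented columns; it is not output from random_search_forecaster, and whether the real results arrive already sorted is not assumed here.

```python
import pandas as pd

# Made-up stand-in for the returned DataFrame; values are illustrative only.
results = pd.DataFrame(
    {
        "lags": [[1, 2], [1, 2]],
        "lags_label": [[1, 2], [1, 2]],
        "params": [{"estimator__alpha": 4.2}, {"estimator__alpha": 0.9}],
        "mean_squared_error": [1.31, 1.12],
        "estimator__alpha": [4.2, 0.9],
    }
)

# Lower error is better, so the first row after an ascending sort is the best one.
best = results.sort_values("mean_squared_error").iloc[0]
best_params = best["params"]
print(best_params)  # {'estimator__alpha': 0.9}
```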
Examples
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from scipy.stats import uniform
from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
from spotforecast2_safe.splitter import TimeSeriesFold
from spotforecast2.model_selection.random_search import random_search_forecaster
# Synthetic series: 30 draws from a standard normal distribution
rng = np.random.default_rng(1234)
y = pd.Series(rng.standard_normal(30), name="y")
forecaster = ForecasterRecursive(estimator=Ridge(), lags=2)
cv = TimeSeriesFold(steps=2, initial_train_size=20, refit=False)
# uniform(loc, scale) samples alpha from [0.1, 10.1]
param_distributions = {
    "estimator__alpha": uniform(0.1, 10.0),
}
results = random_search_forecaster(
    forecaster=forecaster,
    y=y,
    cv=cv,
    param_distributions=param_distributions,
    metric="mean_squared_error",
    n_iter=2,
    random_state=1234,
    return_best=False,
    verbose=False,
    show_progress=False,
)
print(results.shape)
print(results.columns.tolist())
# One row per sampled setting; the metric and each parameter get their own column
assert results.shape[0] == 2
assert "estimator__alpha" in results.columns
assert "mean_squared_error" in results.columns
Number of models compared: 2. Training models...
(2, 5)
['lags', 'lags_label', 'params', 'mean_squared_error', 'estimator__alpha']
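The "Number of models compared: 2" line matches n_iter=2 with a single lags configuration. Since n_iter is described as the number of settings sampled per lags configuration, the search size can be estimated as below. The helper n_models is hypothetical and not part of the library; it assumes that a lags_grid of None counts as one configuration and that the distributions can supply n_iter distinct settings.

```python
def n_models(n_iter, lags_grid=None):
    """Estimate how many candidate models a random search evaluates."""
    # len() counts entries for a list and labelled entries for a dict
    n_lags_configs = 1 if lags_grid is None else len(lags_grid)
    return n_iter * n_lags_configs


print(n_models(2))                                        # 2
print(n_models(10, [3, [1, 7], range(1, 5)]))             # 30
print(n_models(10, {"short": 3, "weekly": [1, 7]}))       # 20
```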