model_selection.random_search
Random search hyperparameter optimization for forecasters.
Functions
| Name | Description |
|---|---|
| random_search_forecaster | Random search over parameter distributions for a Forecaster. |
random_search_forecaster
model_selection.random_search.random_search_forecaster(
forecaster,
y,
cv,
param_distributions,
metric,
exog=None,
lags_grid=None,
n_iter=10,
random_state=123,
return_best=True,
n_jobs='auto',
verbose=False,
show_progress=True,
suppress_warnings=False,
output_file=None,
)

Random search over parameter distributions for a Forecaster.
Performs random sampling of parameter settings from the given distributions for a Forecaster object. Each sampled setting is validated by time series backtesting with the provided cross-validation strategy. Because only n_iter settings are evaluated per lags configuration, this is usually more efficient than an exhaustive grid search when the parameter space is large.
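The sampling step can be pictured with scikit-learn's `ParameterSampler`, which draws a fixed number of settings from a mix of scipy.stats distributions and plain lists. This is a sketch for intuition only: whether `random_search_forecaster` uses `ParameterSampler` internally is an assumption not confirmed by this page, the helper `sample_settings` is hypothetical, and the backtesting step is omitted.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Mixed search space: a continuous distribution and a discrete list.
param_distributions = {
    "estimator__alpha": uniform(0.1, 10.0),      # samples from [0.1, 10.1]
    "estimator__fit_intercept": [True, False],  # sampled uniformly from the list
}

def sample_settings(n_iter, random_state):
    """Hypothetical helper: draw n_iter candidate settings for one lags configuration."""
    return list(
        ParameterSampler(param_distributions, n_iter=n_iter, random_state=random_state)
    )

settings = sample_settings(n_iter=5, random_state=123)
for setting in settings:
    print(setting)

# A fixed seed reproduces the same candidates, which is the role of random_state.
assert settings == sample_settings(n_iter=5, random_state=123)
```

Each sampled dict would then be scored by backtesting, and the setting with the best metric kept.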
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| forecaster | object | Forecaster model (ForecasterRecursive or ForecasterDirect). | required |
| y | pd.Series | Training time series. | required |
| cv | TimeSeriesFold \| OneStepAheadFold | Cross-validation strategy (TimeSeriesFold or OneStepAheadFold) that specifies how the data is split into folds. | required |
| param_distributions | dict | Dictionary with parameter names (str) as keys and distributions or lists of values to try as values. Use scipy.stats distributions for continuous parameters. | required |
| metric | str \| Callable \| list[str \| Callable] | Metric(s) used to quantify the model's goodness of fit. If str: 'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error', 'mean_absolute_scaled_error', or 'root_mean_squared_scaled_error'. If Callable: a function with arguments (y_true, y_pred, y_train) that returns a float. If list: multiple strings and/or Callables. | required |
| exog | pd.Series \| pd.DataFrame \| None | Exogenous variable(s) included as predictors. Must have the same number of observations as y and be aligned so that y[i] is regressed on exog[i]. Default is None. | None |
| lags_grid | list[int \| list[int] \| np.ndarray[int] \| range[int]] \| dict[str, list[int \| list[int] \| np.ndarray[int] \| range[int]]] \| None | Lag configurations to try. Each configuration can be an int, a list, a numpy ndarray, or a range object. If a dict, its keys are used as labels in the results DataFrame. Default is None. | None |
| n_iter | int | Number of parameter settings sampled per lags configuration. Higher values explore more of the search space at the cost of longer runtime. Default is 10. | 10 |
| random_state | int | Seed for random sampling, for reproducible output. Default is 123. | 123 |
| return_best | bool | If True, refit the forecaster with the best parameters on the whole dataset. Default is True. | True |
| n_jobs | int \| str | Number of jobs to run in parallel. If -1, all cores are used. If 'auto', the number of jobs is chosen by select_n_jobs_backtesting. Default is 'auto'. | 'auto' |
| verbose | bool | If True, print the number of folds used in cross-validation. Default is False. | False |
| show_progress | bool | Whether to show a progress bar. Default is True. | True |
| suppress_warnings | bool | If True, suppress spotforecast warnings during the hyperparameter search. Default is False. | False |
| output_file | str \| None | Filename or full path where results are saved as a TSV file. If None, results are not saved to a file. Default is None. | None |
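For illustration, the values below have the shapes the table describes: a `lags_grid` dict whose keys become labels, a `param_distributions` dict mixing a scipy.stats distribution with a list, and a custom metric with the documented `(y_true, y_pred, y_train)` signature. The labels, the Ridge parameters, and the metric `max_abs_error` are hypothetical choices for this sketch; the objects would be passed to `random_search_forecaster` and are not run through it here.

```python
import numpy as np
from scipy.stats import uniform

# Hypothetical lag configurations; per the table, dict keys are used as
# labels in the results DataFrame.
lags_grid = {
    "short": 3,                     # an int
    "weekly": [1, 7],               # an explicit list of lags
    "monthly": range(1, 13),        # a range object
    "mixed": np.array([1, 2, 24]),  # a numpy array
}

# A continuous distribution for one parameter, a discrete list for another.
param_distributions = {
    "estimator__alpha": uniform(0.1, 10.0),
    "estimator__fit_intercept": [True, False],
}

def max_abs_error(y_true, y_pred, y_train=None):
    """Hypothetical custom metric: largest absolute forecast error.

    It accepts y_train to match the documented (y_true, y_pred, y_train)
    signature, but does not use it.
    """
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(errors.max())

# Built-in metric names and callables can be mixed in a list.
metric = ["mean_squared_error", max_abs_error]

# n_iter settings are sampled per lags configuration, so the number of
# candidates to backtest grows with the size of lags_grid.
n_iter = 10
n_candidates = n_iter * len(lags_grid)
print(n_candidates)  # 40
print(max_abs_error([1.0, 2.0, 3.0], [1.5, 2.0, 1.0]))  # 2.0
```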
Returns
| Name | Type | Description |
|---|---|---|
|  | pd.DataFrame | Results for each parameter combination, with columns lags (lags configuration), lags_label (descriptive label), params (parameter configuration), one column per metric holding its value (e.g. mean_squared_error), and one additional column per sampled parameter holding its value (e.g. estimator__alpha). |
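Once the results are returned, the best configuration can be read off the metric column. The DataFrame below is a hand-built stand-in with the documented columns and invented values, not output from the function; since this page does not say whether the returned rows are already sorted, the sketch sorts explicitly.

```python
import pandas as pd

# Hand-built stand-in for the returned DataFrame; all values are invented.
results = pd.DataFrame({
    "lags": [[1, 2, 3], [1, 2, 3], [1, 7]],
    "lags_label": ["short", "short", "weekly"],
    "params": [
        {"estimator__alpha": 0.5},
        {"estimator__alpha": 4.2},
        {"estimator__alpha": 1.3},
    ],
    "mean_squared_error": [1.10, 0.95, 1.40],
    "estimator__alpha": [0.5, 4.2, 1.3],
})

# Lower is better for an error metric such as mean_squared_error.
best = results.sort_values("mean_squared_error").iloc[0]
print(best["lags_label"], best["params"])  # short {'estimator__alpha': 4.2}
```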
Examples
Basic random search with continuous parameter distributions:
```python
>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.linear_model import Ridge
>>> from scipy.stats import uniform
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2.model_selection import TimeSeriesFold
>>> from spotforecast2.model_selection.random_search import random_search_forecaster
>>>
>>> # Create sample data
>>> np.random.seed(123)
>>> y = pd.Series(np.random.randn(50), name='y')
>>>
>>> # Set up forecaster and cross-validation
>>> forecaster = ForecasterRecursive(estimator=Ridge(), lags=3)
>>> cv = TimeSeriesFold(steps=3, initial_train_size=20, refit=False)
>>>
>>> # Define parameter distributions with scipy.stats
>>> param_distributions = {
...     'estimator__alpha': uniform(0.1, 10.0)  # Uniform between 0.1 and 10.1
... }
>>>
>>> # Run random search
>>> results = random_search_forecaster(
...     forecaster=forecaster,
...     y=y,
...     cv=cv,
...     param_distributions=param_distributions,
...     metric='mean_squared_error',
...     n_iter=5,
...     random_state=42,
...     return_best=False,
...     verbose=False,
...     show_progress=False
... )
>>>
>>> # Check results
>>> print(results.shape[0])
5
>>> print('estimator__alpha' in results.columns)
True
>>> print('mean_squared_error' in results.columns)
True
```
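`scipy.stats.uniform(loc, scale)` is parametrized by a start and a width rather than by low and high bounds, which is why `uniform(0.1, 10.0)` in the example covers [0.1, 10.1]. For regularization strengths such as Ridge's `alpha`, a log-uniform distribution is a common alternative that spreads samples evenly across orders of magnitude; using it with this function is a suggestion, not something this page prescribes. The bounds `1e-3` and `1e2` are arbitrary.

```python
from scipy.stats import loguniform, uniform

# uniform(loc, scale) covers [loc, loc + scale].
lin = uniform(0.1, 10.0)
low, high = lin.support()
print(low, high)

# loguniform(a, b) covers [a, b] with a uniform density on the log scale.
log_alpha = loguniform(1e-3, 1e2)
samples = log_alpha.rvs(size=1000, random_state=123)

# 3 of the 5 decades between 1e-3 and 1e2 lie below 1, so about 60% of
# samples fall below 1 (a linear uniform over the same range gives about 1%).
frac_below_one = (samples < 1.0).mean()
print(frac_below_one)  # roughly 0.6

# It could be used as: param_distributions = {"estimator__alpha": log_alpha}
```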