model_selection.bayesian_search

Bayesian hyperparameter search functions for forecasters using Optuna.

Functions

Name Description
bayesian_search_forecaster Bayesian hyperparameter optimization for a Forecaster using Optuna.

bayesian_search_forecaster

model_selection.bayesian_search.bayesian_search_forecaster(
    forecaster,
    y,
    cv,
    search_space,
    metric,
    exog=None,
    n_trials=10,
    random_state=123,
    return_best=True,
    n_jobs='auto',
    verbose=False,
    show_progress=False,
    suppress_warnings=False,
    output_file=None,
    kwargs_create_study=None,
    kwargs_study_optimize=None,
)

Bayesian hyperparameter optimization for a Forecaster using Optuna.

Performs Bayesian hyperparameter search using the Optuna library for a Forecaster object. Validation is done using time series backtesting with the provided cross-validation strategy.

Parameters

Name Type Description Default
forecaster object Forecaster model. Can be ForecasterRecursive, ForecasterDirect, or any compatible forecaster class. required
y pd.Series Training time series values. Must be a pandas Series with a datetime or numeric index. required
cv TimeSeriesFold | OneStepAheadFold Cross-validation strategy with information needed to split the data into folds. Must be an instance of TimeSeriesFold or OneStepAheadFold. required
search_space Callable Function that takes an Optuna trial as its only argument and returns a dictionary whose keys are parameter names (str) and whose values are obtained from the trial's suggest methods (trial.suggest_float, trial.suggest_int, trial.suggest_categorical). May optionally include a 'lags' key to search over different lag configurations. required
metric str | Callable | list[str | Callable] Metric(s) used to quantify model goodness of fit. Accepts: (1) a str, one of 'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error', 'mean_absolute_scaled_error', 'root_mean_squared_scaled_error'; (2) a Callable with arguments (y_true, y_pred) or (y_true, y_pred, y_train) that returns a float; (3) a list containing any mix of strings and Callables. required
exog pd.Series | pd.DataFrame | None Exogenous variable(s) included as predictors. Must have the same number of observations as y and be aligned so that y[i] is regressed on exog[i]. Default is None. None
n_trials int Number of parameter settings sampled during optimization. Default is 10. 10
random_state int Seed for sampling reproducibility. When passing a custom sampler in kwargs_create_study, set the seed within the sampler (e.g., {'sampler': TPESampler(seed=145)}). Default is 123. 123
return_best bool If True, once the search has finished, refit the forecaster on the whole dataset using the best parameters found. Default is True. True
n_jobs int | str Number of parallel jobs. If -1, uses all cores. If 'auto', uses spotforecast.skforecast.utils.select_n_jobs_backtesting to automatically determine the number of jobs. Default is 'auto'. 'auto'
verbose bool If True, print number of folds used for cross-validation. Default is False. False
show_progress bool Whether to show an Optuna progress bar during optimization. Default is False. False
suppress_warnings bool If True, suppress spotforecast warnings during hyperparameter search. Default is False. False
output_file str | None Filename or full path to save results as TSV. If None, results are not saved to file. Default is None. None
kwargs_create_study dict | None Additional keyword arguments passed to optuna.create_study(). If not specified, direction is set to 'minimize' and TPESampler(seed=123) is used. Default is None. None
kwargs_study_optimize dict | None Additional keyword arguments passed to study.optimize(). Default is None. None
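
The search_space contract described in the table above can be sketched as follows. StubTrial is a hypothetical stand-in for an Optuna trial, defined here only so that the returned dictionary can be inspected without running a study; in real use Optuna supplies its own trial object. The 'lags' candidates and the alpha range are illustrative assumptions.

```python
import random


class StubTrial:
    """Hypothetical stand-in that mimics three Optuna trial methods.

    It only draws random values so the dictionary shape can be inspected;
    it is not part of Optuna or of this library.
    """

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def suggest_float(self, name, low, high):
        return self._rng.uniform(low, high)

    def suggest_int(self, name, low, high):
        return self._rng.randint(low, high)

    def suggest_categorical(self, name, choices):
        return self._rng.choice(choices)


def search_space(trial):
    # Keys are parameter names; values come from the trial's suggest_* calls.
    return {
        "estimator__alpha": trial.suggest_float("estimator__alpha", 0.01, 10.0),
        # Optional 'lags' key: search over candidate lag configurations.
        "lags": trial.suggest_categorical("lags", [2, 4, 6]),
    }


params = search_space(StubTrial(seed=0))
print(sorted(params))  # ['estimator__alpha', 'lags']
```

The same search_space function would be passed unchanged to bayesian_search_forecaster; only the stub is specific to this sketch.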

Returns

Name Type Description
tuple[pd.DataFrame, object] A tuple (results, best_trial). results: DataFrame with columns 'lags', 'params', the metric values, and one column per searched parameter, sorted by the first metric. best_trial: the best optimization result as an optuna.trial.FrozenTrial object containing the best parameters and metric value.

Raises

Name Type Description
ValueError If exog length doesn’t match y length when return_best=True.
TypeError If cv is not an instance of TimeSeriesFold or OneStepAheadFold.
ValueError If metric list contains duplicate metric names.

Examples

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
from spotforecast2_safe.splitter import TimeSeriesFold
from spotforecast2.model_selection.bayesian_search import bayesian_search_forecaster

rng = np.random.default_rng(0)
y = pd.Series(rng.standard_normal(40), name="y")

forecaster = ForecasterRecursive(estimator=Ridge(), lags=2)
cv = TimeSeriesFold(steps=2, initial_train_size=25, refit=False)

def search_space(trial):
    return {
        "estimator__alpha": trial.suggest_float("estimator__alpha", 0.01, 10.0),
    }

results, best_trial = bayesian_search_forecaster(
    forecaster=forecaster,
    y=y,
    cv=cv,
    search_space=search_space,
    metric="mean_squared_error",
    n_trials=3,
    random_state=0,
    return_best=False,
    verbose=False,
    show_progress=False,
    suppress_warnings=True,
)

print(results.shape)
print(results.columns.tolist())
assert results.shape[0] == 3
assert "mean_squared_error" in results.columns
assert "estimator__alpha" in results.columns

(3, 4)
['lags', 'params', 'mean_squared_error', 'estimator__alpha']
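
The metric parameter also accepts callables. As a minimal sketch of the three-argument form (y_true, y_pred, y_train), the hypothetical function below divides the mean absolute error of the forecasts by the mean absolute one-step change in the training data. The name naive_scaled_mae and the scaling choice are assumptions made for illustration, not the library's built-in 'mean_absolute_scaled_error', and the sketch assumes y_train arrives as a single sequence.

```python
def naive_scaled_mae(y_true, y_pred, y_train):
    """Hypothetical custom metric: MAE divided by the naive in-sample MAE."""
    y_true, y_pred, y_train = list(y_true), list(y_pred), list(y_train)
    # Mean absolute error of the forecasts.
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    # Mean absolute one-step change in the training data,
    # i.e. the in-sample error of a naive "repeat the last value" forecast.
    naive = sum(abs(b - a) for a, b in zip(y_train, y_train[1:])) / (len(y_train) - 1)
    return float(mae / naive)


score = naive_scaled_mae(
    y_true=[3.0, 5.0],
    y_pred=[2.0, 7.0],
    y_train=[1.0, 2.0, 4.0],
)
print(score)  # 1.0
```

Such a function would be passed as metric=naive_scaled_mae, or inside a list alongside string metric names.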