model_selection.bayesian_search

Bayesian hyperparameter search functions for forecasters using Optuna.

Functions

Name	Description
bayesian_search_forecaster	Bayesian hyperparameter optimization for a Forecaster using Optuna.

bayesian_search_forecaster

model_selection.bayesian_search.bayesian_search_forecaster(
    forecaster,
    y,
    cv,
    search_space,
    metric,
    exog=None,
    n_trials=10,
    random_state=123,
    return_best=True,
    n_jobs='auto',
    verbose=False,
    show_progress=False,
    suppress_warnings=False,
    output_file=None,
    kwargs_create_study=None,
    kwargs_study_optimize=None,
)

Bayesian hyperparameter optimization for a Forecaster using Optuna.

Performs Bayesian hyperparameter search using the Optuna library for a Forecaster object. Validation is done using time series backtesting with the provided cross-validation strategy.

Parameters

Name	Type	Description	Default
forecaster	object	Forecaster model. Can be ForecasterRecursive, ForecasterDirect, or any compatible forecaster class.	required
y	pd.Series	Training time series values. Must be a pandas Series with a datetime or numeric index.	required
cv	`TimeSeriesFold` | `OneStepAheadFold`	Cross-validation strategy with the information needed to split the data into folds. Must be an instance of TimeSeriesFold or OneStepAheadFold.	required
search_space	Callable	Function that takes an Optuna `trial` as its only argument and returns a dictionary mapping parameter names (str) to the values sampled with the trial's suggest methods (trial.suggest_float, trial.suggest_int, trial.suggest_categorical). May include a 'lags' key to search over different lag configurations.	required
metric	str | Callable | list[str | Callable]	Metric(s) used to quantify the model's goodness of fit. Accepted forms: a str, one of 'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error', 'mean_absolute_scaled_error', 'root_mean_squared_scaled_error'; a Callable with arguments (y_true, y_pred) or (y_true, y_pred, y_train) that returns a float; or a list containing any combination of strings and Callables.	required
exog	pd.Series | pd.DataFrame | None	Exogenous variable(s) included as predictors. Must have the same number of observations as `y` and be aligned so that y[i] is regressed on exog[i]. Default is None.	`None`
n_trials	int	Number of parameter settings sampled during optimization. Default is 10.	`10`
random_state	int	Seed for sampling reproducibility. When passing a custom sampler in kwargs_create_study, set the seed within the sampler (e.g., {'sampler': TPESampler(seed=145)}). Default is 123.	`123`
return_best	bool	If True, refit the forecaster on the whole dataset with the best parameters found once the search finishes. Default is True.	`True`
n_jobs	int | str	Number of parallel jobs. If -1, all cores are used. If 'auto', spotforecast.skforecast.utils.select_n_jobs_backtesting determines the number of jobs automatically. Default is 'auto'.	`'auto'`
verbose	bool	If True, print the number of folds used for cross-validation. Default is False.	`False`
show_progress	bool	Whether to show an Optuna progress bar during optimization. Default is False.	`False`
suppress_warnings	bool	If True, suppress spotforecast warnings during hyperparameter search. Default is False.	`False`
output_file	str | None	Filename or full path where the results are saved as a TSV file. If None, results are not saved. Default is None.	`None`
kwargs_create_study	dict | None	Additional keyword arguments passed to optuna.create_study(). If not specified, direction is set to 'minimize' and a TPESampler seeded with random_state (123 by default) is used. Default is None, which is treated as an empty dict.	`None`
kwargs_study_optimize	dict | None	Additional keyword arguments passed to study.optimize(). Default is None, which is treated as an empty dict.	`None`

Returns

Name	Type	Description
	tuple[pd.DataFrame, object]	A tuple (results, best_trial). results: DataFrame with columns 'lags', 'params', one column per metric and one column per hyperparameter, sorted by the first metric. best_trial: the best optimization result as an optuna.trial.FrozenTrial, holding the best parameters and metric value.

Raises

Name	Type	Description
	ValueError	If the length of exog does not match the length of y when return_best=True.
	TypeError	If cv is not an instance of TimeSeriesFold or OneStepAheadFold.
	ValueError	If metric list contains duplicate metric names.

Examples

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
from spotforecast2_safe.splitter import TimeSeriesFold
from spotforecast2.model_selection.bayesian_search import bayesian_search_forecaster

rng = np.random.default_rng(0)
y = pd.Series(rng.standard_normal(40), name="y")

forecaster = ForecasterRecursive(estimator=Ridge(), lags=2)
cv = TimeSeriesFold(steps=2, initial_train_size=25, refit=False)

def search_space(trial):
    return {
        "estimator__alpha": trial.suggest_float("estimator__alpha", 0.01, 10.0),
    }

results, best_trial = bayesian_search_forecaster(
    forecaster=forecaster,
    y=y,
    cv=cv,
    search_space=search_space,
    metric="mean_squared_error",
    n_trials=3,
    random_state=0,
    return_best=False,
    verbose=False,
    show_progress=False,
    suppress_warnings=True,
)

print(results.shape)
print(results.columns.tolist())
assert results.shape[0] == 3
assert "mean_squared_error" in results.columns
assert "estimator__alpha" in results.columns

Output:

(3, 4)
['lags', 'params', 'mean_squared_error', 'estimator__alpha']