model_selection.utils_common

Common validation and initialization utilities for model selection.

Functions

Name Description
check_backtesting_input Check most inputs of the backtesting functions in the model_selection module.
check_one_step_ahead_input Check most inputs of the hyperparameter tuning functions in the model_selection module when using a OneStepAheadFold.
initialize_lags_grid Initialize lags grid and lags label for model selection.
select_n_jobs_backtesting Select the number of jobs to use in the backtesting process. The selection is based on heuristics and is not guaranteed to be optimal.

check_backtesting_input

model_selection.utils_common.check_backtesting_input(
    forecaster,
    cv,
    metric,
    add_aggregated_metric=True,
    y=None,
    series=None,
    exog=None,
    interval=None,
    interval_method='bootstrapping',
    alpha=None,
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    return_predictors=False,
    freeze_params=True,
    n_jobs='auto',
    show_progress=True,
    suppress_warnings=False,
)

Helper function that checks most inputs of the backtesting functions in the model_selection module.

Parameters

Name Type Description Default
forecaster object Forecaster model. required
cv object TimeSeriesFold object with the information needed to split the data into folds. required
metric str | Callable | list[str | Callable] Metric used to quantify the goodness of fit of the model. required
add_aggregated_metric bool If True, the aggregated metrics (average, weighted average and pooling) over all levels are also returned (multi-series forecasters only). True
y pd.Series | None Training time series for uni-series forecasters. None
series pd.DataFrame | dict[str, pd.Series | pd.DataFrame] Training time series for multi-series forecasters. None
exog pd.Series | pd.DataFrame | dict[str, pd.Series | pd.DataFrame] | None Exogenous variables. None
interval float | list[float] | tuple[float] | str | object | None Specifies whether probabilistic predictions should be estimated and the method to use. The following options are supported: - If float, it is the nominal (expected) coverage, between 0 and 1. For instance, interval=0.95 corresponds to the [2.5, 97.5] percentiles. - If list or tuple, it is the sequence of percentiles to compute; each value must be between 0 and 100 inclusive. For example, a 95% interval can be specified as interval=[2.5, 97.5], and multiple percentiles (e.g. 10, 50 and 90) as interval=[10, 50, 90]. - If 'bootstrapping' (str), n_boot bootstrapping predictions are generated. - If a scipy.stats distribution object, the distribution parameters are estimated for each prediction. - If None, no probabilistic predictions are estimated. None
interval_method str Technique used to estimate prediction intervals. Available options: - 'bootstrapping': bootstrapping is used to generate the prediction intervals. - 'conformal': the split conformal prediction method is used to estimate the intervals. 'bootstrapping'
alpha float | None Significance level of the confidence intervals used in ForecasterStats; their nominal coverage is 100 * (1 - alpha) %. None
n_boot int Number of bootstrapping iterations to perform when estimating prediction intervals. 250
use_in_sample_residuals bool If True, residuals from the training data are used as a proxy for the prediction error when creating prediction intervals. If False, out_sample_residuals are used, provided they are already stored inside the forecaster. True
use_binned_residuals bool If True, residuals are selected based on the predicted values (binned selection). If False, residuals are selected randomly. True
random_state int Seed for the random number generator to ensure reproducibility. 123
return_predictors bool If True, the predictors used to make the predictions are also returned. False
freeze_params bool Determines whether to freeze the model parameters after the first fit for estimators that perform automatic model selection. - If True, the model parameters found during the first fit (e.g., order and seasonal_order for Arima, or smoothing parameters for Ets) are reused in all subsequent refits. This avoids re-running the automatic selection procedure in each fold and reduces runtime. - If False, automatic model selection is performed independently in each refit, allowing parameters to adapt across folds. This increases runtime and adds a params column to the output with the parameters selected per fold. True
n_jobs int | str The number of jobs to run in parallel. If -1, the number of jobs is set to the number of cores. If 'auto', n_jobs is set using the function select_n_jobs_backtesting. 'auto'
show_progress bool Whether to show a progress bar. True
suppress_warnings bool If True, spotforecast warnings will be suppressed during the backtesting process. False

Returns

Name Type Description
None None No value is returned; the function only validates its inputs.

Examples

>>> import pandas as pd
>>> from spotforecast2.model_selection.utils_common import check_backtesting_input
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2.model_selection import TimeSeriesFold
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error
>>> y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> cv = TimeSeriesFold(
...     steps=3,
...     initial_train_size=5,
...     gap=0,
...     refit=False,
...     fixed_train_size=False,
...     allow_incomplete_fold=True
... )
>>> check_backtesting_input(
...     forecaster=forecaster,
...     cv=cv,
...     metric=mean_squared_error,
...     y=y
... )
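
The interval and alpha conventions in the parameter table can be made concrete with a small stand-alone sketch. It does not call spotforecast2: the helper names coverage_to_percentiles and alpha_to_coverage are hypothetical and only illustrate the arithmetic, assuming a symmetric interval and values strictly between 0 and 1.

```python
def coverage_to_percentiles(coverage: float) -> tuple[float, float]:
    """Map a nominal coverage to symmetric lower/upper percentiles.

    Hypothetical helper for illustration; not part of spotforecast2.
    """
    if not 0 < coverage < 1:
        raise ValueError("coverage must be strictly between 0 and 1")
    # Half of the uncovered mass goes to each tail; round away float noise.
    lower = round(50 * (1 - coverage), 10)
    upper = round(50 * (1 + coverage), 10)
    return lower, upper


def alpha_to_coverage(alpha: float) -> float:
    """Map a significance level alpha to the nominal coverage 1 - alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return round(1 - alpha, 10)


print(coverage_to_percentiles(0.95))  # (2.5, 97.5)
print(coverage_to_percentiles(0.80))  # (10.0, 90.0)
print(alpha_to_coverage(0.05))        # 0.95
```

Under these assumptions, interval=0.95 and interval=[2.5, 97.5] describe the same interval, and alpha=0.05 corresponds to a 95 % confidence interval.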

check_one_step_ahead_input

model_selection.utils_common.check_one_step_ahead_input(
    forecaster,
    cv,
    metric,
    y=None,
    series=None,
    exog=None,
    show_progress=True,
    suppress_warnings=False,
)

Helper function that checks most inputs of the hyperparameter tuning functions in the model_selection module when a OneStepAheadFold is used.

Parameters

Name Type Description Default
forecaster object Forecaster model. required
cv object OneStepAheadFold object with the information needed to split the data into folds. required
metric str | Callable | list[str | Callable] Metric used to quantify the goodness of fit of the model. required
y pd.Series | None Training time series for uni-series forecasters. None
series pd.DataFrame | dict[str, pd.Series | pd.DataFrame] Training time series for multi-series forecasters. None
exog pd.Series | pd.DataFrame | dict[str, pd.Series | pd.DataFrame] | None Exogenous variables. None
show_progress bool Whether to show a progress bar. True
suppress_warnings bool If True, spotforecast warnings will be suppressed during the hyperparameter search. False

Returns

Name Type Description
None None No value is returned; the function only validates its inputs.

Examples

>>> import pandas as pd
>>> from spotforecast2.model_selection.utils_common import check_one_step_ahead_input
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2.model_selection import OneStepAheadFold
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error
>>> y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> cv = OneStepAheadFold(
...     initial_train_size=5,
...     return_all_predictions=False
... )
>>> check_one_step_ahead_input(
...     forecaster=forecaster,
...     cv=cv,
...     metric=mean_squared_error,
...     y=y
... )

initialize_lags_grid

model_selection.utils_common.initialize_lags_grid(forecaster, lags_grid=None)

Initialize lags grid and lags label for model selection.

Parameters

Name Type Description Default
forecaster object Forecaster model. One of ForecasterRecursive, ForecasterDirect, ForecasterRecursiveMultiSeries or ForecasterDirectMultiVariate. required
lags_grid list[int | list[int] | np.ndarray[int] | range[int]] | dict[str, list[int | list[int] | np.ndarray[int] | range[int]]] | None Lists of lags to try, containing int, lists, numpy ndarray, or range objects. If dict, the keys are used as labels in the results DataFrame, and the values are used as the lists of lags to try. None

Returns

Name Type Description
tuple tuple[dict[str, int], str] Tuple (lags_grid, lags_label): - lags_grid (dict): dictionary with the lags configuration for each iteration. - lags_label (str): label used to represent the lags in the results object.

Examples

>>> from spotforecast2.model_selection.utils_common import initialize_lags_grid
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from sklearn.linear_model import LinearRegression
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> lags_grid = [2, 4]
>>> lags_grid, lags_label = initialize_lags_grid(forecaster, lags_grid)
>>> print(lags_grid)
{'2': 2, '4': 4}
>>> print(lags_label)
values
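
The mapping from a list or dict of lags to the returned pair can be sketched without the library. The function sketch_lags_grid below is hypothetical and only mirrors what this page documents: list entries are keyed by their string representation with the label 'values', as in the example above. The label 'keys' for dict input is an assumption, and the lags_grid=None case is deliberately left out.

```python
def sketch_lags_grid(lags_grid):
    """Illustrative stand-in for initialize_lags_grid; not the library code."""
    if isinstance(lags_grid, dict):
        # Dict keys become the labels shown in the results.
        # The returned label 'keys' is an assumption of this sketch.
        return dict(lags_grid), "keys"
    if isinstance(lags_grid, list):
        # Each configuration is labelled by its string representation.
        return {str(lags): lags for lags in lags_grid}, "values"
    raise TypeError("this sketch only handles list or dict input")


grid, label = sketch_lags_grid([2, [1, 7], range(1, 4)])
print(list(grid))  # ['2', '[1, 7]', 'range(1, 4)']
print(label)       # values

grid, label = sketch_lags_grid({"short": 2, "weekly": [1, 7]})
print(grid)        # {'short': 2, 'weekly': [1, 7]}
```

Passing a dict is useful when the string form of a lags configuration would be long or ambiguous, since the keys then serve as readable labels.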

select_n_jobs_backtesting

model_selection.utils_common.select_n_jobs_backtesting(forecaster, refit)

Select the number of jobs to use in the backtesting process. The selection is based on heuristics and is not guaranteed to be optimal.

The number of jobs is chosen as follows:

  • If refit is an integer, then n_jobs = 1, because parallelization does not work with intermittent refit.
  • If forecaster is 'ForecasterRecursive' and the estimator is a linear estimator, then n_jobs = 1.
  • If forecaster is 'ForecasterRecursive' and the estimator is not a linear estimator, then n_jobs = cpu_count() - 1.
  • If forecaster is 'ForecasterDirect' or 'ForecasterDirectMultiVariate' and refit = True, then n_jobs = cpu_count() - 1.
  • If forecaster is 'ForecasterDirect' or 'ForecasterDirectMultiVariate' and refit = False, then n_jobs = 1.
  • If forecaster is 'ForecasterRecursiveMultiSeries', then n_jobs = cpu_count() - 1.
  • If forecaster is 'ForecasterStats' or 'ForecasterEquivalentDate', then n_jobs = 1.
  • If the estimator is an LGBMRegressor(n_jobs=1), then n_jobs = cpu_count() - 1.
  • If the estimator is an LGBMRegressor with internal n_jobs != 1, then n_jobs = 1. This is because lightgbm is highly optimized for gradient boosting and parallelizes operations at a very fine-grained level, making additional parallelization unnecessary and potentially harmful due to resource contention.
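
The rules above can be read as a decision procedure. The following is a minimal sketch of one such reading, not the library's implementation: sketch_n_jobs and all of its parameters are hypothetical, the forecaster is identified by its class name, and the order in which the rules are applied (integer refit first, LGBMRegressor rules overriding the recursive forecaster rules) is an assumption, since the list does not state a precedence.

```python
def sketch_n_jobs(
    forecaster_name: str,
    refit,
    estimator_is_linear: bool = False,
    lgbm_n_jobs=None,
    n_cores: int = 8,
) -> int:
    """Hypothetical restatement of the listed heuristics.

    lgbm_n_jobs is None when the estimator is not an LGBMRegressor;
    otherwise it is the estimator's internal n_jobs value.
    n_cores stands in for cpu_count() so the sketch is deterministic.
    """
    parallel = n_cores - 1

    # bool is a subclass of int in Python, so it is excluded explicitly:
    # only a genuine integer is treated as an intermittent refit here.
    if isinstance(refit, int) and not isinstance(refit, bool):
        return 1
    if forecaster_name in ("ForecasterStats", "ForecasterEquivalentDate"):
        return 1
    if forecaster_name in ("ForecasterDirect", "ForecasterDirectMultiVariate"):
        return parallel if refit else 1
    if forecaster_name in ("ForecasterRecursive", "ForecasterRecursiveMultiSeries"):
        if lgbm_n_jobs is not None:
            # Avoid nesting parallelism when LightGBM already uses threads.
            return parallel if lgbm_n_jobs == 1 else 1
        if forecaster_name == "ForecasterRecursive" and estimator_is_linear:
            return 1
        return parallel
    return 1  # assumption: unlisted forecasters run sequentially


print(sketch_n_jobs("ForecasterRecursive", refit=True, estimator_is_linear=True))  # 1
print(sketch_n_jobs("ForecasterRecursive", refit=True))                            # 7
print(sketch_n_jobs("ForecasterDirect", refit=False))                              # 1
print(sketch_n_jobs("ForecasterRecursiveMultiSeries", refit=3))                    # 1
print(sketch_n_jobs("ForecasterRecursive", refit=False, lgbm_n_jobs=4))            # 1
```

The first call matches the documented example for this function, where a ForecasterRecursive wrapping LinearRegression yields 1.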

Parameters

Name Type Description Default
forecaster object Forecaster model. required
refit bool | int Whether the forecaster is refitted during the backtesting process. An integer indicates an intermittent refit. required

Returns

Name Type Description
int int The number of jobs to run in parallel.

Examples

>>> from spotforecast2.model_selection.utils_common import select_n_jobs_backtesting
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from sklearn.linear_model import LinearRegression
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> select_n_jobs_backtesting(forecaster, refit=True)
1