model_selection.utils_common

Common validation and initialization utilities for model selection.

Classes

Name Description
OneStepAheadValidationWarning Warning used to notify that the one-step-ahead validation is being used.

OneStepAheadValidationWarning

model_selection.utils_common.OneStepAheadValidationWarning(message)

Warning used to notify that the one-step-ahead validation is being used.

Parameters

Name Type Description Default
message str The warning message to be displayed. required

Examples

>>> import warnings
>>> from spotforecast2_safe.model_selection.utils_common import OneStepAheadValidationWarning
>>> warnings.warn(
...     "This is a one-step-ahead validation warning.",
...     OneStepAheadValidationWarning
... )
This is a one-step-ahead validation warning.
You can suppress this warning using: warnings.simplefilter('ignore', category=OneStepAheadValidationWarning)
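
The suppression hint in the message can be exercised as follows. This is a minimal sketch that uses a stand-in class with the same name (an assumption, so the snippet runs without the package installed); with the package available, import the real class instead. The `emit` helper is hypothetical and exists only for the demonstration.

```python
import warnings


# Stand-in for illustration only (assumption): the real class lives in
# spotforecast2_safe.model_selection.utils_common.
class OneStepAheadValidationWarning(UserWarning):
    pass


def emit():
    # Hypothetical helper that raises the warning once.
    warnings.warn(
        "One-step-ahead validation is being used.",
        OneStepAheadValidationWarning,
    )


# Record warnings so we can count what gets through the filters.
with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter("always")
    emit()

with warnings.catch_warnings(record=True) as hidden:
    warnings.simplefilter("always")
    # Added at the front of the filter list, so it takes precedence.
    warnings.simplefilter("ignore", category=OneStepAheadValidationWarning)
    emit()

print(len(shown), len(hidden))  # prints: 1 0
```

Wrapping the filter in `warnings.catch_warnings()` keeps the suppression local, so other code still sees the warning.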

Functions

Name Description
check_backtesting_input Check most inputs of the backtesting functions in the model_selection module.
check_one_step_ahead_input Check most inputs of the hyperparameter tuning functions in the model_selection module when a OneStepAheadFold is used.
initialize_lags_grid Initialize lags grid and lags label for model selection.
select_n_jobs_backtesting Select the number of jobs to use in the backtesting process, based on heuristics.

check_backtesting_input

model_selection.utils_common.check_backtesting_input(
    forecaster,
    cv,
    metric,
    add_aggregated_metric=True,
    y=None,
    series=None,
    exog=None,
    interval=None,
    interval_method='bootstrapping',
    alpha=None,
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    return_predictors=False,
    freeze_params=True,
    n_jobs='auto',
    show_progress=True,
    suppress_warnings=False,
)

Helper function that checks most inputs of the backtesting functions in the model_selection module.

Parameters

Name Type Description Default
forecaster object Forecaster model. required
cv object TimeSeriesFold object with the information needed to split the data into folds. required
metric (str, Callable, list) Metric used to quantify the goodness of fit of the model. required
add_aggregated_metric bool If True, the aggregated metrics (average, weighted average and pooling) over all levels are also returned (only multiseries). Defaults to True. True
y (pd.Series, None) Training time series for uni-series forecasters. Defaults to None. None
series (pd.DataFrame, dict, None) Training time series for multi-series forecasters. Defaults to None. None
exog (pd.Series, pd.DataFrame, dict, None) Exogenous variables. Defaults to None. None
interval (float, list, tuple, str, object, None) Specifies whether probabilistic predictions should be estimated and the method to use. Defaults to None. None
interval_method str Technique used to estimate prediction intervals. Options: 'bootstrapping', 'conformal'. Defaults to 'bootstrapping'. 'bootstrapping'
alpha (float, None) Significance level for the confidence intervals used in ForecasterStats, which are (1 - alpha) % intervals. Defaults to None. None
n_boot int Number of bootstrapping iterations. Defaults to 250. 250
use_in_sample_residuals bool If True, use residuals from training data. Defaults to True. True
use_binned_residuals bool If True, residuals are selected based on predicted values. Defaults to True. True
random_state int Seed for reproducibility. Defaults to 123. 123
return_predictors bool If True, return predictors used for predictions. Defaults to False. False
freeze_params bool If True, freeze model parameters after first fit. Defaults to True. True
n_jobs (int, str) Number of jobs to run in parallel. Defaults to 'auto'. 'auto'
show_progress bool Whether to show a progress bar. Defaults to True. True
suppress_warnings bool If True, suppress warnings. Defaults to False. False

Returns

Name Type Description
None None

Examples

>>> import pandas as pd
>>> from spotforecast2_safe.model_selection.utils_common import check_backtesting_input
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.model_selection import TimeSeriesFold
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error
>>> y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> cv = TimeSeriesFold(
...     steps=3,
...     initial_train_size=5,
...     gap=0,
...     refit=False,
...     fixed_train_size=False,
...     allow_incomplete_fold=True
... )
>>> check_backtesting_input(
...     forecaster=forecaster,
...     cv=cv,
...     metric=mean_squared_error,
...     y=y
... )

check_one_step_ahead_input

model_selection.utils_common.check_one_step_ahead_input(
    forecaster,
    cv,
    metric,
    y=None,
    series=None,
    exog=None,
    show_progress=True,
    suppress_warnings=False,
)

Helper function that checks most inputs of the hyperparameter tuning functions in the model_selection module when a OneStepAheadFold is used.

Parameters

Name Type Description Default
forecaster object Forecaster model. required
cv object OneStepAheadFold object with the information needed to split the data into folds. required
metric (str, Callable, list) Metric used to quantify the goodness of fit of the model. required
y (pd.Series, None) Training time series for uni-series forecasters. Defaults to None. None
series (pd.DataFrame, dict, None) Training time series for multi-series forecasters. Defaults to None. None
exog (pd.Series, pd.DataFrame, dict, None) Exogenous variables. Defaults to None. None
show_progress bool Whether to show a progress bar. Defaults to True. True
suppress_warnings bool If True, suppress warnings. Defaults to False. False

Returns

Name Type Description
None None

Examples

>>> import pandas as pd
>>> from spotforecast2_safe.model_selection.utils_common import check_one_step_ahead_input
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.model_selection import OneStepAheadFold
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error
>>> y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> cv = OneStepAheadFold(
...     initial_train_size=5,
...     return_all_predictions=False
... )
>>> check_one_step_ahead_input(
...     forecaster=forecaster,
...     cv=cv,
...     metric=mean_squared_error,
...     y=y
... )

initialize_lags_grid

model_selection.utils_common.initialize_lags_grid(forecaster, lags_grid=None)

Initialize lags grid and lags label for model selection.

Parameters

Name Type Description Default
forecaster object Forecaster model. required
lags_grid (list, dict, None) Lists of lags to try. If list, each element must be an int, list, np.ndarray, or range. If dict, the keys are used as labels in the results DataFrame, and the values are the lags to try. Defaults to None. None

Returns

Name Type Description
tuple tuple[dict[str, int], str] A tuple (lags_grid, lags_label), where lags_grid (dict) is the dictionary with the lags configuration for each iteration and lags_label (str) is the label for the lags representation in the results object.

Examples

>>> from spotforecast2_safe.model_selection.utils_common import initialize_lags_grid
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from sklearn.linear_model import LinearRegression
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> lags_grid = [2, 4]
>>> lags_grid, lags_label = initialize_lags_grid(forecaster, lags_grid)
>>> print(lags_grid)
{'2': 2, '4': 4}
>>> print(lags_label)
values
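
The list and dict cases described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `normalize_lags_grid` and its `default_lags` argument are hypothetical names, the `'keys'` label for dict input is assumed (the documented example only shows `'values'` for list input), and the fallback for `None` is a guess at using the lags already set on the forecaster.

```python
def normalize_lags_grid(lags_grid, default_lags):
    """Hypothetical helper mirroring the documented list/dict behaviour."""
    if not isinstance(lags_grid, (list, dict, type(None))):
        raise TypeError("lags_grid must be a list, dict, or None.")

    if isinstance(lags_grid, dict):
        # Dict keys become the labels shown in the results.
        # Assumption: the label reported in this case is 'keys'.
        return dict(lags_grid), "keys"

    if lags_grid is None:
        # Assumption: fall back to the lags already set on the forecaster.
        lags_grid = [default_lags]

    # List entries are labelled by their string representation.
    return {str(lags): lags for lags in lags_grid}, "values"


print(normalize_lags_grid([2, 4], default_lags=2))
# ({'2': 2, '4': 4}, 'values')
print(normalize_lags_grid({"short": 2, "long": [1, 2, 24]}, default_lags=2))
# ({'short': 2, 'long': [1, 2, 24]}, 'keys')
```

Passing a dict is useful when the string form of a long lag list would make the results table hard to read.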

select_n_jobs_backtesting

model_selection.utils_common.select_n_jobs_backtesting(forecaster, refit)

Select the number of jobs to use in the backtesting process. The selection is based on heuristics and is not guaranteed to be optimal.

The number of jobs is chosen as follows:

  • If refit is an integer, then n_jobs = 1. This is because parallelization doesn’t work with intermittent refit.
  • If forecaster is ‘ForecasterRecursive’ and estimator is a linear estimator, then n_jobs = 1.
  • If forecaster is ‘ForecasterRecursive’ and estimator is not a linear estimator, then n_jobs = cpu_count() - 1.
  • If forecaster is ‘ForecasterDirect’ or ‘ForecasterDirectMultiVariate’ and refit = True, then n_jobs = cpu_count() - 1.
  • If forecaster is ‘ForecasterDirect’ or ‘ForecasterDirectMultiVariate’ and refit = False, then n_jobs = 1.
  • If forecaster is ‘ForecasterRecursiveMultiSeries’, then n_jobs = cpu_count() - 1.
  • If forecaster is ‘ForecasterStats’ or ‘ForecasterEquivalentDate’, then n_jobs = 1.
  • If estimator is a LGBMRegressor(n_jobs=1), then n_jobs = cpu_count() - 1.
  • If estimator is a LGBMRegressor with internal n_jobs != 1, then n_jobs = 1. This is because lightgbm is highly optimized for gradient boosting and parallelizes operations at a very fine-grained level, making additional parallelization unnecessary and potentially harmful due to resource contention.
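
The rules above can be restated as a small decision function. This is a sketch under stated assumptions, not the library code: `sketch_select_n_jobs` and all of its parameters are hypothetical, the forecaster is identified by its class name as a string, the order in which the rules are checked is assumed (the list does not say which rule wins when several apply), and the lower bound of one job is an added safeguard.

```python
import os


def sketch_select_n_jobs(forecaster_name, refit, *, linear_estimator=False,
                         lgbm_n_jobs=None, n_cpus=None):
    """Hypothetical restatement of the documented rules.

    lgbm_n_jobs is the estimator's own n_jobs when the estimator is an
    LGBMRegressor, otherwise None.
    """
    n_cpus = n_cpus or os.cpu_count() or 1
    parallel = max(n_cpus - 1, 1)  # safeguard: never fewer than one job

    # Integer refit (intermittent refit) is not parallelized.
    # bool is a subclass of int, so test for bool explicitly.
    if not isinstance(refit, bool):
        return 1
    # LGBMRegressor: parallelize outside only if it is serial inside.
    # Assumption: this rule is checked before the forecaster type.
    if lgbm_n_jobs is not None:
        return parallel if lgbm_n_jobs == 1 else 1
    if forecaster_name == "ForecasterRecursive":
        return 1 if linear_estimator else parallel
    if forecaster_name in ("ForecasterDirect", "ForecasterDirectMultiVariate"):
        return parallel if refit else 1
    if forecaster_name == "ForecasterRecursiveMultiSeries":
        return parallel
    # ForecasterStats, ForecasterEquivalentDate, and anything unlisted.
    return 1


print(sketch_select_n_jobs("ForecasterRecursive", True, linear_estimator=True))  # 1
print(sketch_select_n_jobs("ForecasterRecursive", True, n_cpus=8))               # 7
print(sketch_select_n_jobs("ForecasterDirect", 3, n_cpus=8))                     # 1
```

Fixing `n_cpus` in the calls makes the result reproducible across machines; leaving it unset falls back to `os.cpu_count()`.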

Parameters

Name Type Description Default
forecaster object Forecaster model. required
refit (bool, int) If the forecaster is refitted during the backtesting process. required

Returns

Name Type Description
int int The number of jobs to run in parallel.

Examples

>>> from spotforecast2_safe.model_selection.utils_common import select_n_jobs_backtesting
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from sklearn.linear_model import LinearRegression
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> select_n_jobs_backtesting(forecaster, refit=True)
1