model_selection.utils_common

Common validation and initialization utilities for model selection.

Classes

Name	Description
OneStepAheadValidationWarning	Warning issued to notify the user that one-step-ahead validation is being used.

OneStepAheadValidationWarning

model_selection.utils_common.OneStepAheadValidationWarning(message)

Warning issued to notify the user that one-step-ahead validation is being used.

Parameters

Name	Type	Description	Default
message	str	The warning message to be displayed.	required

Examples

>>> import warnings
>>> from spotforecast2_safe.model_selection.utils_common import OneStepAheadValidationWarning
>>> warnings.warn(
...     "This is a one-step-ahead validation warning.",
...     OneStepAheadValidationWarning
... )
This is a one-step-ahead validation warning.
You can suppress this warning using: warnings.simplefilter('ignore', category=OneStepAheadValidationWarning)
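
The suppression hint above can be tried without installing the package. The following is a minimal sketch using only the standard library; `DemoValidationWarning` and `run_validation` are hypothetical stand-ins (the real class may derive from a different base than `UserWarning`), so treat it as an illustration of the filtering mechanism rather than of the library's internals.

```python
import warnings


class DemoValidationWarning(UserWarning):
    """Hypothetical stand-in for OneStepAheadValidationWarning."""


def run_validation():
    # Hypothetical function that emits the warning, as a library call might.
    warnings.warn("One-step-ahead validation is being used.", DemoValidationWarning)
    return "done"


# Without a filter, the warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    run_validation()
n_shown = len(caught)

# With an 'ignore' filter for this category, the warning is dropped.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", category=DemoValidationWarning)
    run_validation()
n_suppressed = len(caught)

print(n_shown, n_suppressed)
```

Using `warnings.catch_warnings()` as a context manager keeps the filter local, so other warnings in the surrounding program are unaffected once the block exits.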

Functions

Name	Description
check_backtesting_input	Helper function that checks most inputs of the backtesting functions in the model_selection modules.
check_one_step_ahead_input	Helper function that checks most inputs of the hyperparameter tuning functions in the model_selection modules when a OneStepAheadFold is used.
initialize_lags_grid	Initialize lags grid and lags label for model selection.
select_n_jobs_backtesting	Select the number of jobs to use in the backtesting process, based on heuristics.

check_backtesting_input

model_selection.utils_common.check_backtesting_input(
    forecaster,
    cv,
    metric,
    add_aggregated_metric=True,
    y=None,
    series=None,
    exog=None,
    interval=None,
    interval_method='bootstrapping',
    alpha=None,
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    return_predictors=False,
    freeze_params=True,
    n_jobs='auto',
    show_progress=True,
    suppress_warnings=False,
)

Helper function that checks most inputs of the backtesting functions in the model_selection modules.

Parameters

Name	Type	Description	Default
forecaster	object	Forecaster model.	required
cv	object	TimeSeriesFold object with the information needed to split the data into folds.	required
metric	(str, Callable, list)	Metric used to quantify the goodness of fit of the model.	required
add_aggregated_metric	bool	If `True`, the aggregated metrics (average, weighted average and pooling) over all levels are also returned (only multiseries). Defaults to `True`.	`True`
y	(pd.Series, None)	Training time series for uni-series forecasters. Defaults to `None`.	`None`
series	(pd.DataFrame, dict, None)	Training time series for multi-series forecasters. Defaults to `None`.	`None`
exog	(pd.Series, pd.DataFrame, dict, None)	Exogenous variables. Defaults to `None`.	`None`
interval	(float, list, tuple, str, object, None)	Specifies whether probabilistic predictions should be estimated and the method to use. Defaults to `None`.	`None`
interval_method	str	Technique used to estimate prediction intervals. Options: `'bootstrapping'`, `'conformal'`. Defaults to `'bootstrapping'`.	`'bootstrapping'`
alpha	(float, None)	The confidence intervals used in ForecasterStats. Defaults to `None`.	`None`
n_boot	int	Number of bootstrapping iterations. Defaults to `250`.	`250`
use_in_sample_residuals	bool	If `True`, use residuals from training data. Defaults to `True`.	`True`
use_binned_residuals	bool	If `True`, residuals are selected based on predicted values. Defaults to `True`.	`True`
random_state	int	Seed for reproducibility. Defaults to `123`.	`123`
return_predictors	bool	If `True`, return predictors used for predictions. Defaults to `False`.	`False`
freeze_params	bool	If `True`, freeze model parameters after first fit. Defaults to `True`.	`True`
n_jobs	(int, str)	Number of jobs to run in parallel. Defaults to `'auto'`.	`'auto'`
show_progress	bool	Whether to show a progress bar. Defaults to `True`.	`True`
suppress_warnings	bool	If `True`, suppress warnings. Defaults to `False`.	`False`

Returns

Name	Type	Description
	None	None

Examples

>>> import pandas as pd
>>> from spotforecast2_safe.model_selection.utils_common import check_backtesting_input
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.model_selection import TimeSeriesFold
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error
>>> y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> cv = TimeSeriesFold(
...     steps=3,
...     initial_train_size=5,
...     gap=0,
...     refit=False,
...     fixed_train_size=False,
...     allow_incomplete_fold=True
... )
>>> check_backtesting_input(
...     forecaster=forecaster,
...     cv=cv,
...     metric=mean_squared_error,
...     y=y
... )

check_one_step_ahead_input

model_selection.utils_common.check_one_step_ahead_input(
    forecaster,
    cv,
    metric,
    y=None,
    series=None,
    exog=None,
    show_progress=True,
    suppress_warnings=False,
)

Helper function that checks most inputs of the hyperparameter tuning functions in the model_selection modules when a OneStepAheadFold is used.

Parameters

Name	Type	Description	Default
forecaster	object	Forecaster model.	required
cv	object	OneStepAheadFold object with the information needed to split the data into folds.	required
metric	(str, Callable, list)	Metric used to quantify the goodness of fit of the model.	required
y	(pd.Series, None)	Training time series for uni-series forecasters. Defaults to `None`.	`None`
series	(pd.DataFrame, dict, None)	Training time series for multi-series forecasters. Defaults to `None`.	`None`
exog	(pd.Series, pd.DataFrame, dict, None)	Exogenous variables. Defaults to `None`.	`None`
show_progress	bool	Whether to show a progress bar. Defaults to `True`.	`True`
suppress_warnings	bool	If `True`, suppress warnings. Defaults to `False`.	`False`

Returns

Name	Type	Description
	None	None

Examples

>>> import pandas as pd
>>> from spotforecast2_safe.model_selection.utils_common import check_one_step_ahead_input
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.model_selection import OneStepAheadFold
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error
>>> y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> cv = OneStepAheadFold(
...     initial_train_size=5,
...     return_all_predictions=False
... )
>>> check_one_step_ahead_input(
...     forecaster=forecaster,
...     cv=cv,
...     metric=mean_squared_error,
...     y=y
... )

initialize_lags_grid

model_selection.utils_common.initialize_lags_grid(forecaster, lags_grid=None)

Initialize lags grid and lags label for model selection.

Parameters

Name	Type	Description	Default
forecaster	object	Forecaster model.	required
lags_grid	(list, dict, None)	Lists of lags to try. If `list`, each element must be an int, list, np.ndarray, or range. If `dict`, the keys are used as labels in the `results` DataFrame, and the values are the lags to try. Defaults to `None`.	`None`

Returns

Name	Type	Description
tuple	tuple[dict[str, int], str]	A tuple `(lags_grid, lags_label)`: `lags_grid` (dict) is the lags configuration for each iteration; `lags_label` (str) is the label for the lags representation in the results object.

Examples

>>> from spotforecast2_safe.model_selection.utils_common import initialize_lags_grid
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from sklearn.linear_model import LinearRegression
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> lags_grid = [2, 4]
>>> lags_grid, lags_label = initialize_lags_grid(forecaster, lags_grid)
>>> print(lags_grid)
{'2': 2, '4': 4}
>>> print(lags_label)
values
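
The normalization described in the parameter table can be sketched in plain Python. This is not the library's implementation: `sketch_initialize_lags_grid` and its `current_lags` argument are hypothetical, and both the `'keys'` label for dict input and the fallback to the forecaster's current lags when `lags_grid` is `None` are assumptions. Only the list case is confirmed by the example output above.

```python
def sketch_initialize_lags_grid(current_lags, lags_grid=None):
    """Hypothetical re-statement of the documented list/dict handling.

    current_lags stands in for the lags already configured on the forecaster.
    """
    if lags_grid is not None and not isinstance(lags_grid, (list, dict)):
        raise TypeError("lags_grid must be a list, a dict, or None.")

    lags_label = "values"
    if isinstance(lags_grid, list):
        # Each candidate is labeled with its own string representation.
        grid = {str(lags): lags for lags in lags_grid}
    elif lags_grid is None:
        # Assumption: fall back to the lags the forecaster already uses.
        grid = {str(list(current_lags)): list(current_lags)}
    else:
        # Dict input: the user-supplied keys become the labels.
        grid = dict(lags_grid)
        lags_label = "keys"  # assumed label, not confirmed by the source
    return grid, lags_label


grid, label = sketch_initialize_lags_grid(current_lags=[1, 2], lags_grid=[2, 4])
print(grid)   # {'2': 2, '4': 4}
print(label)  # values
```

Passing a dict such as `{"short": 2, "long": [1, 2, 3]}` is useful when the candidate lag sets are long and a readable name in the results is preferable to the full list.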

select_n_jobs_backtesting

model_selection.utils_common.select_n_jobs_backtesting(forecaster, refit)

Select the number of jobs to use in the backtesting process. The selection is based on heuristics and is not guaranteed to be optimal.

The number of jobs is chosen as follows:

If refit is an integer, then n_jobs = 1. This is because parallelization doesn’t work with intermittent refit.
If forecaster is ‘ForecasterRecursive’ and estimator is a linear estimator, then n_jobs = 1.
If forecaster is ‘ForecasterRecursive’ and estimator is not a linear estimator, then n_jobs = cpu_count() - 1.
If forecaster is ‘ForecasterDirect’ or ‘ForecasterDirectMultiVariate’ and refit = True, then n_jobs = cpu_count() - 1.
If forecaster is ‘ForecasterDirect’ or ‘ForecasterDirectMultiVariate’ and refit = False, then n_jobs = 1.
If forecaster is ‘ForecasterRecursiveMultiSeries’, then n_jobs = cpu_count() - 1.
If forecaster is ‘ForecasterStats’ or ‘ForecasterEquivalentDate’, then n_jobs = 1.
If estimator is an LGBMRegressor(n_jobs=1), then n_jobs = cpu_count() - 1.
If estimator is an LGBMRegressor with internal n_jobs != 1, then n_jobs = 1. This is because LightGBM already parallelizes gradient boosting at a very fine-grained level, so additional parallelization is unnecessary and potentially harmful due to resource contention.
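
The rules above can be read as a small decision function. The sketch below restates them with plain arguments instead of real forecaster objects; `sketch_select_n_jobs` and its parameters are hypothetical. The precedence of the LGBMRegressor rules over the forecaster-type rules, the clamp to at least one job, and the treatment of `True`/`False` as non-integer `refit` values are assumptions, not behavior confirmed by the source.

```python
import os


def sketch_select_n_jobs(forecaster_name, refit, is_linear=False,
                         lgbm_n_jobs=None, cpu_count=None):
    """Hypothetical restatement of the documented heuristics.

    forecaster_name: class name of the forecaster, as a string.
    is_linear: whether the estimator is a linear model.
    lgbm_n_jobs: internal n_jobs of an LGBMRegressor, or None for other estimators.
    cpu_count: optional override, mainly to make the result reproducible.
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    parallel = max(cpus - 1, 1)  # assumption: never return fewer than one job

    # Intermittent refit: bool is a subclass of int, so exclude it explicitly.
    if isinstance(refit, int) and not isinstance(refit, bool):
        return 1
    # Assumption: the LGBMRegressor rules take precedence over forecaster type.
    if lgbm_n_jobs is not None:
        return parallel if lgbm_n_jobs == 1 else 1
    if forecaster_name == "ForecasterRecursive":
        return 1 if is_linear else parallel
    if forecaster_name in ("ForecasterDirect", "ForecasterDirectMultiVariate"):
        return parallel if refit else 1
    if forecaster_name == "ForecasterRecursiveMultiSeries":
        return parallel
    # ForecasterStats, ForecasterEquivalentDate, and anything unlisted.
    return 1


print(sketch_select_n_jobs("ForecasterRecursive", refit=True,
                           is_linear=True, cpu_count=8))  # 1
```

In practice the real function receives the forecaster object itself, so the string and flag arguments here only stand in for the type checks it would perform.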

Parameters

Name	Type	Description	Default
forecaster	object	Forecaster model.	required
refit	(bool, int)	If the forecaster is refitted during the backtesting process.	required

Returns

Name	Type	Description
int	int	The number of jobs to run in parallel.

Examples

>>> from spotforecast2_safe.model_selection.utils_common import select_n_jobs_backtesting
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from sklearn.linear_model import LinearRegression
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> select_n_jobs_backtesting(forecaster, refit=True)
1