model_selection.utils_common
Common validation and initialization utilities for model selection.
Classes
| Name | Description |
|---|---|
| OneStepAheadValidationWarning | Warning used to notify that the one-step-ahead validation is being used. |
OneStepAheadValidationWarning
model_selection.utils_common.OneStepAheadValidationWarning(message)
Warning used to notify that the one-step-ahead validation is being used.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| message | str | The warning message to be displayed. | required |
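Since this is a warning category, it can presumably be filtered with the standard `warnings` module like any other. The sketch below is not taken from the package: it defines a stand-in class with the same name so that it runs on its own, and the `UserWarning` base class and the helper `count_emitted` are assumptions made for illustration.

```python
import warnings


# Stand-in so the snippet runs without the package installed. In real code,
# import OneStepAheadValidationWarning from model_selection.utils_common.
# Subclassing UserWarning is an assumption made for this sketch.
class OneStepAheadValidationWarning(UserWarning):
    pass


def count_emitted(suppress: bool) -> int:
    """Emit the warning once and return how many warnings were recorded."""
    with warnings.catch_warnings(record=True) as caught:
        # Record everything by default inside this context.
        warnings.simplefilter("always")
        if suppress:
            # A later simplefilter call is inserted in front of earlier ones,
            # so this category-specific "ignore" takes precedence.
            warnings.simplefilter("ignore", category=OneStepAheadValidationWarning)
        warnings.warn(
            "One-step-ahead validation is being used.",
            OneStepAheadValidationWarning,
        )
    return len(caught)
```

Using `warnings.catch_warnings()` keeps the filter change local to the block, so other warnings elsewhere in the program are unaffected.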
Examples
>>> import warnings
>>> from spotforecast2_safe.model_selection.utils_common import OneStepAheadValidationWarning
>>> warnings.warn(
... "This is a one-step-ahead validation warning.",
... OneStepAheadValidationWarning
... )
This is a one-step-ahead validation warning.
You can suppress this warning using: warnings.simplefilter('ignore', category=OneStepAheadValidationWarning)
Functions
| Name | Description |
|---|---|
| check_backtesting_input | Helper function to check most inputs of the backtesting functions in the model_selection modules. |
| check_one_step_ahead_input | Helper function to check most inputs of the hyperparameter tuning functions in the model_selection modules when using a OneStepAheadFold. |
| initialize_lags_grid | Initialize lags grid and lags label for model selection. |
| select_n_jobs_backtesting | Select the optimal number of jobs to use in the backtesting process. This selection is based on heuristics and is not guaranteed to be optimal. |
check_backtesting_input
model_selection.utils_common.check_backtesting_input(
forecaster,
cv,
metric,
add_aggregated_metric=True,
y=None,
series=None,
exog=None,
interval=None,
interval_method='bootstrapping',
alpha=None,
n_boot=250,
use_in_sample_residuals=True,
use_binned_residuals=True,
random_state=123,
return_predictors=False,
freeze_params=True,
n_jobs='auto',
show_progress=True,
suppress_warnings=False,
)
This is a helper function to check most inputs of the backtesting functions in the model_selection modules.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| forecaster | object | Forecaster model. | required |
| cv | object | TimeSeriesFold object with the information needed to split the data into folds. | required |
| metric | (str, Callable, list) | Metric used to quantify the goodness of fit of the model. | required |
| add_aggregated_metric | bool | If True, the aggregated metrics (average, weighted average and pooling) over all levels are also returned (only multiseries). Defaults to True. | True |
| y | (pd.Series, None) | Training time series for uni-series forecasters. Defaults to None. | None |
| series | (pd.DataFrame, dict, None) | Training time series for multi-series forecasters. Defaults to None. | None |
| exog | (pd.Series, pd.DataFrame, dict, None) | Exogenous variables. Defaults to None. | None |
| interval | (float, list, tuple, str, object, None) | Specifies whether probabilistic predictions should be estimated and the method to use. Defaults to None. | None |
| interval_method | str | Technique used to estimate prediction intervals. Options: ‘bootstrapping’, ‘conformal’. Defaults to 'bootstrapping'. | 'bootstrapping' |
| alpha | (float, None) | The confidence intervals used in ForecasterStats. Defaults to None. | None |
| n_boot | int | Number of bootstrapping iterations. Defaults to 250. | 250 |
| use_in_sample_residuals | bool | If True, use residuals from training data. Defaults to True. | True |
| use_binned_residuals | bool | If True, residuals are selected based on predicted values. Defaults to True. | True |
| random_state | int | Seed for reproducibility. Defaults to 123. | 123 |
| return_predictors | bool | If True, return predictors used for predictions. Defaults to False. | False |
| freeze_params | bool | If True, freeze model parameters after first fit. Defaults to True. | True |
| n_jobs | (int, str) | Number of jobs to run in parallel. Defaults to 'auto'. | 'auto' |
| show_progress | bool | Whether to show a progress bar. Defaults to True. | True |
| suppress_warnings | bool | If True, suppress warnings. Defaults to False. | False |
Returns
| Name | Type | Description |
|---|---|---|
| | None | None |
Examples
>>> import pandas as pd
>>> from spotforecast2_safe.model_selection.utils_common import check_backtesting_input
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.model_selection import TimeSeriesFold
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error
>>> y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> cv = TimeSeriesFold(
... steps=3,
... initial_train_size=5,
... gap=0,
... refit=False,
... fixed_train_size=False,
... allow_incomplete_fold=True
... )
>>> check_backtesting_input(
... forecaster=forecaster,
... cv=cv,
... metric=mean_squared_error,
... y=y
... )
check_one_step_ahead_input
model_selection.utils_common.check_one_step_ahead_input(
forecaster,
cv,
metric,
y=None,
series=None,
exog=None,
show_progress=True,
suppress_warnings=False,
)
This is a helper function to check most inputs of the hyperparameter tuning functions in the model_selection modules when using a OneStepAheadFold.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| forecaster | object | Forecaster model. | required |
| cv | object | OneStepAheadFold object with the information needed to split the data into folds. | required |
| metric | (str, Callable, list) | Metric used to quantify the goodness of fit of the model. | required |
| y | (pd.Series, None) | Training time series for uni-series forecasters. Defaults to None. | None |
| series | (pd.DataFrame, dict, None) | Training time series for multi-series forecasters. Defaults to None. | None |
| exog | (pd.Series, pd.DataFrame, dict, None) | Exogenous variables. Defaults to None. | None |
| show_progress | bool | Whether to show a progress bar. Defaults to True. | True |
| suppress_warnings | bool | If True, suppress warnings. Defaults to False. | False |
Returns
| Name | Type | Description |
|---|---|---|
| | None | None |
Examples
>>> import pandas as pd
>>> from spotforecast2_safe.model_selection.utils_common import check_one_step_ahead_input
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.model_selection import OneStepAheadFold
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.metrics import mean_squared_error
>>> y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> cv = OneStepAheadFold(
... initial_train_size=5,
... return_all_predictions=False
... )
>>> check_one_step_ahead_input(
... forecaster=forecaster,
... cv=cv,
... metric=mean_squared_error,
... y=y
... )
initialize_lags_grid
model_selection.utils_common.initialize_lags_grid(forecaster, lags_grid=None)
Initialize lags grid and lags label for model selection.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| forecaster | object | Forecaster model. | required |
| lags_grid | (list, dict, None) | Lists of lags to try. If list, each element must be an int, list, np.ndarray, or range. If dict, the keys are used as labels in the results DataFrame, and the values are the lags to try. Defaults to None. | None |
Returns
| Name | Type | Description |
|---|---|---|
| tuple | tuple[dict[str, int], str] | A tuple (lags_grid, lags_label): lags_grid (dict) is the dictionary with the lags configuration for each iteration; lags_label (str) is the label for the lags representation in the results object. |
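To make the list-versus-dict behaviour concrete, here is a minimal sketch of one way such a normalization could work. It is not the library's implementation: the function name `sketch_lags_grid` is hypothetical, the `'keys'` label for dict input is an assumption, and the `lags_grid=None` case (which depends on the forecaster) is deliberately left out.

```python
def sketch_lags_grid(lags_grid):
    """Normalize a list or dict of lag configurations into (grid, label).

    Illustrative only; the real function also receives the forecaster
    and handles lags_grid=None.
    """
    if isinstance(lags_grid, dict):
        # Dict keys are kept as the labels. The 'keys' label is an assumption.
        return dict(lags_grid), "keys"

    grid = {}
    for lags in lags_grid:
        if isinstance(lags, range):
            # Expand ranges so the label shows the actual lags.
            lags = list(lags)
        # For list input, each entry is labelled with its string representation.
        grid[str(lags)] = lags
    return grid, "values"


print(sketch_lags_grid([2, 4]))
# ({'2': 2, '4': 4}, 'values')
```

Passing a dict such as `{'short': [1, 2]}` would, under the same assumptions, return the dict unchanged together with the label `'keys'`.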
Examples
>>> from spotforecast2_safe.model_selection.utils_common import initialize_lags_grid
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from sklearn.linear_model import LinearRegression
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> lags_grid = [2, 4]
>>> lags_grid, lags_label = initialize_lags_grid(forecaster, lags_grid)
>>> print(lags_grid)
{'2': 2, '4': 4}
>>> print(lags_label)
values
select_n_jobs_backtesting
model_selection.utils_common.select_n_jobs_backtesting(forecaster, refit)
Select the optimal number of jobs to use in the backtesting process. This selection is based on heuristics and is not guaranteed to be optimal.
The number of jobs is chosen as follows:
- If `refit` is an integer, then `n_jobs = 1`. This is because parallelization doesn’t work with intermittent refit.
- If forecaster is ‘ForecasterRecursive’ and estimator is a linear estimator, then `n_jobs = 1`.
- If forecaster is ‘ForecasterRecursive’ and estimator is not a linear estimator, then `n_jobs = cpu_count() - 1`.
- If forecaster is ‘ForecasterDirect’ or ‘ForecasterDirectMultiVariate’ and `refit = True`, then `n_jobs = cpu_count() - 1`.
- If forecaster is ‘ForecasterDirect’ or ‘ForecasterDirectMultiVariate’ and `refit = False`, then `n_jobs = 1`.
- If forecaster is ‘ForecasterRecursiveMultiSeries’, then `n_jobs = cpu_count() - 1`.
- If forecaster is ‘ForecasterStats’ or ‘ForecasterEquivalentDate’, then `n_jobs = 1`.
- If estimator is a `LGBMRegressor(n_jobs=1)`, then `n_jobs = cpu_count() - 1`.
- If estimator is a `LGBMRegressor` with internal n_jobs != 1, then `n_jobs = 1`. This is because `lightgbm` is highly optimized for gradient boosting and parallelizes operations at a very fine-grained level, making additional parallelization unnecessary and potentially harmful due to resource contention.
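The rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the library's code: `sketch_select_n_jobs` and its flag-style parameters are hypothetical, and both the precedence between the rules and the fallback for unlisted forecasters are assumptions, since the list does not specify them.

```python
import os
from typing import Optional, Union


def sketch_select_n_jobs(
    forecaster_name: str,
    refit: Union[bool, int],
    estimator_is_linear: bool = False,
    lgbm_n_jobs: Optional[int] = None,  # set only when the estimator is an LGBMRegressor
    cpus: Optional[int] = None,
) -> int:
    """Return a job count following the documented heuristics (sketch)."""
    if cpus is None:
        cpus = os.cpu_count() or 1
    parallel = cpus - 1

    # bool is a subclass of int in Python, so exclude it explicitly:
    # only a true integer means intermittent refit.
    if isinstance(refit, int) and not isinstance(refit, bool):
        return 1

    # Assumption: the LGBMRegressor rules are checked before the forecaster rules.
    if lgbm_n_jobs is not None:
        return parallel if lgbm_n_jobs == 1 else 1

    if forecaster_name == "ForecasterRecursive":
        return 1 if estimator_is_linear else parallel
    if forecaster_name in {"ForecasterDirect", "ForecasterDirectMultiVariate"}:
        return parallel if refit else 1
    if forecaster_name == "ForecasterRecursiveMultiSeries":
        return parallel
    if forecaster_name in {"ForecasterStats", "ForecasterEquivalentDate"}:
        return 1

    # Assumption: fall back to a single job for forecasters not listed above.
    return 1
```

For example, `sketch_select_n_jobs("ForecasterRecursive", refit=True, estimator_is_linear=True, cpus=8)` returns 1, while the same call with a non-linear estimator returns 7.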
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| forecaster | object | Forecaster model. | required |
| refit | (bool, int) | If the forecaster is refitted during the backtesting process. | required |
Returns
| Name | Type | Description |
|---|---|---|
| int | int | The number of jobs to run in parallel. |
Examples
>>> from spotforecast2_safe.model_selection.utils_common import select_n_jobs_backtesting
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from sklearn.linear_model import LinearRegression
>>> forecaster = ForecasterRecursive(LinearRegression(), lags=2)
>>> select_n_jobs_backtesting(forecaster, refit=True)
1