model_selection.validation.backtesting_forecaster
model_selection.validation.backtesting_forecaster(
    forecaster,
    y,
    cv,
    metric,
    exog=None,
    interval=None,
    interval_method='bootstrapping',
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    return_predictors=False,
    n_jobs='auto',
    verbose=False,
    show_progress=True,
    suppress_warnings=False,
)

Backtesting of a forecaster model following the folds generated by the TimeSeriesFold class and using the metric(s) provided.
If the forecaster is already trained and initial_train_size is set to None in the TimeSeriesFold class, no initial training is performed and all data are used to evaluate the model. However, the first len(forecaster.last_window) observations are needed to create the initial predictors, so no predictions are calculated for them.
A copy of the original forecaster is created so that it is not modified during the process.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| forecaster | ForecasterRecursive, ForecasterDirect, ForecasterEquivalentDate | Forecaster model. | required |
| y | pd.Series | Training time series. | required |
| cv | TimeSeriesFold | TimeSeriesFold object with the information needed to split the data into folds. | required |
| metric | str, Callable, list | Metric used to quantify the goodness of fit of the model. If str, one of: 'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error', 'mean_absolute_scaled_error', 'root_mean_squared_scaled_error'. If Callable: a function with arguments y_true, y_pred and y_train (optional) that returns a float. If list: a list containing multiple strings and/or Callables. | required |
| exog | pd.Series, pd.DataFrame | Exogenous variable/s included as predictor/s. Must have the same number of observations as y and be aligned so that y[i] is regressed on exog[i]. Defaults to None. | None |
| interval | float, list, tuple, str, object | Specifies whether probabilistic predictions are estimated and the method used. If float: the nominal (expected) coverage, between 0 and 1; for instance, interval=0.95 corresponds to the [2.5, 97.5] percentiles. If list or tuple: sequence of percentiles to compute, each between 0 and 100 inclusive; for example, a 95% interval can be specified as interval=[2.5, 97.5], and multiple percentiles (e.g. 10, 50 and 90) as interval=[10, 50, 90]. If 'bootstrapping' (str): n_boot bootstrapping predictions are generated. If a scipy.stats distribution object: the distribution parameters are estimated for each prediction. If None: no probabilistic predictions are estimated. Defaults to None. | None |
| interval_method | str | Technique used to estimate prediction intervals. 'bootstrapping': bootstrapping is used to generate prediction intervals. 'conformal': employs the split conformal prediction method for interval estimation. Defaults to 'bootstrapping'. | 'bootstrapping' |
| n_boot | int | Number of bootstrapping iterations to perform when estimating prediction intervals. Defaults to 250. | 250 |
| use_in_sample_residuals | bool | If True, residuals from the training data are used as a proxy of the prediction error when creating probabilistic predictions. If False, out-of-sample (calibration) residuals are used; these must be precomputed with the forecaster's set_out_sample_residuals() method. Defaults to True. | True |
| use_binned_residuals | bool | If True, residuals are selected based on the predicted values (binned selection). If False, residuals are selected randomly. Defaults to True. | True |
| random_state | int | Seed for the random number generator to ensure reproducibility. Defaults to 123. | 123 |
| return_predictors | bool | If True, the predictors used to make the predictions are also returned. Defaults to False. | False |
| n_jobs | int, str | The number of jobs to run in parallel. If -1, the number of jobs is set to the number of cores. If 'auto', n_jobs is set using the function skforecast.utils.select_n_jobs_backtesting. Defaults to 'auto'. | 'auto' |
| verbose | bool | Print number of folds and index of training and validation sets used for backtesting. Defaults to False. | False |
| show_progress | bool | Whether to show a progress bar. Defaults to True. | True |
| suppress_warnings | bool | If True, spotforecast warnings are suppressed during the backtesting process. See spotforecast.exceptions.warn_skforecast_categories for more information. Defaults to False. | False |
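As noted in the metric row above, a callable metric must accept y_true, y_pred and, optionally, y_train, and return a float. As an illustration (this is not the library's built-in implementation), a mean absolute scaled error could be written as:

```python
import numpy as np

def mean_absolute_scaled_error(y_true, y_pred, y_train):
    # MAE of the forecast, scaled by the MAE of a naive one-step
    # forecast (previous value) on the training data.
    mae_forecast = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    naive = np.asarray(y_train, dtype=float)
    mae_naive = np.mean(np.abs(naive[1:] - naive[:-1]))
    return mae_forecast / mae_naive
```

A callable like this can be passed directly as metric, or mixed with strings in a list, e.g. metric=['mean_absolute_error', mean_absolute_scaled_error].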
Returns
| Name | Type | Description |
|---|---|---|
| tuple | (pd.DataFrame, pd.DataFrame) | metric_values: value(s) of the metric(s). backtest_predictions: the predictions. The DataFrame includes the columns fold (the fold number in which the prediction was made) and pred (predicted values for the corresponding time steps). If interval is not None, additional columns are included depending on the method: for a float, or a list/tuple of 2 elements, the columns lower_bound and upper_bound; for a list/tuple with multiple percentiles, one column per percentile (e.g., p_10, p_50, p_90); for 'bootstrapping', one column per bootstrapping iteration (e.g., pred_boot_0, pred_boot_1, ..., pred_boot_n); for scipy.stats distribution objects, one column per estimated parameter of the distribution (e.g., loc, scale). If return_predictors is True, one column per predictor is added. Depending on the relation between steps and fold_stride, the output may include repeated indexes (if fold_stride < steps) or gaps (if fold_stride > steps). See Notes below for more details. |
Notes
Note on fold_stride vs. steps:
- If fold_stride == steps, test sets are placed back-to-back without overlap. Each observation appears only once in the output DataFrame, so the index is unique.
- If fold_stride < steps, test sets overlap. Multiple forecasts are generated for the same observations, so the output DataFrame contains repeated indexes.
- If fold_stride > steps, there are gaps between consecutive test sets. Some observations in the series will not have associated predictions, so the output DataFrame has non-contiguous indexes.
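The fold layout described above can be sketched with plain Python. The helper below is an illustrative approximation of how test-set indexes advance with fold_stride (the name and logic are assumptions for illustration, not TimeSeriesFold's actual implementation):

```python
def fold_test_indexes(n_obs, initial_train_size, steps, fold_stride):
    # Each fold predicts `steps` observations; the window then moves
    # `fold_stride` observations forward until the series is exhausted.
    folds = []
    start = initial_train_size
    while start < n_obs:
        end = min(start + steps, n_obs)
        folds.append(list(range(start, end)))
        start += fold_stride
    return folds

# fold_stride == steps: back-to-back test sets, unique index
# fold_test_indexes(10, 5, 2, 2) -> [[5, 6], [7, 8], [9]]
# fold_stride < steps: overlapping test sets, repeated indexes
# fold_test_indexes(10, 5, 2, 1) -> [[5, 6], [6, 7], [7, 8], [8, 9], [9]]
# fold_stride > steps: gaps between test sets (no prediction for index 7)
# fold_test_indexes(10, 5, 2, 3) -> [[5, 6], [8, 9]]
```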
Examples
>>> import pandas as pd
>>> from sklearn.ensemble import RandomForestRegressor
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.model_selection import backtesting_forecaster, TimeSeriesFold
>>> y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecaster = ForecasterRecursive(
... estimator=RandomForestRegressor(random_state=123),
... lags=5
... )
>>> cv = TimeSeriesFold(
... steps=2,
... initial_train_size=5,
... refit=False
... )
>>> metric_values, backtest_predictions = backtesting_forecaster(
... forecaster=forecaster,
... y=y,
... cv=cv,
... metric='mean_squared_error'
... )
>>> metric_values
mean_squared_error
0 0.201334
>>> backtest_predictions
fold pred
5 0 5.18
6 0 6.10
7 1 7.36
8 1 8.40
9 2 9.31
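When fold_stride < steps produces repeated indexes in backtest_predictions, a common post-processing step is to collapse the overlapping forecasts into a single series, for example by averaging per timestamp. A minimal pandas sketch with hypothetical output values:

```python
import pandas as pd

# Hypothetical backtest output with overlapping folds (fold_stride < steps):
# indexes 6 and 7 each received two forecasts.
backtest_predictions = pd.DataFrame(
    {"fold": [0, 0, 1, 1, 2, 2],
     "pred": [5.2, 6.1, 6.3, 7.4, 7.2, 8.4]},
    index=[5, 6, 6, 7, 7, 8],
)

# Collapse repeated indexes to a single series by averaging the
# overlapping forecasts for each timestamp.
avg_pred = backtest_predictions.groupby(level=0)["pred"].mean()
```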