model_selection.validation.backtesting_forecaster

model_selection.validation.backtesting_forecaster(
    forecaster,
    y,
    cv,
    metric,
    exog=None,
    interval=None,
    interval_method='bootstrapping',
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    return_predictors=False,
    n_jobs='auto',
    verbose=False,
    show_progress=True,
    suppress_warnings=False,
)

Backtest a forecaster over the folds generated by a TimeSeriesFold object and evaluate it with the provided metric(s).

If the forecaster is already trained and initial_train_size is set to None in the TimeSeriesFold object, no initial training is performed and all data are used to evaluate the model. However, the first len(forecaster.last_window) observations are needed to create the initial predictors, so no predictions are calculated for them.

A copy of the original forecaster is created so that it is not modified during the process.
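One practical consequence is that a forecaster passed in unfitted is still unfitted after backtesting. The sketch below illustrates the idea with a toy stand-in; ToyForecaster and toy_backtest are hypothetical names invented for this illustration, and the use of copy.deepcopy is an assumption about how such a copy could be made, not a description of the library's internals.

```python
import copy


class ToyForecaster:
    """Hypothetical stand-in for a forecaster (not part of any library)."""

    def __init__(self):
        self.is_fitted = False
        self.last_window = None

    def fit(self, y):
        # Remember the last two observations, mimicking a stored last window.
        self.last_window = list(y[-2:])
        self.is_fitted = True


def toy_backtest(forecaster, y):
    """Work on a copy so the caller's forecaster is left untouched."""
    working_copy = copy.deepcopy(forecaster)  # assumption: a deep copy
    working_copy.fit(y)
    return working_copy.is_fitted


original = ToyForecaster()
copy_was_fitted = toy_backtest(original, [1, 2, 3, 4])

print(copy_was_fitted)     # the internal copy was trained
print(original.is_fitted)  # the caller's object was not modified
```

If a fitted forecaster is needed after backtesting, it has to be trained separately on the caller's side.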

Parameters

Name Type Description Default
forecaster (ForecasterRecursive, ForecasterDirect, ForecasterEquivalentDate) Forecaster model. required
y pd.Series Training time series. required
cv TimeSeriesFold TimeSeriesFold object with the information needed to split the data into folds. required
metric str | Callable | list Metric used to quantify the goodness of fit of the model. - If str: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error', 'mean_absolute_scaled_error', 'root_mean_squared_scaled_error'}. - If Callable: a function with arguments y_true, y_pred and, optionally, y_train that returns a float. - If list: a list containing multiple strings and/or Callables. required
exog pd.Series | pd.DataFrame Exogenous variable(s) included as predictor(s). Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i]. Defaults to None. None
interval float | list | tuple | str | object Specifies whether probabilistic predictions should be estimated and the method to use. The following options are supported: - If float: the nominal (expected) coverage, between 0 and 1. For instance, interval=0.95 corresponds to the [2.5, 97.5] percentiles. - If list or tuple: sequence of percentiles to compute; each value must be between 0 and 100 inclusive. For example, a 95% prediction interval can be specified as interval=[2.5, 97.5], and multiple percentiles (e.g. 10, 50 and 90) as interval=[10, 50, 90]. - If 'bootstrapping' (str): n_boot bootstrapping predictions are generated. - If a scipy.stats distribution object: the distribution parameters are estimated for each prediction. - If None: no probabilistic predictions are estimated. Defaults to None. None
interval_method str Technique used to estimate prediction intervals. Available options: - 'bootstrapping': Bootstrapping is used to generate prediction intervals. - 'conformal': The split conformal prediction method is used for interval estimation. Defaults to 'bootstrapping'. 'bootstrapping'
n_boot int Number of bootstrapping iterations to perform when estimating prediction intervals. Defaults to 250. 250
use_in_sample_residuals bool If True, residuals from the training data are used as a proxy of the prediction error to create predictions. If False, out-of-sample (calibration) residuals are used. Out-of-sample residuals must be precomputed using the forecaster's set_out_sample_residuals() method. Defaults to True. True
use_binned_residuals bool If True, residuals are selected based on the predicted values (binned selection). If False, residuals are selected randomly. Defaults to True. True
random_state int Seed for the random number generator to ensure reproducibility. Defaults to 123. 123
return_predictors bool If True, the predictors used to make the predictions are also returned. Defaults to False. False
n_jobs int | str The number of jobs to run in parallel. If -1, the number of jobs is set to the number of cores. If 'auto', n_jobs is set using the function skforecast.utils.select_n_jobs_backtesting. Defaults to 'auto'. 'auto'
verbose bool If True, print the number of folds and the indexes of the training and validation sets used for backtesting. Defaults to False. False
show_progress bool Whether to show a progress bar. Defaults to True. True
suppress_warnings bool If True, spotforecast warnings will be suppressed during the backtesting process. See spotforecast.exceptions.warn_skforecast_categories for more information. Defaults to False. False
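The metric parameter accepts a Callable with arguments y_true, y_pred and, optionally, y_train. A minimal sketch of such a function follows. The name scaled_mae and its scaling rule are illustrative choices and not part of the library, and it is assumed here that the three arguments arrive as one-dimensional sequences of numbers (for example pandas Series or NumPy arrays).

```python
def scaled_mae(y_true, y_pred, y_train):
    """MAE of the forecast divided by the MAE of a naive one-step forecast
    on the training data (illustrative custom metric, hypothetical name)."""
    y_true, y_pred, y_train = list(y_true), list(y_pred), list(y_train)

    # Mean absolute error of the predictions.
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

    # In-sample error of the naive forecast y_hat[t] = y[t - 1].
    # Note: this is zero for a constant training series, which would
    # raise ZeroDivisionError below.
    naive_mae = sum(
        abs(curr - prev) for prev, curr in zip(y_train, y_train[1:])
    ) / (len(y_train) - 1)

    return float(mae / naive_mae)


score = scaled_mae(
    y_true=[6, 7],
    y_pred=[5.5, 7.5],
    y_train=[1, 2, 3, 4, 5],
)
print(score)  # 0.5: forecast MAE is 0.5 and the naive in-sample MAE is 1.0
```

A function of this shape would be passed as metric=scaled_mae, or inside a list alongside string metric names, as described in the metric row above.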

Returns

Name Type Description
tuple (pd.DataFrame, pd.DataFrame)
  • metric_values: Value(s) of the metric(s).
  • backtest_predictions: Value of predictions. The DataFrame includes the following columns:
      • fold: Indicates the fold number where the prediction was made.
      • pred: Predicted values for the corresponding series and time steps.
    If interval is not None, additional columns are included depending on the method:
      • For float: Columns lower_bound and upper_bound.
      • For list or tuple of 2 elements: Columns lower_bound and upper_bound.
      • For list or tuple with multiple percentiles: One column per percentile (e.g., p_10, p_50, p_90).
      • For 'bootstrapping': One column per bootstrapping iteration (e.g., pred_boot_0, pred_boot_1, …, pred_boot_n).
      • For scipy.stats distribution objects: One column for each estimated parameter of the distribution (e.g., loc, scale).
    If return_predictors is True, one column per predictor is created.
    Depending on the relation between steps and fold_stride, the output may include repeated indexes (if fold_stride < steps) or gaps (if fold_stride > steps). See Notes below for more details.
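When interval is a float, the lower_bound and upper_bound columns correspond to a symmetric pair of percentiles around the median, as in the interval=0.95 and [2.5, 97.5] example given for the interval parameter. The conversion can be sketched as below; coverage_to_percentiles is a hypothetical helper written for this illustration and is not a library function.

```python
def coverage_to_percentiles(coverage):
    """Convert a nominal coverage in (0, 1) to symmetric percentiles.

    Hypothetical helper: the tail probability (1 - coverage) is split
    equally between the lower and upper ends of the distribution.
    """
    if not 0 < coverage < 1:
        raise ValueError("coverage must be strictly between 0 and 1")
    lower = (1 - coverage) / 2 * 100
    upper = 100 - lower
    return [lower, upper]


percentiles_95 = coverage_to_percentiles(0.95)
percentiles_80 = coverage_to_percentiles(0.80)

# Values are subject to floating-point rounding, so they are rounded for display.
print([round(p, 6) for p in percentiles_95])  # approximately [2.5, 97.5]
print([round(p, 6) for p in percentiles_80])  # approximately [10.0, 90.0]
```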

Notes

Note on fold_stride vs. steps:

  • If fold_stride == steps, test sets are placed back-to-back without overlap. Each observation appears only once in the output DataFrame, so the index is unique.
  • If fold_stride < steps, test sets overlap. Multiple forecasts are generated for the same observations and, therefore, the output DataFrame contains repeated indexes.
  • If fold_stride > steps, there are gaps between consecutive test sets. Some observations in the series will not have associated predictions, so the output DataFrame has non-contiguous indexes.
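
The three cases above can be reproduced with a few lines of plain Python. This is a simplified sketch of the windowing arithmetic, not the TimeSeriesFold implementation: fold_test_windows and predicted_positions are hypothetical helpers, and it is assumed that each test window starts fold_stride positions after the previous one and that the last window is truncated at the end of the series.

```python
def fold_test_windows(n_obs, initial_train_size, steps, fold_stride):
    """Return (start, end) positions of each test window, end exclusive."""
    windows = []
    start = initial_train_size
    while start < n_obs:
        end = min(start + steps, n_obs)  # the last window may be shorter
        windows.append((start, end))
        start += fold_stride
    return windows


def predicted_positions(windows):
    """Flatten the windows into the positions that receive a prediction."""
    return [i for start, end in windows for i in range(start, end)]


# 10 observations, 5 used for the initial training, 2 steps per fold.
back_to_back = predicted_positions(fold_test_windows(10, 5, steps=2, fold_stride=2))
overlapping = predicted_positions(fold_test_windows(10, 5, steps=2, fold_stride=1))
gapped = predicted_positions(fold_test_windows(10, 5, steps=2, fold_stride=3))

print(back_to_back)  # [5, 6, 7, 8, 9]              unique, contiguous
print(overlapping)   # [5, 6, 6, 7, 7, 8, 8, 9, 9]  repeated positions
print(gapped)        # [5, 6, 8, 9]                 position 7 is skipped
```

When fold_stride < steps, the repeated index values mean that predictions for the same observation come from different folds, so the fold column is needed to tell them apart.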

Examples

>>> import pandas as pd
>>> from sklearn.ensemble import RandomForestRegressor
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.model_selection import backtesting_forecaster, TimeSeriesFold
>>> y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> forecaster = ForecasterRecursive(
...     estimator=RandomForestRegressor(random_state=123),
...     lags=5
... )
>>> cv = TimeSeriesFold(
...     steps=2,
...     initial_train_size=5,
...     refit=False
... )
>>> metric_values, backtest_predictions = backtesting_forecaster(
...     forecaster=forecaster,
...     y=y,
...     cv=cv,
...     metric='mean_squared_error'
... )
>>> metric_values
   mean_squared_error
0            0.201334
>>> backtest_predictions
   fold  pred
5     0  5.18
6     0  6.10
7     1  7.36
8     1  8.40
9     2  9.31