Backtest a forecaster model using the folds generated by the TimeSeriesFold class and the metric(s) provided.
If the forecaster is already trained and initial_train_size is set to None in the TimeSeriesFold class, no initial training is performed and all data is used to evaluate the model. However, the first len(forecaster.last_window) observations are needed to create the initial predictors, so no predictions are calculated for them.
A copy of the original forecaster is created so that it is not modified during the process.
Metric used to quantify the goodness of fit of the model.
- If str: {‘mean_squared_error’, ‘mean_absolute_error’, ‘mean_absolute_percentage_error’, ‘mean_squared_log_error’, ‘mean_absolute_scaled_error’, ‘root_mean_squared_scaled_error’}
- If Callable: Function with arguments y_true, y_pred and y_train (Optional) that returns a float.
- If list: List containing multiple strings and/or Callables.
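The Callable form described above can be sketched as follows. This is a minimal illustration, not code from the library: custom_scaled_mae is a hypothetical name, and the scaling by a naive one-step in-sample error is an assumption chosen only to show how the optional y_train argument might be used.

```python
def custom_scaled_mae(y_true, y_pred, y_train=None):
    """Hypothetical custom metric: MAE, optionally scaled by the naive
    one-step error measured on the training data."""
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

    # Without usable training data, fall back to the plain MAE.
    if y_train is None or len(y_train) < 2:
        return float(mae)

    # Mean absolute difference between consecutive training observations.
    naive_error = sum(
        abs(curr - prev) for prev, curr in zip(y_train[:-1], y_train[1:])
    ) / (len(y_train) - 1)

    # A constant training series gives a zero scale; fall back to plain MAE.
    if naive_error == 0:
        return float(mae)

    return float(mae / naive_error)
```

A function with this signature could be passed on its own or inside a list together with the string names listed above.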
Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i]. Defaults to None.
Specifies whether probabilistic predictions should be estimated and the method to use. The following options are supported:
- If float: represents the nominal (expected) coverage (between 0 and 1). For instance, interval=0.95 corresponds to the [2.5, 97.5] percentiles.
- If list or tuple: Sequence of percentiles to compute; each value must be between 0 and 100 inclusive. For example, a 95% prediction interval can be specified as interval = [2.5, 97.5], or multiple percentiles (e.g. 10, 50 and 90) as interval = [10, 50, 90].
- If ‘bootstrapping’ (str): n_boot bootstrapping predictions will be generated.
- If scipy.stats distribution object: the distribution parameters will be estimated for each prediction.
- If None: no probabilistic predictions are estimated.

Defaults to None.
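The relation between a float coverage and its percentile pair can be sketched as below. This assumes a central (symmetric) interval, which is what the interval=0.95 example implies; coverage_to_percentiles is a hypothetical helper used only for illustration and is not part of the library.

```python
def coverage_to_percentiles(interval):
    """Convert a nominal coverage in (0, 1) to [lower, upper] percentiles,
    assuming the interval is centered (equal mass in both tails)."""
    if not 0 < interval < 1:
        raise ValueError("interval must be strictly between 0 and 1")

    # Half of the uncovered mass goes in each tail, expressed on a 0-100 scale.
    # Rounding removes floating-point noise such as 2.500000000000002.
    lower = round((1 - interval) / 2 * 100, 10)
    upper = round(100 - lower, 10)
    return [lower, upper]


percentiles_95 = coverage_to_percentiles(0.95)
percentiles_80 = coverage_to_percentiles(0.80)
```

Under this assumption, a coverage of 0.95 maps to the 2.5 and 97.5 percentiles, and 0.80 maps to the 10 and 90 percentiles.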
Technique used to estimate prediction intervals. Available options:
- ‘bootstrapping’: Bootstrapping is used to generate prediction intervals.
- ‘conformal’: Employs the conformal prediction split method for interval estimation.

Defaults to ‘bootstrapping’.
If True, residuals from the training data are used as a proxy for the prediction error to create predictions. If False, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the Forecaster’s set_out_sample_residuals() method. Defaults to True.
The number of jobs to run in parallel. If -1, then the number of jobs is set to the number of cores. If ‘auto’, n_jobs is set using the function skforecast.utils.select_n_jobs_backtesting. Defaults to ‘auto’.
If True, spotforecast warnings will be suppressed during the backtesting process. See spotforecast.exceptions.warn_skforecast_categories for more information. Defaults to False.
- metric_values: Value(s) of the metric(s).
- backtest_predictions: Value of predictions. The DataFrame includes the following columns:
    - fold: Indicates the fold number where the prediction was made.
    - pred: Predicted values for the corresponding series and time steps.
    - If interval is not None, additional columns are included depending on the method:
        - For float: Columns lower_bound and upper_bound.
        - For list or tuple of 2 elements: Columns lower_bound and upper_bound.
        - For list or tuple with multiple percentiles: One column per percentile (e.g., p_10, p_50, p_90).
        - For 'bootstrapping': One column per bootstrapping iteration (e.g., pred_boot_0, pred_boot_1, …, pred_boot_n).
        - For scipy.stats distribution objects: One column for each estimated parameter of the distribution (e.g., loc, scale).
    - If return_predictors is True, one column per predictor is created.

Depending on the relation between steps and fold_stride, the output may include repeated indexes (if fold_stride < steps) or gaps (if fold_stride > steps). See Notes below for more details.
Notes
Note on fold_stride vs. steps:
If fold_stride == steps, test sets are placed back-to-back without overlap. Each observation appears only once in the output DataFrame, so the index is unique.
If fold_stride < steps, test sets overlap. Multiple forecasts are generated for the same observations and, therefore, the output DataFrame contains repeated indexes.
If fold_stride > steps, there are gaps between consecutive test sets. Some observations in the series will not have associated predictions, so the output DataFrame has non-contiguous indexes.
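The three cases above can be reproduced with a small simulation. This is a simplified sketch under stated assumptions, not the TimeSeriesFold implementation: fold_test_positions and summarize are hypothetical helpers, positions are plain integers, and the last fold is simply truncated at the end of the series (the library may treat incomplete folds differently).

```python
def fold_test_positions(n_obs, initial_train_size, steps, fold_stride):
    """Return the integer positions covered by each test set."""
    folds = []
    start = initial_train_size
    while start < n_obs:
        end = min(start + steps, n_obs)  # truncate the last fold if needed
        folds.append(list(range(start, end)))
        start += fold_stride  # the stride controls where the next test set begins
    return folds


def summarize(folds):
    """Return (repeated positions, positions skipped between first and last)."""
    flat = [pos for fold in folds for pos in fold]
    repeated = sorted({pos for pos in flat if flat.count(pos) > 1})
    missing = sorted(set(range(min(flat), max(flat) + 1)) - set(flat))
    return repeated, missing


# 12 observations, 4 used for the initial training, 3-step forecasts.
back_to_back = summarize(fold_test_positions(12, 4, steps=3, fold_stride=3))
overlapping = summarize(fold_test_positions(12, 4, steps=3, fold_stride=2))
with_gaps = summarize(fold_test_positions(12, 4, steps=3, fold_stride=4))
```

In this toy setup, fold_stride == steps yields no repeated and no missing positions, fold_stride < steps yields repeated positions (6, 8 and 10), and fold_stride > steps leaves position 7 without a prediction.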