backtesting.validation.backtesting_forecaster

backtesting.validation.backtesting_forecaster(
    forecaster,
    y,
    cv,
    metric,
    exog=None,
    interval=None,
    interval_method='bootstrapping',
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    return_predictors=False,
    n_jobs='auto',
    verbose=False,
    show_progress=True,
    suppress_warnings=False,
)

Backtest a forecaster model using the folds generated by the TimeSeriesFold class and evaluate it with the metric(s) provided.

If forecaster is already trained and initial_train_size is set to None in the TimeSeriesFold class, no initial training is performed and all data are used to evaluate the model. However, the first len(forecaster.last_window) observations are needed to create the initial predictors, so no predictions are calculated for them.

A copy of the original forecaster is created so that it is not modified during the process.

Parameters

Name	Type	Description	Default
forecaster	(`ForecasterRecursive`, `ForecasterDirect`, `ForecasterEquivalentDate`)	Forecaster model.	required
y	pd.Series	Training time series.	required
cv	TimeSeriesFold	TimeSeriesFold object with the information needed to split the data into folds.	required
metric	str | Callable | list	Metric used to quantify the goodness of fit of the model. - If `str`: one of `'mean_squared_error'`, `'mean_absolute_error'`, `'mean_absolute_percentage_error'`, `'mean_squared_log_error'`, `'mean_absolute_scaled_error'`, `'root_mean_squared_scaled_error'`. - If `Callable`: Function with arguments `y_true`, `y_pred` and, optionally, `y_train` that returns a float. - If `list`: List containing multiple strings and/or Callables.	required
exog	pd.Series | pd.DataFrame	Exogenous variable(s) included as predictor(s). Must have the same number of observations as `y` and should be aligned so that `y[i]` is regressed on `exog[i]`. Defaults to None.	`None`
interval	float | list | tuple | str | object	Specifies whether probabilistic predictions should be estimated and the method to use. The following options are supported: - If `float`, represents the nominal (expected) coverage (between 0 and 1). For instance, `interval=0.95` corresponds to `[2.5, 97.5]` percentiles. - If `list` or `tuple`: Sequence of percentiles to compute, each value must be between 0 and 100 inclusive. For example, a 95% prediction interval can be specified as `interval = [2.5, 97.5]` or multiple percentiles (e.g. 10, 50 and 90) as `interval = [10, 50, 90]`. - If `'bootstrapping'` (str): `n_boot` bootstrapping predictions will be generated. - If scipy.stats distribution object, the distribution parameters will be estimated for each prediction. - If None, no probabilistic predictions are estimated. Defaults to None.	`None`
interval_method	str	Technique used to estimate prediction intervals. Available options: - `'bootstrapping'`: Bootstrapping is used to generate prediction intervals. - `'conformal'`: Employs the conformal prediction split method for interval estimation. Defaults to `'bootstrapping'`.	`'bootstrapping'`
n_boot	int	Number of bootstrapping iterations to perform when estimating prediction intervals. Defaults to 250.	`250`
use_in_sample_residuals	bool	If `True`, residuals from the training data are used as a proxy for the prediction error to create predictions. If `False`, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the forecaster's `set_out_sample_residuals()` method. Defaults to True.	`True`
use_binned_residuals	bool	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly. Defaults to True.	`True`
random_state	int	Seed for the random number generator to ensure reproducibility. Defaults to 123.	`123`
return_predictors	bool	If `True`, the predictors used to make the predictions are also returned. Defaults to False.	`False`
n_jobs	int | str	The number of jobs to run in parallel. If `-1`, then the number of jobs is set to the number of cores. If `'auto'`, `n_jobs` is set using the function `spotforecast2_safe.splitter.utils_common.select_n_jobs_backtesting`. Defaults to `'auto'`.	`'auto'`
verbose	bool	Print number of folds and index of training and validation sets used for backtesting. Defaults to False.	`False`
show_progress	bool	Whether to show a progress bar. Defaults to True.	`True`
suppress_warnings	bool	If `True`, spotforecast warnings will be suppressed during the backtesting process. See `spotforecast.exceptions.warn_skforecast_categories` for more information. Defaults to False.	`False`
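
The `Callable` form of `metric` can be sketched as a plain function of `y_true`, `y_pred` and, optionally, `y_train` that returns a float. The function below is a minimal illustration under that stated contract; its name, `scaled_mae_sketch`, is hypothetical and not part of the library, and it is not claimed to match the built-in `'mean_absolute_scaled_error'` implementation.

```python
import numpy as np


def scaled_mae_sketch(y_true, y_pred, y_train):
    """Hypothetical custom metric: MAE of the predictions divided by the
    MAE of a one-step naive forecast on the training data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)

    # Error of the naive forecast y_hat[t] = y[t-1] on the training series.
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    # Error of the evaluated predictions, expressed relative to the naive error.
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)


score = scaled_mae_sketch(
    y_true=[3.0, 4.0],
    y_pred=[2.0, 6.0],
    y_train=[1.0, 2.0, 4.0, 3.0],
)
print(score)
```

A function of this shape could be passed as `metric=scaled_mae_sketch`, alone or inside a list alongside metric names.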

Returns

Name	Type	Description
tuple	(pd.DataFrame, pd.DataFrame)	- metric_values: Value(s) of the metric(s). - backtest_predictions: Value of predictions. Its columns are listed below.

The backtest_predictions DataFrame includes the following columns:

- fold: Indicates the fold number where the prediction was made.
- pred: Predicted values for the corresponding series and time steps.

If `interval` is not `None`, additional columns are included depending on the method:

- For `float`: Columns `lower_bound` and `upper_bound`.
- For `list` or `tuple` of 2 elements: Columns `lower_bound` and `upper_bound`.
- For `list` or `tuple` with multiple percentiles: One column per percentile (e.g., `p_10`, `p_50`, `p_90`).
- For `'bootstrapping'`: One column per bootstrapping iteration (e.g., `pred_boot_0`, `pred_boot_1`, …, `pred_boot_n`).
- For `scipy.stats` distribution objects: One column for each estimated parameter of the distribution (e.g., `loc`, `scale`).

If `return_predictors` is `True`, one column per predictor is created.

Depending on the relation between `steps` and `fold_stride`, the output may include repeated indexes (if `fold_stride < steps`) or gaps (if `fold_stride > steps`). See Notes below for more details.
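
The relation between a `float` coverage and the percentile pair behind `lower_bound` and `upper_bound` can be illustrated with a little arithmetic. The helper below, `coverage_to_percentiles`, is a hypothetical name used only for this sketch; it assumes the interval is symmetric in the tails, which is consistent with the documented example of `interval=0.95` corresponding to `[2.5, 97.5]`, but it is not taken from the library's source.

```python
def coverage_to_percentiles(coverage):
    """Hypothetical helper: map a nominal coverage in (0, 1) to a
    symmetric [lower, upper] percentile pair on the 0-100 scale."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be strictly between 0 and 1")

    # Split the uncovered probability mass equally between both tails.
    tail = (1 - coverage) / 2 * 100
    # Round to suppress floating-point noise such as 2.500000000000002.
    return [round(tail, 10), round(100 - tail, 10)]


print(coverage_to_percentiles(0.95))
print(coverage_to_percentiles(0.80))
```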

Notes

Note on fold_stride vs. steps:

If fold_stride == steps, test sets are placed back-to-back without overlap. Each observation appears only once in the output DataFrame, so the index is unique.
If fold_stride < steps, test sets overlap. Multiple forecasts are generated for the same observations and, therefore, the output DataFrame contains repeated indexes.
If fold_stride > steps, there are gaps between consecutive test sets. Some observations in the series will not have associated predictions, so the output DataFrame has non-contiguous indexes.
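
The three cases above can be reproduced with a small library-free sketch that only tracks positional test windows. The helpers `fold_test_windows` and `describe_windows` are hypothetical names introduced for illustration. The sketch assumes each fold's test window starts `fold_stride` positions after the previous one and that a final, shorter window is kept; the actual fold construction in TimeSeriesFold may differ in such details.

```python
def fold_test_windows(n_obs, initial_train_size, steps, fold_stride):
    """Hypothetical helper: (start, end) positions of each test window,
    with `end` exclusive. Assumes a trailing incomplete window is kept."""
    windows = []
    start = initial_train_size
    while start < n_obs:
        windows.append((start, min(start + steps, n_obs)))
        start += fold_stride
    return windows


def describe_windows(windows):
    """Return (has_repeats, has_gaps) for the positions covered by the windows."""
    covered = [i for start, end in windows for i in range(start, end)]
    has_repeats = len(covered) != len(set(covered))
    has_gaps = set(range(min(covered), max(covered) + 1)) != set(covered)
    return has_repeats, has_gaps


for stride in (3, 2, 5):
    windows = fold_test_windows(n_obs=12, initial_train_size=4, steps=3, fold_stride=stride)
    print(stride, windows, describe_windows(windows))
```

With `steps=3`, a stride of 3 yields neither repeats nor gaps, a stride of 2 yields repeated positions, and a stride of 5 leaves positions without predictions.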

Examples

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

from spotforecast2_safe.backtesting.validation import backtesting_forecaster
from spotforecast2_safe.forecaster import ForecasterRecursive
from spotforecast2_safe.splitter import TimeSeriesFold

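# Synthetic series: 80 draws of Gaussian noise with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
y = pd.Series(rng.standard_normal(80), name="y")
# Recursive forecaster that uses the last 3 observations as lagged predictors.
forecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)
# Train on the first 40 observations, then predict 2 steps per fold without refitting.
cv = TimeSeriesFold(steps=2, initial_train_size=40, refit=False)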

metric_values, backtest_predictions = backtesting_forecaster(
    forecaster=forecaster,
    y=y,
    cv=cv,
    metric="mean_squared_error",
    n_jobs=1,
    verbose=False,
    show_progress=False,
    suppress_warnings=True,
)
print(metric_values)
print(backtest_predictions.head())
assert "mean_squared_error" in metric_values.columns
assert "fold" in backtest_predictions.columns
assert "pred" in backtest_predictions.columns

   mean_squared_error
0            1.318654
    fold      pred
40     0  0.371587
41     0  0.132886
42     1  0.178838
43     1  0.140537
44     2  0.201316
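
When `fold_stride < steps`, the predictions frame contains repeated indexes, and downstream code usually needs one value per time step. The sketch below shows two possible ways to collapse them with plain pandas. It runs on a small hand-made frame that only mimics the documented `fold` and `pred` columns; the values are invented for illustration and are not output of `backtesting_forecaster`.

```python
import pandas as pd

# Mock predictions with overlapping test sets: indexes 41 and 42 appear in two folds.
mock_predictions = pd.DataFrame(
    {"fold": [0, 0, 0, 1, 1, 1], "pred": [1.0, 2.0, 3.0, 2.5, 3.5, 4.0]},
    index=[40, 41, 42, 41, 42, 43],
)
print(mock_predictions.index.is_unique)

# Option 1: keep only the prediction from the most recent fold for each index.
latest = mock_predictions[~mock_predictions.index.duplicated(keep="last")].sort_index()

# Option 2: average all predictions that share the same index.
averaged = mock_predictions.groupby(level=0)["pred"].mean()

print(latest)
print(averaged)
```

Which option is appropriate depends on the analysis: keeping the latest fold favors the shortest forecast horizon, while averaging mixes horizons.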