Recursive autoregressive forecaster for scikit-learn compatible estimators.
This class turns any estimator compatible with the scikit-learn API into a recursive autoregressive (multi-step) forecaster. The forecaster learns to predict future values by using lagged values of the target variable and optional exogenous features. Predictions are made iteratively, where each step uses previous predictions as input for the next step (recursive strategy).
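The recursive strategy can be illustrated without the library. The following is a minimal sketch, assuming a plain least-squares autoregressive model in place of a scikit-learn estimator; `fit_ar_least_squares` and `predict_recursive` are hypothetical helper names and are not part of the forecaster's API.

```python
import numpy as np

def fit_ar_least_squares(y, n_lags):
    """Fit y[t] ~ intercept + sum_k coef[k] * y[t-k] by ordinary least squares."""
    y = np.asarray(y, dtype=float)
    # The row for target y[t] holds [y[t-1], ..., y[t-n_lags]].
    X = np.column_stack([y[n_lags - k : len(y) - k] for k in range(1, n_lags + 1)])
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y[n_lags:], rcond=None)
    return coef

def predict_recursive(coef, last_window, steps):
    """Predict `steps` values, feeding each prediction back in as a lag."""
    n_lags = len(coef) - 1
    window = list(np.asarray(last_window, dtype=float)[-n_lags:])
    predictions = []
    for _ in range(steps):
        lags = window[::-1]  # lag 1 first, matching the order of coef[1:]
        y_hat = coef[0] + float(np.dot(coef[1:], lags))
        predictions.append(y_hat)
        # Drop the oldest value and append the prediction as the newest one.
        window = window[1:] + [y_hat]
    return np.array(predictions)

# Toy series: y[t] = t**2 satisfies y[t] = 2 + 2*y[t-1] - y[t-2].
y = np.arange(30.0) ** 2
coef = fit_ar_least_squares(y, n_lags=2)
predictions = predict_recursive(coef, last_window=y, steps=3)
print(predictions)
```

From the second step onward, the inputs include earlier predictions rather than observed values, which is why errors can accumulate along the horizon.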
Lagged values of the target variable to use as predictors. Can be an integer (uses lags from 1 to lags), list of integers, numpy array, or range. At least one of lags or window_features must be provided. Defaults to None.
List of window feature objects to compute features from the target variable. Each object must implement transform_batch() method. At least one of lags or window_features must be provided. Defaults to None.
Alternative parameter name for estimator. If provided, used instead of estimator. Defaults to None.
Attributes
estimator: Fitted scikit-learn estimator.
lags: Lag indices used in the model.
lags_names: Names of lag features (e.g., ['lag_1', 'lag_2']).
window_features: List of window feature transformers.
window_features_names: Names of window features.
window_size: Maximum window size needed (max of lags and window features).
transformer_y: Transformer for target variable.
transformer_exog: Transformer for exogenous variables.
weight_func: Function for sample weighting.
differentiation: Order of differencing applied.
differentiator: TimeSeriesDifferentiator instance if differencing is used.
is_fitted: Boolean indicating if forecaster has been fitted.
fit_date: Timestamp of the last fit operation.
last_window_: Last window_size observations from training data.
index_type_: Type of index in training data (RangeIndex or DatetimeIndex).
index_freq_: Frequency of DatetimeIndex if applicable.
training_range_: First and last index values of training data.
series_name_in_: Name of the target series.
exog_in_: Boolean indicating if exogenous variables were used in training.
exog_names_in_: Names of exogenous variables.
exog_type_in_: Type of exogenous input (Series or DataFrame).
X_train_features_names_out_: Names of all training features.
in_sample_residuals_: Residuals from training set.
in_sample_residuals_by_bin_: Residuals grouped by bins for probabilistic prediction.
forecaster_id: Identifier for the forecaster instance.
Note
Either lags or window_features (or both) must be provided during initialization.
The forecaster uses a recursive strategy where each multi-step prediction depends on previous predictions within the same forecast horizon.
Exogenous variables must have the same index as the target variable and must be available for the entire prediction horizon.
The forecaster supports point predictions, prediction intervals, bootstrapping, quantile predictions, and probabilistic forecasts via conformal methods.
Examples
Create a basic forecaster with lags:
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> y = pd.Series(np.random.randn(100), name='y')
>>> forecaster = ForecasterRecursive(
...     estimator=LinearRegression(),
...     lags=10
... )
>>> forecaster.fit(y)
>>> predictions = forecaster.predict(steps=5)
Create a forecaster with window features and transformations:
Create the predictors needed to predict steps ahead. As it is a recursive process, the predictors are created at each iteration of the prediction process.
Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If last_window = None, the values stored in self.last_window_ are used to calculate the initial predictors, and the predictions start right after training data. Defaults to None.
If True, the input is checked for possible warnings and errors with the check_predict_input function. This argument is created for internal use and is not recommended to be changed. Defaults to True.
forecaster.recursive._forecaster_recursive.ForecasterRecursive.create_train_X_y(y, exog=None)
Public method to create training predictors and target values.
This method is a public wrapper around the internal method _create_train_X_y, which generates the training predictors and target values based on the provided time series and exogenous variables. It ensures that the necessary transformations and feature engineering steps are applied to prepare the data for training the forecaster.
Optional exogenous variables for training. Can be a pandas Series or DataFrame. Must have the same index as y and cover the same time range. Defaults to None.
Tuple containing:
- X_train: DataFrame of training predictors including lags, window features, and exogenous variables (if provided).
- y_train: Series of target values aligned with the predictors.
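To make the alignment between predictors and target concrete, here is a minimal sketch that builds a lag-only training matrix with NumPy. It assumes no window features, exogenous variables, or transformations, and it returns arrays rather than the DataFrame and Series described above; `make_lag_training_matrix` is a hypothetical name.

```python
import numpy as np

def make_lag_training_matrix(y, lags):
    """Return (X_train, y_train, names) for the given lag indices."""
    y = np.asarray(y, dtype=float)
    lags = sorted(lags)
    max_lag = max(lags)
    # The first max_lag observations have incomplete lags, so they only
    # serve as predictors and never as targets.
    X_train = np.column_stack([y[max_lag - lag : len(y) - lag] for lag in lags])
    y_train = y[max_lag:]
    names = [f"lag_{lag}" for lag in lags]
    return X_train, y_train, names

y = np.arange(10.0)
X_train, y_train, names = make_lag_training_matrix(y, lags=[1, 2, 5])
print(names)
print(X_train[0], y_train[0])
```

Each row of `X_train` contains only values that precede the matching entry of `y_train`, so the number of training rows is the series length minus the largest lag.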
Optional exogenous variables for training. Can be a pandas Series or DataFrame. Must have the same index as y and cover the same time range. Defaults to None.
Return feature importances of the estimator stored in the forecaster. Only valid when the estimator stores the feature importances internally in the attribute feature_importances_ or coef_. Otherwise, returns None.
Optional last window of observed values to use for prediction. If None, uses the last window from training. Must be a pandas Series or DataFrame with the same structure as the training target series. Defaults to None.
Optional exogenous variables for prediction. Can be a pandas Series or DataFrame. Must have the same structure as the exogenous variables used in training. Defaults to None.
Generate multiple forecasting predictions using a bootstrapping process. By sampling from a collection of past observed errors (the residuals), each iteration of bootstrapping generates a different set of predictions. See the References section for more information.
Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If last_window = None, the values stored in self.last_window_ are used to calculate the initial predictors, and the predictions start right after training data. Defaults to None.
If True, residuals from the training data are used as a proxy for the prediction error to create predictions. If False, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the forecaster's set_out_sample_residuals() method. Defaults to True.
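The bootstrapping process can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the model is a fixed linear autoregression given by `coef` (intercept first), residuals are drawn uniformly with replacement, and the result is laid out as one row per step and one column per bootstrap path. The function name and signature are hypothetical.

```python
import numpy as np

def predict_bootstrapping(coef, last_window, residuals, steps, n_boot, seed=123):
    """Simulate n_boot recursive forecast paths by adding resampled residuals."""
    rng = np.random.default_rng(seed)
    coef = np.asarray(coef, dtype=float)
    n_lags = len(coef) - 1
    # One row per path; columns hold [y[t-n_lags], ..., y[t-1]].
    windows = np.tile(np.asarray(last_window, dtype=float)[-n_lags:], (n_boot, 1))
    paths = np.empty((steps, n_boot))
    for step in range(steps):
        point = coef[0] + windows[:, ::-1] @ coef[1:]
        noisy = point + rng.choice(residuals, size=n_boot, replace=True)
        paths[step] = noisy
        # The perturbed prediction becomes the newest lag of its own path.
        windows = np.column_stack([windows[:, 1:], noisy])
    return paths

coef = np.array([2.0, 1.0])            # y[t] = 2 + y[t-1]
last_window = np.array([8.0, 10.0])
residuals = np.array([-1.0, 0.0, 1.0])
paths = predict_bootstrapping(coef, last_window, residuals, steps=3, n_boot=200)
print(paths.shape)
```

Because each path carries its own perturbed history forward, the spread of the simulated values typically widens with the forecast step.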
Fit a given probability distribution for each step. After generating multiple forecasting predictions through a bootstrapping process, each step is fitted to the given distribution.
Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If last_window = None, the values stored in self.last_window_ are used to calculate the initial predictors, and the predictions start right after training data.
If True, residuals from the training data are used as a proxy for the prediction error to create predictions. If False, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the forecaster's set_out_sample_residuals() method.
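As an illustration of fitting a distribution per step, the sketch below estimates a normal distribution for each row of a matrix of bootstrapped predictions. The normal distribution and the moment-based fit are stand-ins chosen to keep the example dependency-free; the method itself accepts a distribution supplied by the caller. `fit_normal_per_step` is a hypothetical name.

```python
import numpy as np

def fit_normal_per_step(boot_paths):
    """Return one (loc, scale) row per step from a (steps, n_boot) array."""
    boot_paths = np.asarray(boot_paths, dtype=float)
    loc = boot_paths.mean(axis=1)
    scale = boot_paths.std(axis=1, ddof=0)  # maximum-likelihood estimate
    return np.column_stack([loc, scale])

# Synthetic bootstrap output: two steps with different spread.
rng = np.random.default_rng(0)
boot_paths = np.vstack([
    rng.normal(loc=10.0, scale=1.0, size=2000),
    rng.normal(loc=12.0, scale=2.0, size=2000),
])
params = fit_normal_per_step(boot_paths)
print(params)
```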
Predict n steps ahead and estimate prediction intervals using either bootstrapping or conformal prediction methods. Refer to the References section for additional details on these methods.
Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If last_window = None, the values stored in self.last_window_ are used to calculate the initial predictors, and the predictions start right after training data. Defaults to None.
Technique used to estimate prediction intervals. Available options:
- 'bootstrapping': Bootstrapping is used to generate prediction intervals [1].
- 'conformal': Employs the conformal prediction split method for interval estimation [2].
Defaults to 'bootstrapping'.
Confidence level of the prediction interval. Interpretation depends on the type of the value and on the method used:
- If float, it is the nominal (expected) coverage, between 0 and 1. For instance, interval=0.95 corresponds to the [2.5, 97.5] percentiles.
- If list or tuple, it defines the exact percentiles to compute, which must be between 0 and 100 inclusive. For example, a 95% interval is specified as interval = [2.5, 97.5].
- When using method='conformal', the interval must be a float or a list/tuple defining a symmetric interval.
Defaults to [5, 95].
If True, residuals from the training data are used as a proxy for the prediction error to create predictions. If False, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the forecaster's set_out_sample_residuals() method. Defaults to True.
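The two ways of specifying `interval`, and the difference between the two methods, can be sketched as follows. All three functions are hypothetical helpers written for illustration. The conformal variant uses the plain empirical quantile of the absolute calibration residuals, without the finite-sample correction that some split-conformal formulations apply.

```python
import numpy as np

def interval_to_percentiles(interval):
    """Map a coverage float or a (lower, upper) pair to percentiles in [0, 100]."""
    if isinstance(interval, float):
        if not 0 < interval < 1:
            raise ValueError("A float `interval` must be between 0 and 1.")
        tail = (1 - interval) / 2 * 100
        return (tail, 100 - tail)
    lower, upper = interval
    if not 0 <= lower < upper <= 100:
        raise ValueError("Percentiles must satisfy 0 <= lower < upper <= 100.")
    return (float(lower), float(upper))

def bootstrapping_interval(boot_paths, interval):
    """Per-step percentiles of a (steps, n_boot) array of simulated paths."""
    lower, upper = interval_to_percentiles(interval)
    return np.percentile(boot_paths, [lower, upper], axis=1).T

def split_conformal_interval(point_predictions, calibration_residuals, coverage):
    """Symmetric band whose half-width is a quantile of |residuals|."""
    radius = np.quantile(np.abs(calibration_residuals), coverage)
    point = np.asarray(point_predictions, dtype=float)
    return np.column_stack([point - radius, point + radius])

lower, upper = interval_to_percentiles(0.95)
boot_band = bootstrapping_interval(np.array([np.arange(101.0)]), [5, 95])
conformal_band = split_conformal_interval(
    [10.0, 20.0], np.array([-1.0, 1.0, -1.0, 1.0]), coverage=0.9
)
print(lower, upper)
print(boot_band)
print(conformal_band)
```

In this sketch the conformal band has the same width at every step, whereas the bootstrapped band follows the spread of the simulated paths at each step.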
References
[1] Rob J Hyndman and George Athanasopoulos. Forecasting: Principles and Practice (3rd ed). https://otexts.com/fpp3/prediction-intervals.html
[2] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method
Calculate the specified quantiles for each step. After generating multiple forecasting predictions through a bootstrapping process, each quantile is calculated for each step.
Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If last_window = None, the values stored in self.last_window_ are used to calculate the initial predictors, and the predictions start right after training data.
Sequence of quantiles to compute, which must be between 0 and 1 inclusive. For example, the 0.05, 0.5 and 0.95 quantiles are requested with quantiles = [0.05, 0.5, 0.95].
If True, residuals from the training data are used as a proxy for the prediction error to create predictions. If False, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the forecaster's set_out_sample_residuals() method.
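Computing quantiles per step from bootstrapped predictions reduces to one quantile call along the bootstrap axis. A minimal sketch, assuming the bootstrapped predictions are available as a (steps, n_boot) array; `quantiles_per_step` is a hypothetical name.

```python
import numpy as np

def quantiles_per_step(boot_paths, quantiles):
    """Return a (steps, n_quantiles) array from a (steps, n_boot) array."""
    quantiles = np.asarray(quantiles, dtype=float)
    if np.any((quantiles < 0) | (quantiles > 1)):
        raise ValueError("Quantiles must be between 0 and 1 inclusive.")
    return np.quantile(boot_paths, quantiles, axis=1).T

# Two steps, 101 simulated values each.
boot_paths = np.vstack([np.arange(101.0), np.arange(101.0) + 10.0])
result = quantiles_per_step(boot_paths, [0.05, 0.5, 0.95])
print(result)
```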
forecaster.recursive._forecaster_recursive.ForecasterRecursive.set_in_sample_residuals(y, exog=None, random_state=123)
Set in-sample residuals in case they were not calculated during the training process.
In-sample residuals are calculated as the difference between the true values and the predictions made by the forecaster using the training data. The following internal attributes are updated:
in_sample_residuals_: residuals stored in a numpy ndarray.
binner_intervals_: intervals used to bin the residuals are calculated using the quantiles of the predicted values.
in_sample_residuals_by_bin_: residuals are binned according to the predicted value they are associated with and stored in a dictionary, where the keys are the intervals of the predicted values and the values are the residuals associated with that range.
At most 10_000 residuals are stored in the attribute in_sample_residuals_. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_.
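The binning and capping described above can be sketched as follows. This is an assumption-laden illustration: bin edges are taken at equally spaced quantiles of the predicted values, bins are keyed by an integer id with the intervals returned separately, and `bin_residuals` is a hypothetical helper rather than the forecaster's binner.

```python
import numpy as np

def bin_residuals(y_true, y_pred, n_bins, max_residuals=10_000, seed=123):
    """Compute residuals, group them by predicted value, and cap the sizes."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    # Bin edges at equally spaced quantiles of the predicted values.
    edges = np.quantile(y_pred, np.linspace(0, 1, n_bins + 1))
    # Interior edges map each prediction to a bin id in 0..n_bins-1.
    bin_ids = np.digitize(y_pred, edges[1:-1])
    max_per_bin = max_residuals // n_bins
    by_bin = {}
    for b in range(n_bins):
        values = residuals[bin_ids == b]
        if len(values) > max_per_bin:
            values = rng.choice(values, size=max_per_bin, replace=False)
        by_bin[b] = values
    if len(residuals) > max_residuals:
        residuals = rng.choice(residuals, size=max_residuals, replace=False)
    intervals = {b: (edges[b], edges[b + 1]) for b in range(n_bins)}
    return residuals, by_bin, intervals

y_pred = np.arange(100.0)
y_true = y_pred + 1.0
residuals, by_bin, intervals = bin_residuals(y_true, y_pred, n_bins=4, max_residuals=40)
print(len(residuals), {b: len(v) for b, v in by_bin.items()})
```

The small `max_residuals=40` in the usage lines is only there to make the cap visible on a 100-point series.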
Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.
- int: include lags from 1 to lags (included).
- list, 1d numpy ndarray or range: include only lags present in lags, all elements must be int.
- None: no lags are included as predictors.
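The accepted forms of `lags` can be normalized into a single representation. The sketch below is one plausible way to do it and is not taken from the library; `normalize_lags` is a hypothetical name, and sorting and de-duplicating the indices is an assumption.

```python
import numpy as np

def normalize_lags(lags):
    """Return a sorted array of lag indices, or None when no lags are requested."""
    if lags is None:
        return None
    if isinstance(lags, (int, np.integer)) and not isinstance(lags, bool):
        if lags < 1:
            raise ValueError("An integer `lags` must be >= 1.")
        return np.arange(1, lags + 1)
    values = np.asarray(list(lags))
    if values.ndim != 1 or values.size == 0:
        raise ValueError("`lags` must be a non-empty 1d sequence.")
    if not np.issubdtype(values.dtype, np.integer):
        raise TypeError("All elements of `lags` must be int.")
    if values.min() < 1:
        raise ValueError("Lag indices start at 1.")
    return np.unique(values)  # sorted, duplicates removed

print(normalize_lags(3))
print(normalize_lags([5, 1, 2]))
print(normalize_lags(range(1, 4)))
```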
Set new values to the attribute out_sample_residuals_.
Out-of-sample residuals are meant to be calculated using observations that did not participate in the training process. y_true and y_pred are expected to be in the original scale of the time series. Residuals are calculated as y_true - y_pred, after applying the necessary transformations and differentiation if the forecaster includes them (self.transformer_y and self.differentiation). Two internal attributes are updated:
out_sample_residuals_: residuals stored in a numpy ndarray.
out_sample_residuals_by_bin_: residuals are binned according to the predicted value they are associated with and stored in a dictionary, where the keys are the intervals of the predicted values and the values are the residuals associated with that range. If a bin is empty, it is filled with a random sample of residuals from other bins. This is done to ensure that all bins have at least one residual and can be used in the prediction process.
At most 10_000 residuals are stored in the attribute out_sample_residuals_. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_.
If True, new residuals are added to the ones already stored in the forecaster. If, after appending the new residuals, the limit of 10_000 // self.binner.n_bins_ values per bin is exceeded, a random sample of residuals is stored.
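The append, cap, and empty-bin rules can be sketched together. This is a simplified illustration: residuals are assumed to be already grouped by integer bin id, and the number of values drawn for an empty bin (up to the per-bin limit) is an assumption, since the text above does not specify it. `update_binned_residuals` is a hypothetical helper.

```python
import numpy as np

def update_binned_residuals(stored, new, n_bins, max_residuals=10_000,
                            append=True, seed=123):
    """Merge new binned residuals into stored ones, cap each bin, fill empty bins."""
    rng = np.random.default_rng(seed)
    max_per_bin = max_residuals // n_bins
    empty = np.array([], dtype=float)
    updated = {}
    for b in range(n_bins):
        fresh = np.asarray(new.get(b, empty), dtype=float)
        old = np.asarray(stored.get(b, empty), dtype=float)
        values = np.concatenate([old, fresh]) if append else fresh
        if len(values) > max_per_bin:
            values = rng.choice(values, size=max_per_bin, replace=False)
        updated[b] = values
    # Empty bins borrow a random sample from the residuals of the other bins.
    pool = np.concatenate(list(updated.values()))
    for b, values in updated.items():
        if len(values) == 0 and len(pool) > 0:
            size = min(max_per_bin, len(pool))
            updated[b] = rng.choice(pool, size=size, replace=False)
    return updated

stored = {0: [1.0, 2.0], 1: [3.0]}
new = {0: [4.0]}
appended = update_binned_residuals(stored, new, n_bins=3, max_residuals=9)
replaced = update_binned_residuals(stored, new, n_bins=3, max_residuals=9, append=False)
print({b: v.tolist() for b, v in appended.items()})
```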
Dictionary of parameter names mapped to their new values. Parameters can be for the forecaster itself or for the contained estimator (using the estimator__ prefix).
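The prefix convention can be illustrated with a small routing function. This is a sketch of the naming scheme only, not the forecaster's implementation; `split_params` is a hypothetical helper and the parameter names in the usage lines are illustrative.

```python
def split_params(params, prefix="estimator__"):
    """Separate forecaster-level parameters from estimator-level ones."""
    forecaster_params = {}
    estimator_params = {}
    for name, value in params.items():
        if name.startswith(prefix):
            # Strip the prefix so the key matches the estimator's own parameter.
            estimator_params[name[len(prefix):]] = value
        else:
            forecaster_params[name] = value
    return forecaster_params, estimator_params

forecaster_params, estimator_params = split_params(
    {"forecaster_id": "demo", "estimator__alpha": 0.1, "estimator__fit_intercept": False}
)
print(forecaster_params)
print(estimator_params)
```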
Instance or list of instances used to create window features. Window features are created from the original time series and are included as predictors.
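A window feature object might look like the following sketch. The only requirement taken from this reference is that the object implements transform_batch(); the input and return types used here (a 1d array in, a 2d array out), the class name `RollingMean`, and the `window_size` attribute are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

class RollingMean:
    """Hypothetical window feature: mean of the previous `window_size` values."""

    def __init__(self, window_size):
        self.window_size = window_size

    def transform_batch(self, y):
        """Return one feature row per target y[window_size:], as a 2d array."""
        y = np.asarray(y, dtype=float)
        windows = sliding_window_view(y, self.window_size)
        # windows[i] covers y[i : i + window_size]. The last window has no
        # later target, so it is dropped to avoid using the target itself.
        return windows[:-1].mean(axis=1).reshape(-1, 1)

feature = RollingMean(window_size=3)
X_window = feature.transform_batch(np.arange(6.0))
print(X_window.ravel())
```

Each row uses only observations that precede its target, which keeps the feature usable at prediction time when the target is unknown.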