forecaster.recursive._forecaster_recursive.ForecasterRecursive

forecaster.recursive._forecaster_recursive.ForecasterRecursive(
    estimator=None,
    lags=None,
    window_features=None,
    transformer_y=None,
    transformer_exog=None,
    weight_func=None,
    differentiation=None,
    fit_kwargs=None,
    binner_kwargs=None,
    forecaster_id=None,
    regressor=None,
)

Recursive autoregressive forecaster for scikit-learn compatible estimators.

This class turns any estimator compatible with the scikit-learn API into a recursive autoregressive (multi-step) forecaster. The forecaster learns to predict future values by using lagged values of the target variable and optional exogenous features. Predictions are made iteratively, where each step uses previous predictions as input for the next step (recursive strategy).
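The recursive strategy described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper `recursive_forecast` and the hand-picked coefficients are hypothetical and stand in for a fitted estimator.

```python
def recursive_forecast(history, coefs, intercept, steps):
    """Forecast `steps` values with a fixed linear autoregressive rule.

    `coefs[0]` multiplies lag 1 (the most recent value), `coefs[1]` lag 2, etc.
    """
    window = list(history[-len(coefs):])  # last observed values, oldest first
    predictions = []
    for _ in range(steps):
        lags = window[::-1]  # most recent value first, aligned with coefs
        y_hat = intercept + sum(c * v for c, v in zip(coefs, lags))
        predictions.append(y_hat)
        # The prediction becomes the newest "observation" for the next step.
        window = window[1:] + [y_hat]
    return predictions


history = [1.0, 2.0, 3.0, 4.0, 5.0]
# Hypothetical fitted rule: y_t = 2 * y_{t-1} - y_{t-2} (continues a straight line).
preds = recursive_forecast(history, coefs=[2.0, -1.0], intercept=0.0, steps=3)
print(preds)  # [6.0, 7.0, 8.0]
```

Because each step consumes earlier predictions, errors can accumulate over long horizons.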

Parameters

Name Type Description Default
estimator object Scikit-learn compatible estimator for regression. If None, a default estimator will be initialized. Can also be passed via regressor parameter. None
lags Union[int, List[int], np.ndarray, range, None] Lagged values of the target variable to use as predictors. Can be an integer (uses lags from 1 to lags), list of integers, numpy array, or range. At least one of lags or window_features must be provided. Defaults to None. None
window_features Union[object, List[object], None] List of window feature objects to compute features from the target variable. Each object must implement transform_batch() method. At least one of lags or window_features must be provided. Defaults to None. None
transformer_y Optional[object] Transformer object for the target variable. Must implement fit() and transform() methods. Applied before training and predictions. Defaults to None. None
transformer_exog Optional[object] Transformer object for exogenous variables. Must implement fit() and transform() methods. Applied before training and predictions. Defaults to None. None
weight_func Optional[Callable] Function to compute sample weights for training. Must accept an index and return an array of weights. Defaults to None. None
differentiation Optional[int] Order of differencing to apply to the target variable. Must be a positive integer. Differencing is applied before creating lags. Defaults to None. None
fit_kwargs Optional[Dict[str, object]] Dictionary of additional keyword arguments to pass to the estimator’s fit() method. Defaults to None. None
binner_kwargs Optional[Dict[str, object]] Dictionary of keyword arguments for the QuantileBinner used in probabilistic predictions. If None, {'n_bins': 10, 'method': 'linear'} is used. None
forecaster_id Union[str, int, None] Identifier for the forecaster instance. Can be a string or integer. Used for tracking and logging purposes. Defaults to None. None
regressor object Alternative parameter name for estimator. If provided, used instead of estimator. Defaults to None. None
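To illustrate the differentiation parameter, the sketch below shows first-order differencing and how predictions made on the differenced scale can be mapped back to the original scale. The helpers `difference` and `undo_difference` are hypothetical names used only for this example; the library handles this internally through its differentiator attribute.

```python
def difference(values):
    """First-order differencing: y[t] - y[t-1]."""
    return [b - a for a, b in zip(values, values[1:])]


def undo_difference(last_value, diffs):
    """Rebuild levels by cumulatively adding differences to the last known level."""
    levels = []
    current = last_value
    for d in diffs:
        current += d
        levels.append(current)
    return levels


y = [10.0, 12.0, 15.0, 19.0]
dy = difference(y)
print(dy)  # [2.0, 3.0, 4.0]

# Hypothetical predictions on the differenced scale, mapped back to levels.
restored = undo_difference(y[-1], [5.0, 6.0])
print(restored)  # [24.0, 30.0]
```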

Attributes

Name Type Description
estimator Fitted scikit-learn estimator.
lags Lag indices used in the model.
lags_names Names of lag features (e.g., [‘lag_1’, ‘lag_2’]).
window_features List of window feature transformers.
window_features_names Names of window features.
window_size Maximum window size needed (max of lags and window features).
transformer_y Transformer for target variable.
transformer_exog Transformer for exogenous variables.
weight_func Function for sample weighting.
differentiation Order of differencing applied.
differentiator TimeSeriesDifferentiator instance if differencing is used.
is_fitted Boolean indicating if forecaster has been fitted.
fit_date Timestamp of the last fit operation.
last_window_ Last window_size observations from training data.
index_type_ Type of index in training data (RangeIndex or DatetimeIndex).
index_freq_ Frequency of DatetimeIndex if applicable.
training_range_ First and last index values of training data.
series_name_in_ Name of the target series.
exog_in_ Boolean indicating if exogenous variables were used in training.
exog_names_in_ Names of exogenous variables.
exog_type_in_ Type of exogenous input (Series or DataFrame).
X_train_features_names_out_ Names of all training features.
in_sample_residuals_ Residuals from training set.
in_sample_residuals_by_bin_ Residuals grouped by bins for probabilistic prediction.
forecaster_id Identifier for the forecaster instance.

Note

  • Either lags or window_features (or both) must be provided during initialization.
  • The forecaster uses a recursive strategy where each multi-step prediction depends on previous predictions within the same forecast horizon.
  • Exogenous variables must have the same index as the target variable and must be available for the entire prediction horizon.
  • The forecaster supports point predictions, prediction intervals, bootstrapping, quantile predictions, and probabilistic forecasts via conformal methods.

Examples

Create a basic forecaster with lags:

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> y = pd.Series(np.random.randn(100), name='y')
>>> forecaster = ForecasterRecursive(
...     estimator=LinearRegression(),
...     lags=10
... )
>>> forecaster.fit(y)
>>> predictions = forecaster.predict(steps=5)

Create a forecaster with window features and transformations:

>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.preprocessing import StandardScaler
>>> from spotforecast2_safe.preprocessing import RollingFeatures
>>> import pandas as pd
>>> y = pd.Series(np.random.randn(100), name='y')
>>> forecaster = ForecasterRecursive(
...     estimator=RandomForestRegressor(n_estimators=100),
...     lags=[1, 7, 30],
...     window_features=[RollingFeatures(stats='mean', window_sizes=7)],
...     transformer_y=StandardScaler(),
...     differentiation=1
... )
>>> forecaster.fit(y)
>>> predictions = forecaster.predict(steps=10)

Create a forecaster with exogenous variables:

>>> import pandas as pd
>>> from sklearn.linear_model import Ridge
>>> y = pd.Series(np.random.randn(100), name='target')
>>> exog = pd.DataFrame({'temp': np.random.randn(100)}, index=y.index)
>>> forecaster = ForecasterRecursive(
...     estimator=Ridge(),
...     lags=7,
...     forecaster_id='my_forecaster'
... )
>>> forecaster.fit(y, exog)
>>> exog_future = pd.DataFrame(
...     {'temp': np.random.randn(5)},
...     index=pd.RangeIndex(start=100, stop=105)
... )
>>> predictions = forecaster.predict(steps=5, exog=exog_future)

Create a forecaster with probabilistic prediction configuration:

>>> from sklearn.ensemble import GradientBoostingRegressor
>>> import pandas as pd
>>> y = pd.Series(np.random.randn(100), name='y')
>>> forecaster = ForecasterRecursive(
...     estimator=GradientBoostingRegressor(),
...     lags=14,
...     binner_kwargs={'n_bins': 15, 'method': 'linear'}
... )
>>> forecaster.fit(y, store_in_sample_residuals=True)
>>> predictions = forecaster.predict(steps=5)

Methods

Name Description
create_predict_X Create the predictors needed to predict steps ahead.
create_sample_weights Create weights for each observation according to the forecaster’s attribute weight_func.
create_train_X_y Public method to create training predictors and target values.
fit Fit the forecaster to the training data.
get_feature_importances Return feature importances of the estimator stored in the forecaster.
get_params Get parameters for this forecaster.
predict Predict future values recursively for the specified number of steps.
predict_bootstrapping Generate multiple forecasting predictions using a bootstrapping process.
predict_dist Fit a given probability distribution for each step.
predict_interval Predict n steps ahead and estimate prediction intervals using either bootstrapping or conformal prediction methods.
predict_quantiles Calculate the specified quantiles for each step.
set_fit_kwargs Set new values for the additional keyword arguments passed to the fit method of the estimator.
set_in_sample_residuals Set in-sample residuals in case they were not calculated during the training process.
set_lags Set new value to the attribute lags. Attributes lags_names, max_lag and window_size are also updated.
set_out_sample_residuals Set new values to the attribute out_sample_residuals_.
set_params Set the parameters of this forecaster.
set_window_features Set new value to the attribute window_features.

create_predict_X

forecaster.recursive._forecaster_recursive.ForecasterRecursive.create_predict_X(
    steps,
    last_window=None,
    exog=None,
    check_inputs=True,
)

Create the predictors needed to predict steps ahead. As it is a recursive process, the predictors are created at each iteration of the prediction process.

Parameters

Name Type Description Default
steps int | str | pd.Timestamp Number of steps to predict. - If steps is int, number of steps to predict. - If str or pandas Timestamp, the prediction will be up to that date. required
last_window pd.Series | pd.DataFrame | None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If last_window = None, the values stored in self.last_window_ are used to calculate the initial predictors, and the predictions start right after training data. Defaults to None. None
exog pd.Series | pd.DataFrame | None Exogenous variable/s included as predictor/s. Defaults to None. None
check_inputs bool If True, the input is checked for possible warnings and errors with the check_predict_input function. This argument is created for internal use and is not recommended to be changed. Defaults to True. True

Returns

Name Type Description
pd.DataFrame Pandas DataFrame with the predictors for each step. The index is the same as the prediction index.

create_sample_weights

forecaster.recursive._forecaster_recursive.ForecasterRecursive.create_sample_weights(
    X_train,
)

Create weights for each observation according to the forecaster’s attribute weight_func.

Parameters

Name Type Description Default
X_train pd.DataFrame Dataframe created with the create_train_X_y method, first return. required

Returns

Name Type Description
np.ndarray Weights to use in fit method.
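As an illustration of what a weight_func might look like, the sketch below gives zero weight to observations inside an assumed outage period. The function name and the outage positions are hypothetical; a real weight function would typically receive a pandas index and return a NumPy array rather than a list.

```python
def custom_weights(index):
    """Return one weight per index value: 0 inside the outage, 1 elsewhere."""
    outage_start, outage_end = 10, 14  # assumed positions, for illustration only
    return [0.0 if outage_start <= i <= outage_end else 1.0 for i in index]


weights = custom_weights(range(20))
print(sum(weights))  # 15.0
```

Observations with zero weight do not contribute to the estimator's loss, provided the estimator's fit method accepts sample weights.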

create_train_X_y

forecaster.recursive._forecaster_recursive.ForecasterRecursive.create_train_X_y(
    y,
    exog=None,
)

Public method to create training predictors and target values.

This method is a public wrapper around the internal method _create_train_X_y, which generates the training predictors and target values based on the provided time series and exogenous variables. It ensures that the necessary transformations and feature engineering steps are applied to prepare the data for training the forecaster.

Parameters

Name Type Description Default
y pd.Series Target series for training. Must be a pandas Series. required
exog Union[pd.Series, pd.DataFrame, None] Optional exogenous variables for training. Can be a pandas Series or DataFrame. Must have the same index as y and cover the same time range. Defaults to None. None

Returns

Name Type Description
Tuple[pd.DataFrame, pd.Series] Tuple containing: - X_train: DataFrame of training predictors including lags, window features, and exogenous variables (if provided). - y_train: Series of target values aligned with the predictors.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.preprocessing import RollingFeatures
>>> y = pd.Series(np.arange(30), name='y')
>>> exog = pd.DataFrame({'temp': np.random.randn(30)}, index=y.index)
>>> forecaster = ForecasterRecursive(
...     estimator=LinearRegression(),
...     lags=3,
...     window_features=[RollingFeatures(stats='mean', window_sizes=3)]
... )
>>> X_train, y_train = forecaster.create_train_X_y(y=y, exog=exog)
>>> isinstance(X_train, pd.DataFrame)
True
>>> isinstance(y_train, pd.Series)
True
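To make the structure of the training matrices concrete, here is a minimal plain-Python sketch of how lag predictors line up with their targets. The helper `make_lag_matrix` is hypothetical and ignores window features, transformers, and exogenous variables.

```python
def make_lag_matrix(y, lags):
    """Return (X, target), where each row of X holds y[t - lag] for every lag."""
    max_lag = max(lags)
    X, target = [], []
    # The first max_lag observations have no complete lag history, so they are skipped.
    for t in range(max_lag, len(y)):
        X.append([y[t - lag] for lag in lags])
        target.append(y[t])
    return X, target


y = [0, 1, 2, 3, 4, 5]
X, target = make_lag_matrix(y, lags=[1, 2, 3])
print(X)       # [[2, 1, 0], [3, 2, 1], [4, 3, 2]]
print(target)  # [3, 4, 5]
```

The training set is shorter than the original series by the largest lag, which is why the series must be longer than the window size.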

fit

forecaster.recursive._forecaster_recursive.ForecasterRecursive.fit(
    y,
    exog=None,
    store_last_window=True,
    store_in_sample_residuals=False,
    random_state=123,
    suppress_warnings=False,
)

Fit the forecaster to the training data.

Parameters

Name Type Description Default
y pd.Series Target series for training. Must be a pandas Series. required
exog Union[pd.Series, pd.DataFrame, None] Optional exogenous variables for training. Can be a pandas Series or DataFrame. Must have the same index as y and cover the same time range. Defaults to None. None
store_last_window bool Whether to store the last window of the training series for use in prediction. Defaults to True. True
store_in_sample_residuals bool Whether to store in-sample residuals after fitting, which can be used for certain probabilistic prediction methods. Defaults to False. False
random_state int Random seed for reproducibility when sampling residuals if store_in_sample_residuals is True. Defaults to 123. 123
suppress_warnings bool Whether to suppress warnings during fitting, such as those related to insufficient data length for lags or window features. Defaults to False. False

Returns

Name Type Description
None None

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.preprocessing import RollingFeatures
>>> y = pd.Series(np.arange(30), name='y')
>>> exog = pd.DataFrame({'temp': np.random.randn(30)}, index=y.index)
>>> forecaster = ForecasterRecursive(
...     estimator=LinearRegression(),
...     lags=3,
...     window_features=[RollingFeatures(stats='mean', window_sizes=3)]
... )
>>> forecaster.fit(y=y, exog=exog, store_in_sample_residuals=True)

get_feature_importances

forecaster.recursive._forecaster_recursive.ForecasterRecursive.get_feature_importances(
    sort_importance=True,
)

Return feature importances of the estimator stored in the forecaster. Only valid when estimator stores internally the feature importances in the attribute feature_importances_ or coef_. Otherwise, returns None.

Parameters

Name Type Description Default
sort_importance bool If True, sorts the feature importances in descending order. True

Returns

Name Type Description
pd.DataFrame Feature importances associated with each predictor.

Raises

Name Type Description
NotFittedError If the forecaster is not fitted.

Examples

>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> import pandas as pd
>>> import numpy as np
>>> forecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)
>>> forecaster.fit(y=pd.Series(np.arange(20)))
>>> forecaster.get_feature_importances()
  feature  importance
0   lag_1         1.0
1   lag_2         0.0
2   lag_3         0.0
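The attribute lookup described above can be sketched as follows. This is an assumption-laden illustration: the helper `extract_importances`, the stand-in estimator classes, and the order in which the two attributes are checked are all hypothetical.

```python
def extract_importances(estimator, feature_names, sort_importance=True):
    """Pair feature names with importances, or return None if unavailable."""
    if hasattr(estimator, "feature_importances_"):
        values = list(estimator.feature_importances_)
    elif hasattr(estimator, "coef_"):
        values = list(estimator.coef_)
    else:
        return None  # the estimator exposes neither attribute
    rows = list(zip(feature_names, values))
    if sort_importance:
        rows.sort(key=lambda row: row[1], reverse=True)  # descending importance
    return rows


class FakeLinear:
    """Stand-in for a fitted linear model."""
    coef_ = [0.2, 0.7, 0.1]


class FakeOpaque:
    """Stand-in for an estimator without importances."""


names = ["lag_1", "lag_2", "lag_3"]
ranked = extract_importances(FakeLinear(), names)
print(ranked)  # [('lag_2', 0.7), ('lag_1', 0.2), ('lag_3', 0.1)]
print(extract_importances(FakeOpaque(), names))  # None
```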

get_params

forecaster.recursive._forecaster_recursive.ForecasterRecursive.get_params(
    deep=True,
)

Get parameters for this forecaster.

Parameters

Name Type Description Default
deep bool If True, will return the parameters for this forecaster and contained sub-objects that are estimators. True

Returns

Name Type Description
params Dict[str, object] Dictionary of parameter names mapped to their values.

Examples

>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> forecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)
>>> forecaster.get_params()
{
    'estimator': LinearRegression(), 'lags': 3, 'window_features': None,
    'transformer_y': None, 'transformer_exog': None, 'weight_func': None,
    'differentiation': None, 'fit_kwargs': {}, 'binner_kwargs': None,
    'forecaster_id': '...'
}

predict

forecaster.recursive._forecaster_recursive.ForecasterRecursive.predict(
    steps,
    last_window=None,
    exog=None,
    check_inputs=True,
)

Predict future values recursively for the specified number of steps.

Parameters

Name Type Description Default
steps int | str | pd.Timestamp Number of future steps to predict. required
last_window Union[pd.Series, pd.DataFrame, None] Optional last window of observed values to use for prediction. If None, uses the last window from training. Must be a pandas Series or DataFrame with the same structure as the training target series. Defaults to None. None
exog Union[pd.Series, pd.DataFrame, None] Optional exogenous variables for prediction. Can be a pandas Series or DataFrame. Must have the same structure as the exogenous variables used in training. Defaults to None. None
check_inputs bool Whether to perform input validation checks. Defaults to True. True

Returns

Name Type Description
pd.Series Pandas Series of predicted values for the specified number of steps, indexed according to the prediction index constructed from the last window and the number of steps.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.preprocessing import RollingFeatures
>>> y = pd.Series(np.arange(30), name='y')
>>> exog = pd.DataFrame({'temp': np.random.randn(30)}, index=y.index)
>>> forecaster = ForecasterRecursive(
...     estimator=LinearRegression(),
...     lags=3,
...     window_features=[RollingFeatures(stats='mean', window_sizes=3)]
... )
>>> forecaster.fit(y=y, exog=exog)
>>> last_window = y.iloc[-3:]
>>> exog_future = pd.DataFrame({'temp': np.random.randn(5)}, index=pd.RangeIndex(start=30, stop=35))
>>> predictions = forecaster.predict(
...     steps=5, last_window=last_window, exog=exog_future, check_inputs=True
... )
>>> isinstance(predictions, pd.Series)
True

predict_bootstrapping

forecaster.recursive._forecaster_recursive.ForecasterRecursive.predict_bootstrapping(
    steps,
    last_window=None,
    exog=None,
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
)

Generate multiple forecasting predictions using a bootstrapping process. By sampling from a collection of past observed errors (the residuals), each iteration of bootstrapping generates a different set of predictions. See the References section for more information.

Parameters

Name Type Description Default
steps int | str | pd.Timestamp Number of steps to predict. - If steps is int, number of steps to predict. - If str or pandas Datetime, the prediction will be up to that date. required
last_window pd.Series | pd.DataFrame | None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If last_window = None, the values stored in self.last_window_ are used to calculate the initial predictors, and the predictions start right after training data. Defaults to None. None
exog pd.Series | pd.DataFrame | None Exogenous variable/s included as predictor/s. Defaults to None. None
n_boot int Number of bootstrapping iterations to perform when estimating prediction intervals. Defaults to 250. 250
use_in_sample_residuals bool If True, residuals from the training data are used as proxy of prediction error to create predictions. If False, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster’s set_out_sample_residuals() method. Defaults to True. True
use_binned_residuals bool If True, residuals are selected based on the predicted values (binned selection). If False, residuals are selected randomly. Defaults to True. True
random_state int Seed for the random number generator to ensure reproducibility. Defaults to 123. 123

Returns

Name Type Description
pd.DataFrame Pandas DataFrame with predictions generated by bootstrapping. Shape: (steps, n_boot).

Raises

Name Type Description
ValueError If steps is not an integer or a valid date.
ValueError If exog is missing or has invalid shape.
ValueError If n_boot is not a positive integer.
ValueError If use_in_sample_residuals=True and in_sample_residuals_ are not available.
ValueError If use_in_sample_residuals=False and out_sample_residuals_ are not available.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> rng = np.random.default_rng(123)
>>> y = pd.Series(rng.normal(size=100), name='y')
>>> forecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)
>>> _ = forecaster.fit(y=y, store_in_sample_residuals=True)
>>> boot_preds = forecaster.predict_bootstrapping(steps=3, n_boot=5)
>>> boot_preds.shape
(3, 5)
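The idea behind residual bootstrapping can be sketched in plain Python. This is a simplified illustration under stated assumptions: `bootstrap_forecasts` is a hypothetical helper, the model is a fixed linear rule rather than a fitted estimator, and residuals are drawn uniformly at random (no binning).

```python
import random


def bootstrap_forecasts(history, coefs, residuals, steps, n_boot, seed=123):
    """Simulate n_boot forecast paths by adding resampled residuals at each step."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_boot):
        window = list(history[-len(coefs):])
        path = []
        for _ in range(steps):
            y_hat = sum(c * v for c, v in zip(coefs, window[::-1]))
            y_sim = y_hat + rng.choice(residuals)  # add a resampled past error
            path.append(y_sim)
            window = window[1:] + [y_sim]  # the noisy value feeds the next step
        paths.append(path)
    # Transpose so that rows are steps and columns are bootstrap iterations.
    return [list(row) for row in zip(*paths)]


history = [1.0, 2.0, 3.0, 4.0]
residuals = [-0.5, 0.0, 0.5]
boot = bootstrap_forecasts(history, coefs=[1.0], residuals=residuals, steps=3, n_boot=5)
print(len(boot), len(boot[0]))  # 3 5
```

The spread of each row gives an empirical picture of the forecast uncertainty at that step.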

References

.. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html

predict_dist

forecaster.recursive._forecaster_recursive.ForecasterRecursive.predict_dist(
    steps,
    distribution,
    last_window=None,
    exog=None,
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
)

Fit a given probability distribution for each step. After generating multiple forecasting predictions through a bootstrapping process, each step is fitted to the given distribution.

Parameters

Name Type Description Default
steps int | str | pd.Timestamp Number of steps to predict. - If steps is int, number of steps to predict. - If str or pandas Datetime, the prediction will be up to that date. required
distribution object A distribution object from scipy.stats with methods _pdf and fit. For example scipy.stats.norm. required
last_window pd.Series | pd.DataFrame | None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If last_window = None, the values stored in self.last_window_ are used to calculate the initial predictors, and the predictions start right after training data. None
exog pd.Series | pd.DataFrame | None Exogenous variable/s included as predictor/s. None
n_boot int Number of bootstrapping iterations to perform when estimating prediction intervals. 250
use_in_sample_residuals bool If True, residuals from the training data are used as proxy of prediction error to create predictions. If False, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster’s set_out_sample_residuals() method. True
use_binned_residuals bool If True, residuals are selected based on the predicted values (binned selection). If False, residuals are selected randomly. True
random_state int Seed for the random number generator to ensure reproducibility. 123

Returns

Name Type Description
pd.DataFrame Distribution parameters estimated for each step.
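As a rough sketch of the per-step fitting idea, the example below estimates normal-distribution parameters for each row of a bootstrap matrix using only the standard library. The helper `fit_normal_per_step` is hypothetical; the library itself expects a scipy.stats distribution object and may estimate parameters differently.

```python
import statistics


def fit_normal_per_step(boot_rows):
    """Return (loc, scale) for each step, one row of bootstrap samples per step."""
    params = []
    for row in boot_rows:
        loc = statistics.fmean(row)
        scale = statistics.pstdev(row)  # population std, the maximum-likelihood estimate
        params.append((loc, scale))
    return params


# Hypothetical bootstrap output: 3 steps, 2 simulated values per step.
boot_rows = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
params = fit_normal_per_step(boot_rows)
print(params)  # [(2.0, 1.0), (2.0, 0.0), (2.0, 2.0)]
```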

predict_interval

forecaster.recursive._forecaster_recursive.ForecasterRecursive.predict_interval(
    steps,
    last_window=None,
    exog=None,
    method='bootstrapping',
    interval=[5, 95],
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
)

Predict n steps ahead and estimate prediction intervals using either bootstrapping or conformal prediction methods. Refer to the References section for additional details on these methods.

Parameters

Name Type Description Default
steps int | str | pd.Timestamp Number of steps to predict. - If steps is int, number of steps to predict. - If str or pandas Datetime, the prediction will be up to that date. required
last_window pd.Series | pd.DataFrame | None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If last_window = None, the values stored in self.last_window_ are used to calculate the initial predictors, and the predictions start right after training data. Defaults to None. None
exog pd.Series | pd.DataFrame | None Exogenous variable/s included as predictor/s. Defaults to None. None
method str Technique used to estimate prediction intervals. Available options: - ‘bootstrapping’: Bootstrapping is used to generate prediction intervals [1]. - ‘conformal’: Employs the conformal prediction split method for interval estimation [2]. Defaults to ‘bootstrapping’. 'bootstrapping'
interval float | list[float] | tuple[float] Confidence level of the prediction interval. Interpretation depends on the type and on the method used: - If float, represents the nominal (expected) coverage (between 0 and 1). For instance, interval=0.95 corresponds to [2.5, 97.5] percentiles. - If list or tuple, defines the exact percentiles to compute, which must be between 0 and 100 inclusive. For example, a 95% interval is specified as interval = [2.5, 97.5]. - When using method='conformal', the interval must be a float or a list/tuple defining a symmetric interval. Defaults to [5, 95]. [5, 95]
n_boot int Number of bootstrapping iterations to perform when estimating prediction intervals. Defaults to 250. 250
use_in_sample_residuals bool If True, residuals from the training data are used as proxy of prediction error to create predictions. If False, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster’s set_out_sample_residuals() method. Defaults to True. True
use_binned_residuals bool If True, residuals are selected based on the predicted values (binned selection). If False, residuals are selected randomly. Defaults to True. True
random_state int Seed for the random number generator to ensure reproducibility. Defaults to 123. 123
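The relationship between a float coverage and the equivalent percentile pair can be worked through in a few lines. The helpers `to_percentiles` and `is_symmetric` are hypothetical and only illustrate the arithmetic; the exact validation rules applied by the library are an assumption here.

```python
def to_percentiles(interval):
    """Translate an `interval` argument into [lower, upper] percentiles."""
    if isinstance(interval, float):
        if not 0.0 < interval < 1.0:
            raise ValueError("a float interval must lie strictly between 0 and 1")
        tail = (1.0 - interval) / 2.0 * 100.0  # probability mass left in each tail
        return [tail, 100.0 - tail]
    lower, upper = interval
    if not 0 <= lower < upper <= 100:
        raise ValueError("percentiles must satisfy 0 <= lower < upper <= 100")
    return [float(lower), float(upper)]


def is_symmetric(percentiles, tol=1e-9):
    """True when both tails hold the same mass, as the conformal method expects."""
    lower, upper = percentiles
    return abs(lower - (100.0 - upper)) < tol


print(to_percentiles(0.5))      # [25.0, 75.0]
print(to_percentiles([5, 95]))  # [5.0, 95.0]
print(is_symmetric([5, 95]))    # True
print(is_symmetric([5, 90]))    # False
```

With this arithmetic, interval=0.95 maps to approximately [2.5, 97.5], matching the description in the table.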

Returns

Name Type Description
pd.DataFrame Pandas DataFrame with values predicted by the forecaster and their estimated interval. Columns: - pred: predictions. - lower_bound: lower bound of the interval. - upper_bound: upper bound of the interval.

Raises

Name Type Description
ValueError If method is not ‘bootstrapping’ or ‘conformal’.
ValueError If interval is invalid or not compatible with the chosen method.
ValueError If inputs (steps, exog, etc.) are invalid.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> rng = np.random.default_rng(123)
>>> y = pd.Series(rng.normal(size=100), name='y')
>>> forecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)
>>> _ = forecaster.fit(y=y, store_in_sample_residuals=True)
>>> # Bootstrapping method
>>> intervals_boot = forecaster.predict_interval(
...     steps=3, method='bootstrapping', interval=[5, 95]
... )
>>> intervals_boot.columns.tolist()
['pred', 'lower_bound', 'upper_bound']
>>> # Conformal method
>>> intervals_conf = forecaster.predict_interval(
...     steps=3, method='conformal', interval=0.95
... )
>>> intervals_conf.columns.tolist()
['pred', 'lower_bound', 'upper_bound']
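For intuition about the conformal option, the sketch below implements one common formulation of split conformal prediction: a quantile of the absolute calibration residuals is added to and subtracted from each point prediction. The helper `conformal_interval` and the finite-sample rank rule are assumptions for illustration and may differ from what the library does internally.

```python
import math


def conformal_interval(predictions, calibration_residuals, coverage):
    """Return (pred, lower_bound, upper_bound) tuples with a constant half-width."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # Finite-sample rank often used in split conformal prediction.
    rank = min(n, math.ceil((n + 1) * coverage))
    half_width = scores[rank - 1]
    return [(p, p - half_width, p + half_width) for p in predictions]


# Hypothetical calibration residuals and point predictions.
residuals = [-0.4, 0.1, -0.2, 0.3, 0.5, -0.1, 0.2, -0.3, 0.6]
rows = conformal_interval([10.0, 11.0], residuals, coverage=0.8)
for pred, lower, upper in rows:
    print(pred, round(lower, 2), round(upper, 2))
# 10.0 9.5 10.5
# 11.0 10.5 11.5
```

Unlike bootstrapping, this approach yields symmetric intervals around the point prediction, which is consistent with the requirement that interval be symmetric when method='conformal'.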

References

.. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html
.. [2] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

predict_quantiles

forecaster.recursive._forecaster_recursive.ForecasterRecursive.predict_quantiles(
    steps,
    last_window=None,
    exog=None,
    quantiles=[0.05, 0.5, 0.95],
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
)

Calculate the specified quantiles for each step. After generating multiple forecasting predictions through a bootstrapping process, each quantile is calculated for each step.

Parameters

Name Type Description Default
steps int | str | pd.Timestamp Number of steps to predict. - If steps is int, number of steps to predict. - If str or pandas Datetime, the prediction will be up to that date. required
last_window pd.Series | pd.DataFrame | None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If last_window = None, the values stored in self.last_window_ are used to calculate the initial predictors, and the predictions start right after training data. None
exog pd.Series | pd.DataFrame | None Exogenous variable/s included as predictor/s. None
quantiles list[float] | tuple[float] Sequence of quantiles to compute, which must be between 0 and 1 inclusive. For example, the quantiles 0.05, 0.5 and 0.95 are specified as quantiles = [0.05, 0.5, 0.95]. [0.05, 0.5, 0.95]
n_boot int Number of bootstrapping iterations to perform when estimating quantiles. 250
use_in_sample_residuals bool If True, residuals from the training data are used as proxy of prediction error to create predictions. If False, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster’s set_out_sample_residuals() method. True
use_binned_residuals bool If True, residuals are selected based on the predicted values (binned selection). If False, residuals are selected randomly. True
random_state int Seed for the random number generator to ensure reproducibility. 123

Returns

Name Type Description
pd.DataFrame Quantiles predicted by the forecaster.
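The per-step quantile computation can be sketched with a small linear-interpolation routine. The helpers `quantile` and `quantiles_per_step` are hypothetical, and the interpolation rule is an assumption chosen to mirror the default behaviour of common numerical libraries.

```python
def quantile(sorted_values, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def quantiles_per_step(boot_rows, quantiles):
    """Compute every requested quantile for each step (row) of bootstrap samples."""
    return [[quantile(sorted(row), q) for q in quantiles] for row in boot_rows]


# Hypothetical bootstrap output: 2 steps, 5 simulated values per step.
boot_rows = [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 30.0, 20.0, 50.0, 40.0]]
result = quantiles_per_step(boot_rows, [0.0, 0.5, 1.0])
print(result)  # [[1.0, 3.0, 5.0], [10.0, 30.0, 50.0]]
```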

set_fit_kwargs

forecaster.recursive._forecaster_recursive.ForecasterRecursive.set_fit_kwargs(
    fit_kwargs,
)

Set new values for the additional keyword arguments passed to the fit method of the estimator.

Parameters

Name Type Description Default
fit_kwargs dict[str, object] Dict of the form {“argument”: new_value}. required

set_in_sample_residuals

forecaster.recursive._forecaster_recursive.ForecasterRecursive.set_in_sample_residuals(
    y,
    exog=None,
    random_state=123,
)

Set in-sample residuals in case they were not calculated during the training process.

In-sample residuals are calculated as the difference between the true values and the predictions made by the forecaster using the training data. The following internal attributes are updated:

  • in_sample_residuals_: residuals stored in a numpy ndarray.
  • binner_intervals_: intervals used to bin the residuals are calculated using the quantiles of the predicted values.
  • in_sample_residuals_by_bin_: residuals are binned according to the predicted value they are associated with and stored in a dictionary, where the keys are the intervals of the predicted values and the values are the residuals associated with that range.

At most 10_000 residuals are stored in the attribute in_sample_residuals_. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_.
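The binning and capping logic can be illustrated with a simplified sketch. The helper `bin_residuals` is hypothetical: it uses equal-frequency rank groups as an approximation of quantile-based intervals and integer bin labels instead of interval keys, so it is not a substitute for the library's binner.

```python
import random


def bin_residuals(predictions, residuals, n_bins, max_total=10_000, seed=123):
    """Group residuals into equal-frequency bins of the predicted values."""
    rng = random.Random(seed)
    # Rank observations by predicted value; equal-sized rank groups
    # approximate bins defined by quantiles of the predictions.
    order = sorted(range(len(predictions)), key=lambda i: predictions[i])
    by_bin = {b: [] for b in range(n_bins)}
    for rank, i in enumerate(order):
        by_bin[rank * n_bins // len(order)].append(residuals[i])
    # Cap each bin at max_total // n_bins residuals, sampling when it overflows.
    cap = max_total // n_bins
    return {
        b: (rng.sample(values, cap) if len(values) > cap else values)
        for b, values in by_bin.items()
    }


predictions = [float(i) for i in range(12)]
residuals = [100.0 + i for i in range(12)]  # tagged so each residual reveals its origin
by_bin = bin_residuals(predictions, residuals, n_bins=3, max_total=6)
print({b: len(v) for b, v in by_bin.items()})  # {0: 2, 1: 2, 2: 2}
```

Keeping residuals grouped by predicted value lets the interval width adapt to the level of the forecast.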

Parameters

Name Type Description Default
y pd.Series Target time series. required
exog pd.Series | pd.DataFrame | None Exogenous variables. None
random_state int Random state for reproducibility. 123

Returns

Name Type Description
None None

Raises

Name Type Description
NotFittedError If the forecaster is not fitted.
IndexError If the index range of y does not match the range used during training.
ValueError If the features generated from the provided data do not match those used during the training process.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> forecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)
>>> forecaster.fit(y=pd.Series(np.arange(20)), store_in_sample_residuals=False)
>>> forecaster.set_in_sample_residuals(y=pd.Series(np.arange(20)))
>>> forecaster.in_sample_residuals_
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

set_lags

forecaster.recursive._forecaster_recursive.ForecasterRecursive.set_lags(
    lags=None,
)

Set new value to the attribute lags. Attributes lags_names, max_lag and window_size are also updated.

Parameters

Name Type Description Default
lags Union[int, List[int], np.ndarray, range, None] Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. - int: include lags from 1 to lags (included). - list, 1d numpy ndarray or range: include only lags present in lags, all elements must be int. - None: no lags are included as predictors. None
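The accepted forms of lags can be illustrated with a small normalisation routine. The helper `normalize_lags` is hypothetical, and details such as sorting and removing duplicates are assumptions rather than documented behaviour.

```python
def normalize_lags(lags):
    """Return (lags, lags_names, max_lag) for an int, an iterable of ints, or None."""
    if lags is None:
        return [], [], None  # no lags are used as predictors
    if isinstance(lags, int):
        if lags < 1:
            raise ValueError("an integer `lags` must be at least 1")
        lags = range(1, lags + 1)  # lags from 1 to `lags`, included
    lags = sorted({int(lag) for lag in lags})
    if not lags or lags[0] < 1:
        raise ValueError("lags must be positive integers; lag 1 means t-1")
    names = [f"lag_{lag}" for lag in lags]
    return lags, names, max(lags)


print(normalize_lags(3))
# ([1, 2, 3], ['lag_1', 'lag_2', 'lag_3'], 3)
print(normalize_lags([30, 1, 7]))
# ([1, 7, 30], ['lag_1', 'lag_7', 'lag_30'], 30)
```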

set_out_sample_residuals

forecaster.recursive._forecaster_recursive.ForecasterRecursive.set_out_sample_residuals(
    y_true,
    y_pred,
    append=False,
    random_state=123,
)

Set new values for the attribute out_sample_residuals_.

Out of sample residuals are meant to be calculated using observations that did not participate in the training process. y_true and y_pred are expected to be in the original scale of the time series. Residuals are calculated as y_true - y_pred, after applying the necessary transformations and differentiations if the forecaster includes them (self.transformer_y and self.differentiation). Two internal attributes are updated:

  • out_sample_residuals_: residuals stored in a numpy ndarray.
  • out_sample_residuals_by_bin_: residuals are binned according to the predicted value they are associated with and stored in a dictionary, where the keys are the intervals of the predicted values and the values are the residuals associated with that range. If a bin is empty, it is filled with a random sample of residuals from other bins. This is done to ensure that all bins have at least one residual and can be used in the prediction process.

At most 10_000 residuals are stored in the attribute out_sample_residuals_. If more than 10_000 residuals are available, a random sample of 10_000 is stored. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_.
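
The binning step and the handling of empty bins can be sketched with NumPy as follows. This is a simplified illustration, not the forecaster's code: bin_residuals, its bin_edges argument, and fill_size are hypothetical, integer bin ids replace the interval keys, and transformations and differentiation are left out.

```python
import numpy as np


def bin_residuals(y_true, y_pred, bin_edges, fill_size=2, random_state=123):
    """Hypothetical helper: group residuals by the bin of their prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same length.")
    residuals = y_true - y_pred

    edges = np.asarray(bin_edges, dtype=float)
    n_bins = edges.size - 1
    # Using only the inner edges maps every prediction to an id in 0..n_bins-1.
    bin_ids = np.digitize(y_pred, edges[1:-1])
    by_bin = {k: residuals[bin_ids == k] for k in range(n_bins)}

    # Fill empty bins with a random sample drawn from the non-empty bins.
    # Assumption: `fill_size` values are drawn without replacement.
    rng = np.random.default_rng(random_state)
    empty_bins = [k for k, values in by_bin.items() if values.size == 0]
    for k in empty_bins:
        size = min(fill_size, residuals.size)
        by_bin[k] = rng.choice(residuals, size=size, replace=False)
    return residuals, by_bin


y_true = np.array([20, 21, 22, 23, 24])
y_pred = np.array([20.1, 20.9, 22.2, 22.8, 24.0])
residuals, binned = bin_residuals(y_true, y_pred, bin_edges=[20, 22, 24, 26, 28])
print({k: v.size for k, v in binned.items()})  # {0: 2, 1: 2, 2: 1, 3: 2}
```

No prediction falls in the last bin (26 to 28), so it is filled with residuals borrowed from the other bins and remains usable at prediction time.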

Parameters

Name Type Description Default
y_true np.ndarray | pd.Series True values of the time series in the original scale. required
y_pred np.ndarray | pd.Series Predicted values of the time series in the original scale. required
append bool If True, new residuals are added to the ones already stored in the forecaster. If, after appending the new residuals, the limit of 10_000 // self.binner.n_bins_ values per bin is reached, a random sample of residuals is stored. False
random_state int Random state for reproducibility. 123

Returns

Name Type Description
None None

Raises

Name Type Description
NotFittedError If the forecaster is not fitted.
TypeError If y_true or y_pred are not numpy ndarray or pandas Series.
ValueError If y_true and y_pred have different length or index (if Series).

Examples

>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> import pandas as pd
>>> import numpy as np
>>> forecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)
>>> forecaster.fit(y=pd.Series(np.arange(20)), store_in_sample_residuals=False)
>>> y_true = np.array([20, 21, 22, 23, 24])
>>> y_pred = np.array([20.1, 20.9, 22.2, 22.8, 24.0])
>>> forecaster.set_out_sample_residuals(y_true=y_true, y_pred=y_pred)
>>> forecaster.out_sample_residuals_
array([-0.1,  0.1, -0.2,  0.2,  0. ])

set_params

forecaster.recursive._forecaster_recursive.ForecasterRecursive.set_params(
    params=None,
    **kwargs,
)

Set the parameters of this forecaster.

Parameters

Name Type Description Default
params Dict[str, object] Optional dictionary of parameter names mapped to their new values. If provided, these parameters are set first. None
**kwargs object Dictionary of parameter names mapped to their new values. Parameters can be for the forecaster itself or for the contained estimator (using the estimator__ prefix). {}

Returns

Name Type Description
self 'ForecasterRecursive' The forecaster instance with updated parameters.

Examples

>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> forecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)
>>> _ = forecaster.set_params(estimator__fit_intercept=False)
>>> forecaster.estimator.get_params()["fit_intercept"]
False
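
The routing of parameters by the estimator__ prefix can be illustrated with a minimal, library-free sketch. ToyEstimator and ToyForecaster are hypothetical stand-ins, and letting keyword arguments override the params dictionary is one reading of "set first", not confirmed behavior.

```python
class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn style estimator."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def set_params(self, **params):
        for name, value in params.items():
            if not hasattr(self, name):
                raise ValueError(f"Invalid estimator parameter: {name!r}")
            setattr(self, name, value)
        return self


class ToyForecaster:
    """Hypothetical stand-in showing prefix-based parameter routing."""

    def __init__(self, estimator, lags=None):
        self.estimator = estimator
        self.lags = lags

    def set_params(self, params=None, **kwargs):
        # Assumption: `params` is applied first, so keyword arguments win
        # when the same name appears in both.
        merged = {**(params or {}), **kwargs}
        estimator_params = {}
        for name, value in merged.items():
            if name.startswith("estimator__"):
                # Strip the prefix and forward to the wrapped estimator.
                estimator_params[name[len("estimator__"):]] = value
            elif hasattr(self, name):
                setattr(self, name, value)
            else:
                raise ValueError(f"Invalid forecaster parameter: {name!r}")
        if estimator_params:
            self.estimator.set_params(**estimator_params)
        return self  # returning self allows chained calls


toy = ToyForecaster(ToyEstimator(), lags=3)
toy.set_params({"lags": 5}, estimator__fit_intercept=False)
print(toy.lags, toy.estimator.fit_intercept)  # 5 False
```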

set_window_features

forecaster.recursive._forecaster_recursive.ForecasterRecursive.set_window_features(
    window_features=None,
)

Set a new value for the attribute window_features.

Attributes max_size_window_features, window_features_names, window_features_class_names and window_size are also updated.

Parameters

Name Type Description Default
window_features object | list[object] | None Instance or list of instances used to create window features. Window features are created from the original time series and are included as predictors. None

Returns

Name Type Description
None None

Examples

>>> from sklearn.linear_model import LinearRegression
>>> from spotforecast2_safe.forecaster.recursive import ForecasterRecursive
>>> from spotforecast2_safe.preprocessing import RollingFeatures
>>> import pandas as pd
>>> import numpy as np
>>> forecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)
>>> rolling = RollingFeatures(stats=['mean', 'std'], window_sizes=[3, 5])
>>> forecaster.set_window_features(window_features=rolling)
>>> forecaster.window_features_names
['roll_mean_3', 'roll_std_3', 'roll_mean_5', 'roll_std_5']
>>> forecaster.window_size
5
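
The window_size shown in the example is consistent with taking the larger of the maximum lag and the largest window used by the window features. The sketch below encodes that reading; compute_window_size is a hypothetical helper, and any additional adjustment the forecaster may make (for example for differentiation) is not covered.

```python
def compute_window_size(lags=None, window_feature_sizes=None):
    """Hypothetical helper: window needed to build all predictors."""
    max_lag = max(lags) if lags else 0
    max_size_window_features = (
        max(window_feature_sizes) if window_feature_sizes else 0
    )
    if max_lag == 0 and max_size_window_features == 0:
        # At least one of lags or window features must be provided.
        raise ValueError("Provide lags, window features, or both.")
    # Assumption: the window must cover the longest look-back of either kind.
    return max(max_lag, max_size_window_features)


print(compute_window_size(lags=[1, 2, 3], window_feature_sizes=[3, 5]))  # 5
print(compute_window_size(lags=[1, 2, 3]))                               # 3
print(compute_window_size(window_feature_sizes=[7]))                     # 7
```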