This forecaster predicts future values based on the most recent equivalent date. It can also aggregate multiple past values of the equivalent date using a function (e.g. mean, median, max, min, etc.). The equivalent date is calculated by moving back in time a specified number of steps (offset). The offset can be defined as an integer or as a pandas DateOffset. This approach is useful as a baseline, but it is a simplistic method and may not capture complex underlying patterns.
Number of steps to go back in time to find the most recent equivalent date to the target period. If offset is an integer, it represents the number of steps to go back in time. For example, if the frequency of the time series is daily, offset = 7 means that the most recent data similar to the target period is the value observed 7 days ago. Pandas DateOffsets can also be used to move back a given number of valid dates. For example, BDay(2) can be used to move back two business days. If the starting date is not a valid date, it is first moved to a valid date. For example, if the date is a Saturday, it is moved to the previous Friday. Then, the offset is applied. If the result is a non-valid date, it is moved to the next valid date. For example, if the result is a Sunday, it is moved to the next Monday. For more information about offsets, see https://pandas.pydata.org/docs/reference/offset_frequency.html.
Number of equivalent dates (multiple of offset) used in the prediction. Defaults to 1. If n_offsets is greater than 1, the values at the equivalent dates are aggregated using the agg_func function. For example, if the frequency of the time series is daily, offset = 7, n_offsets = 2 and agg_func = np.mean, the predicted value will be the mean of the values observed 7 and 14 days ago.
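The lookup and aggregation described above can be sketched in plain Python, independent of the library. The helper name `equivalent_date_forecast` is hypothetical, and the rule that horizons longer than `offset` reuse the most recent observed equivalent values is an assumption rather than something the parameter description states.

```python
from statistics import mean

def equivalent_date_forecast(values, steps, offset, n_offsets=1, agg_func=mean):
    """Illustrative integer-offset forecast (hypothetical helper, not the library code)."""
    n = len(values)
    if n < offset * n_offsets:
        raise ValueError("need at least offset * n_offsets past values")
    predictions = []
    for h in range(steps):
        # Position of the most recent observed equivalent of future step h.
        # Assumption: steps beyond `offset` cycle over the last observed period.
        base = n - offset + (h % offset)
        # Collect the equivalents at 1, 2, ..., n_offsets multiples of the offset.
        equivalents = [values[base - offset * k] for k in range(n_offsets)]
        predictions.append(agg_func(equivalents))
    return predictions

data = list(range(14))  # 14 daily observations: 0, 1, ..., 13
single = equivalent_date_forecast(data, steps=3, offset=7)
averaged = equivalent_date_forecast(data, steps=3, offset=7, n_offsets=2)
print(single)    # [7, 8, 9]
print(averaged)  # [3.5, 4.5, 5.5]
```

With n_offsets = 2, the first prediction averages the values observed 7 and 14 steps before the target (7 and 0), which gives 3.5.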
Additional arguments to pass to the QuantileBinner used to discretize the residuals into k bins according to the predicted values associated with each residual. Available arguments are: n_bins, method, subsample, random_state and dtype. Argument method is passed internally to the function numpy.percentile. Defaults to None.
Number of past values needed to include the last equivalent dates according to the offset and n_offsets.
last_window_
pandas Series
This window represents the most recent data observed by the predictor during its training phase. It contains the past values needed to include the last equivalent date according to the offset and n_offsets.
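For an integer offset, the size of this window can be illustrated as follows. The formula `offset * n_offsets` is an assumption consistent with the description above, not a quotation of the implementation, and the variable names are illustrative.

```python
offset = 7
n_offsets = 2

# Assumption: with an integer offset, reaching the oldest equivalent date
# requires offset * n_offsets past observations.
window_size = offset * n_offsets

y = list(range(30))             # 30 training observations
last_window = y[-window_size:]  # most recent values kept for prediction

print(window_size)      # 14
print(last_window[0])   # 16
print(last_window[-1])  # 29
```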
In-sample residuals binned according to the predicted value each residual is associated with. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_ in the form {bin: residuals}.
out_sample_residuals_
numpy ndarray
Residuals of the model when predicting non-training data. At most 10_000 values are stored. Use the set_out_sample_residuals() method to set values.
Out-of-sample residuals binned according to the predicted value each residual is associated with. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_ in the form {bin: residuals}.
binner
spotforecast.preprocessing.QuantileBinner
QuantileBinner used to discretize residuals into k bins according to the predicted values associated with each residual.
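The idea of binning residuals by their associated predicted value, with a per-bin cap, can be sketched as below. This is a simplified stand-in, not the QuantileBinner itself: the function `bin_residuals` is hypothetical, it uses nearest-rank cut points instead of numpy.percentile, and it keys bins by integer id.

```python
import bisect
import random

def bin_residuals(predictions, residuals, n_bins=3, max_total=10_000, seed=123):
    """Group residuals into quantile bins of the predictions (illustrative sketch)."""
    ordered = sorted(predictions)
    # Interior cut points taken at equally spaced ranks of the predictions.
    edges = [ordered[(len(ordered) * i) // n_bins] for i in range(1, n_bins)]
    by_bin = {b: [] for b in range(n_bins)}
    for pred, res in zip(predictions, residuals):
        by_bin[bisect.bisect_right(edges, pred)].append(res)
    # Cap the number of residuals kept per bin, as in 10_000 // n_bins.
    cap = max_total // n_bins
    rng = random.Random(seed)
    for b, values in by_bin.items():
        if len(values) > cap:
            by_bin[b] = rng.sample(values, cap)
    return by_bin

predictions = [0, 1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9]
binned = bin_residuals(predictions, residuals, n_bins=3)
print(binned)  # {0: [0.1, -0.2, 0.3], 1: [-0.4, 0.5, -0.6], 2: [0.7, -0.8, 0.9]}

capped = bin_residuals(predictions, residuals, n_bins=3, max_total=6)
print({b: len(v) for b, v in capped.items()})  # {0: 2, 1: 2, 2: 2}
```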
Not used, present here for API consistency by convention.
differentiation
Ignored
Not used, present here for API consistency by convention.
differentiation_max
Ignored
Not used, present here for API consistency by convention.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2_safe.forecaster.recursive import ForecasterEquivalentDate
>>> # Series with daily frequency
>>> data = pd.Series(
...     data = np.arange(14),
...     index = pd.date_range(start='2022-01-01', periods=14, freq='D')
... )
>>> # Forecast based on the value 7 days ago
>>> forecaster = ForecasterEquivalentDate(offset=7)
>>> forecaster.fit(y=data)
>>> forecaster.predict(steps=3)
2022-01-15    7
2022-01-16    8
2022-01-17    9
Freq: D, Name: pred, dtype: int64
If True, in-sample residuals will be stored in the forecaster object after fitting (in_sample_residuals_ and in_sample_residuals_by_bin_ attributes). If False, only the intervals of the bins are stored. Defaults to False.
Past values needed to select the last equivalent dates according to the offset. If last_window = None, the values stored in self.last_window_ are used and the predictions start immediately after the training data. Defaults to None.
If True, the input is checked for possible warnings and errors with the check_predict_input function. This argument is intended for internal use, and changing it is not recommended. Defaults to True.
True
exog
Ignored
Not used, present here for API consistency by convention.
If all equivalent values are missing when using a pandas DateOffset as offset. This can be caused by using an offset larger than the available data. To avoid this, try to decrease the size of the offset, decrease n_offsets, or increase the size of last_window. In backtesting, this error may be caused by using an initial_train_size that is too small.
If some equivalent values are missing when using a pandas DateOffset as offset. This can be caused by using an offset larger than the available data or by using an initial_train_size that is too small in backtesting. To avoid this, increase the last_window size or decrease n_offsets.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2_safe.forecaster.recursive import ForecasterEquivalentDate
>>> data = pd.Series(
...     data = np.arange(14),
...     index = pd.date_range(start='2022-01-01', periods=14, freq='D')
... )
>>> forecaster = ForecasterEquivalentDate(offset=7)
>>> forecaster.fit(y=data)
>>> forecaster.predict(steps=3)
2022-01-15    7
2022-01-16    8
2022-01-17    9
Freq: D, Name: pred, dtype: int64
Predict n steps ahead and estimate prediction intervals using the conformal prediction method. Refer to the References section for additional details on this method.
Past values needed to select the last equivalent dates according to the offset. If last_window = None, the values stored in self.last_window_ are used and the predictions start immediately after the training data. Defaults to None.
Technique used to estimate prediction intervals. Available options:
- ‘conformal’: Employs the conformal prediction split method for interval estimation [1]_.
Defaults to ‘conformal’.
Confidence level of the prediction interval. Interpretation depends on the method used:
- If float, represents the nominal (expected) coverage (between 0 and 1). For instance, interval=0.95 corresponds to [2.5, 97.5] percentiles.
- If list or tuple, defines the exact percentiles to compute, which must be between 0 and 100 inclusive. For example, a 95% interval is specified as interval = [2.5, 97.5].
- When using method='conformal', the interval must be a float or a list/tuple defining a symmetric interval.
Defaults to [5, 95].
If True, residuals from the training data are used as a proxy for the prediction error to create the prediction intervals. If False, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the forecaster's set_out_sample_residuals() method. Defaults to True.
pd.DataFrame: Values predicted by the forecaster and their estimated interval.
- pred: predictions.
- lower_bound: lower bound of the interval.
- upper_bound: upper bound of the interval.
If interval is not a float or a list/tuple defining a symmetric interval when using method='conformal'.
References
.. [1] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from spotforecast2_safe.forecaster.recursive import ForecasterEquivalentDate
>>> data = pd.Series(
...     data = np.arange(14, dtype=float),
...     index = pd.date_range(start='2022-01-01', periods=14, freq='D')
... )
>>> forecaster = ForecasterEquivalentDate(offset=7)
>>> forecaster.fit(y=data, store_in_sample_residuals=True)
>>> forecaster.predict_interval(steps=3, interval=0.8)
            pred  lower_bound  upper_bound
2022-01-15   7.0          6.0          8.0
2022-01-16   8.0          7.0          9.0
2022-01-17   9.0          8.0         10.0
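The way a symmetric interval is derived from residuals in split conformal prediction can be sketched without the library. This is a simplified illustration under stated assumptions: the helpers `quantile`, `nominal_coverage` and `conformal_interval` are hypothetical names, the half-width is taken as a plain empirical quantile of the absolute residuals (no finite-sample correction), and residual binning is ignored.

```python
import math

def quantile(values, q):
    """Linear-interpolation quantile of `values` for q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def nominal_coverage(interval):
    """Convert `interval` (float or [lower, upper] percentiles) to a coverage in (0, 1)."""
    if isinstance(interval, float):
        if not 0 < interval < 1:
            raise ValueError("a float interval must lie between 0 and 1")
        return interval
    lower, upper = interval
    if not math.isclose(lower + upper, 100):
        raise ValueError("conformal intervals must be symmetric, e.g. [2.5, 97.5]")
    return (upper - lower) / 100

def conformal_interval(predictions, residuals, interval=0.8):
    """Return (pred, lower_bound, upper_bound) rows for each prediction."""
    coverage = nominal_coverage(interval)
    # Half-width: quantile of the absolute calibration residuals.
    half_width = quantile([abs(r) for r in residuals], coverage)
    return [(p, p - half_width, p + half_width) for p in predictions]

calibration_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
rows = conformal_interval([7.0, 8.0, 9.0], calibration_residuals, interval=[10, 90])
for row in rows:
    print(row)
```

Here interval = [10, 90] is symmetric and maps to a nominal coverage of 0.8, while an asymmetric pair such as [5, 90] raises a ValueError in this sketch.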
set_in_sample_residuals
forecaster.recursive._forecaster_equivalent_date.ForecasterEquivalentDate.set_in_sample_residuals(y, random_state=123, exog=None)
Set in-sample residuals in case they were not calculated during the training process.
In-sample residuals are calculated as the difference between the true values and the predictions made by the forecaster using the training data. The following internal attributes are updated:
in_sample_residuals_: residuals stored in a numpy ndarray.
binner_intervals_: intervals used to bin the residuals are calculated using the quantiles of the predicted values.
in_sample_residuals_by_bin_: residuals are binned according to the predicted value they are associated with and stored in a dictionary, where the keys are the intervals of the predicted values and the values are the residuals associated with that range.
A total of 10_000 residuals are stored in the attribute in_sample_residuals_. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_.
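As a rough illustration of the computation, in-sample residuals for an integer offset with a single equivalent date can be sketched as below. The helper `in_sample_residuals` is hypothetical, and it assumes that the in-sample prediction for each training point is simply the value observed `offset` steps earlier.

```python
import random

def in_sample_residuals(y, offset, max_residuals=10_000, random_state=123):
    """Residuals y[t] - y[t - offset] over the training data (illustrative helper)."""
    # The first `offset` observations have no equivalent date, so they are skipped.
    residuals = [y[t] - y[t - offset] for t in range(offset, len(y))]
    if len(residuals) > max_residuals:
        # Keep a reproducible random sample when the cap is exceeded.
        residuals = random.Random(random_state).sample(residuals, max_residuals)
    return residuals

y = [float(v) for v in range(14)]
residuals = in_sample_residuals(y, offset=7)
print(residuals)  # [7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0]
print(len(in_sample_residuals(y, offset=7, max_residuals=5)))  # 5
```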
Set new values for the attribute out_sample_residuals_. Out-of-sample residuals are meant to be calculated using observations that did not participate in the training process. Two internal attributes are updated:
out_sample_residuals_: residuals stored in a numpy ndarray.
out_sample_residuals_by_bin_: residuals are binned according to the predicted value they are associated with and stored in a dictionary, where the keys are the intervals of the predicted values and the values are the residuals associated with that range. If a bin is empty, it is filled with a random sample of residuals from other bins. This is done to ensure that all bins have at least one residual and can be used in the prediction process.
A total of 10_000 residuals are stored in the attribute out_sample_residuals_. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_.
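The storage rules described above (replace or append, cap by random sampling, fill empty bins) can be sketched as follows. Both helpers are hypothetical, and the number of values borrowed for an empty bin is an assumption made for illustration. The library's actual sampling details may differ.

```python
import random

def update_residuals(stored, new, append=False, max_residuals=10_000, seed=123):
    """Replace or extend stored residuals, then cap them by random sampling."""
    combined = list(stored) + list(new) if append else list(new)
    if len(combined) > max_residuals:
        combined = random.Random(seed).sample(combined, max_residuals)
    return combined

def fill_empty_bins(by_bin, seed=123):
    """Give every empty bin a random sample drawn from the residuals of the other bins."""
    rng = random.Random(seed)
    pool = [r for values in by_bin.values() for r in values]
    if not pool:
        return by_bin  # nothing to borrow from
    # Assumption: borrow as many values as the largest non-empty bin holds.
    size = max(len(values) for values in by_bin.values())
    return {
        b: values if values else rng.sample(pool, min(size, len(pool)))
        for b, values in by_bin.items()
    }

y_true = [10.0, 12.0, 11.0, 15.0]
y_pred = [9.0, 12.5, 10.0, 13.0]
new_residuals = [t - p for t, p in zip(y_true, y_pred)]
stored = update_residuals([0.5, -0.5], new_residuals, append=True)
print(stored)  # [0.5, -0.5, 1.0, -0.5, 1.0, 2.0]

filled = fill_empty_bins({0: [1.0, -0.5], 1: [], 2: [2.0]})
print(len(filled[1]))  # 2
```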
Parameters
Name
Type
Description
Default
y_true
numpy ndarray, pandas Series
True values of the time series from which the residuals have been calculated.
If True, new residuals are added to the ones already stored in the forecaster. If, after appending the new residuals, the limit of 10_000 // self.binner.n_bins_ values per bin is reached, a random sample of residuals is stored. Defaults to False.