model_selection.split_one_step
One-step-ahead cross-validation splitting.
Classes
OneStepAheadFold
Class to split time series data into train and test folds for one-step-ahead forecasting.
OneStepAheadFold
model_selection.split_one_step.OneStepAheadFold(
    initial_train_size,
    window_size=None,
    differentiation=None,
    return_all_indexes=False,
    verbose=True,
)
Class to split time series data into train and test folds for one-step-ahead forecasting.
Parameters
initial_train_size
int | str | pd.Timestamp
Size of the initial training set. If an integer, it is the number of observations used for initial training. If a date string or pandas Timestamp, it is the last date included in the initial training set.
required
window_size
int
Number of observations needed to generate the autoregressive predictors. Defaults to None.
None
differentiation
int
Number of observations to use for differentiation. The last_window is extended by as many observations as the differentiation order. Defaults to None.
None
return_all_indexes
bool
Whether to return all indexes or only the start and end indexes of each fold. Defaults to False.
False
verbose
bool
Whether to print information about generated folds. Defaults to True.
True
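The two accepted forms of initial_train_size can be related with plain pandas. The sketch below is only an illustration: the series and the helper `train_size_from_date` are hypothetical and are not part of this module. It assumes a sorted DatetimeIndex and shows how a "last date included" maps to an observation count.

```python
import pandas as pd

# Hypothetical daily series with 10 observations (illustration only).
y = pd.Series(
    range(10),
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)


def train_size_from_date(index: pd.DatetimeIndex, last_train_date) -> int:
    """Hypothetical helper: count observations up to and including a date."""
    # side="right" counts every index entry <= last_train_date,
    # so the given date itself belongs to the training set.
    return int(index.searchsorted(pd.Timestamp(last_train_date), side="right"))


size_from_int = 7                                          # integer form
size_from_date = train_size_from_date(y.index, "2024-01-07")  # date form

# Both forms select the same initial training set.
train = y.iloc[:size_from_date]
print(size_from_date, train.index[-1].date())
```

Under these assumptions, passing 7 or "2024-01-07" describes the same initial training set, because the date is the last one included.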
Attributes
initial_train_size
int
Number of observations used for initial training.
window_size
int
Number of observations needed to generate the autoregressive predictors.
differentiation
int
Number of observations to use for differentiation. The last_window is extended by as many observations as the differentiation order.
return_all_indexes
bool
Whether to return all indexes or only the start and end indexes of each fold.
verbose
bool
Whether to print information about generated folds.
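Why the last window needs extra observations when differentiation is used can be seen with plain pandas. This is a minimal sketch, not the class's internal logic: it assumes that window_size counts only the observations needed for the predictors and that first-order differencing is applied with `Series.diff`. The data values are made up.

```python
import pandas as pd

# Made-up series (illustration only).
y = pd.Series([10.0, 12.0, 15.0, 19.0, 24.0, 30.0])

window_size = 3      # assumed: observations needed to build the predictors
differentiation = 1  # assumed: first-order differencing

# Each order of differencing consumes one observation, so the raw window
# is extended by `differentiation` extra observations.
last_window = y.iloc[-(window_size + differentiation):]

# After differencing, exactly `window_size` usable values remain.
last_window_diff = last_window.diff().dropna()

print(len(last_window), len(last_window_diff))
```

Without the extension, differencing a window of 3 raw observations would leave only 2 values, which is one short of what the predictors need.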
Methods
split
Split the time series data into train and test folds.
split
model_selection.split_one_step.OneStepAheadFold.split(
    X,
    as_pandas=False,
    externally_fitted=None,
)
Split the time series data into train and test folds.
Parameters
X
pd.Series | pd.DataFrame | pd.Index | dict
Time series data or index to split.
required
as_pandas
bool
If True, the folds are returned as a DataFrame. This is useful to visualize the folds in a more interpretable way. Defaults to False.
False
externally_fitted
Any
This argument is not used in this class. It is included for API consistency. Defaults to None.
None
Returns
list | pd.DataFrame
A list of lists containing the indices (positions) of the fold. The fold holds the following items, two of which are lists:
- fold: fold number.
- [train_start, train_end]: list with the start and end positions of the training set.
- [test_start, test_end]: list with the start and end positions of the test set. These are the observations used to evaluate the forecaster.
- fit_forecaster: boolean indicating whether the forecaster should be fitted in this fold.
The returned values are the positions of the observations, not the actual values of the index, so they can be used to slice the data directly with iloc.
If as_pandas is True, the folds are returned as a DataFrame with the following columns: 'fold', 'train_start', 'train_end', 'test_start', 'test_end', 'fit_forecaster'.
Following the Python convention, the start index is inclusive and the end index is exclusive, so the last index is not included in the slice.
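The positional, end-exclusive convention described above can be sketched as follows. The fold here is written by hand in the documented layout rather than produced by split, and the series and the split point of 70 are assumptions chosen for illustration. The DataFrame only mirrors the column names listed for as_pandas=True.

```python
import pandas as pd

# Hypothetical daily series with 100 observations (illustration only).
y = pd.Series(
    range(100),
    index=pd.date_range("2024-01-01", periods=100, freq="D"),
)

# Hand-written fold in the documented layout (not returned by the library):
# fold number, [train_start, train_end], [test_start, test_end], fit_forecaster
fold = [0, [0, 70], [70, 100], True]
fold_number, (train_start, train_end), (test_start, test_end), fit_forecaster = fold

# Positions slice directly with iloc; the end position is exclusive.
train = y.iloc[train_start:train_end]
test = y.iloc[test_start:test_end]

# Tabular view using the column names listed for as_pandas=True.
folds_df = pd.DataFrame(
    [[fold_number, train_start, train_end, test_start, test_end, fit_forecaster]],
    columns=[
        "fold", "train_start", "train_end",
        "test_start", "test_end", "fit_forecaster",
    ],
)
print(folds_df)
```

Because the end position is exclusive, train_end and test_start can share the same value (70 here) without the two sets overlapping.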