model_selection.split_one_step

One-step-ahead cross-validation splitting.

Classes

Name Description
OneStepAheadFold Class to split time series data into train and test folds for one-step-ahead forecasting.

OneStepAheadFold

model_selection.split_one_step.OneStepAheadFold(
    initial_train_size,
    window_size=None,
    differentiation=None,
    return_all_indexes=False,
    verbose=True,
)

Class to split time series data into train and test folds for one-step-ahead forecasting.

Parameters

Name Type Description Default
initial_train_size int | str | pd.Timestamp Size of the initial training set. If an integer, it is the number of observations used for initial training. If a date string or pandas Timestamp, it is the last date included in the initial training set. required
window_size int Number of observations needed to generate the autoregressive predictors. Defaults to None. None
differentiation int Number of observations to use for differentiation. The last_window is extended by as many observations as the differentiation order. Defaults to None. None
return_all_indexes bool Whether to return all indexes or only the start and end indexes of each fold. Defaults to False. False
verbose bool Whether to print information about generated folds. Defaults to True. True
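
The two accepted forms of initial_train_size can be related with a short sketch. It assumes that a date is converted by counting the observations up to and including that date; the helper `resolve_initial_train_size` is hypothetical, written only for illustration, and is not part of the library.

```python
import pandas as pd


def resolve_initial_train_size(index, initial_train_size):
    """Hypothetical helper: number of observations in the initial training set."""
    if isinstance(initial_train_size, int):
        # An integer is already a number of observations.
        return initial_train_size
    # A date string or Timestamp is the last date included in the training set.
    last_train_date = pd.Timestamp(initial_train_size)
    return int((index <= last_train_date).sum())


# Illustrative daily index with 10 observations.
index = pd.date_range("2024-01-01", periods=10, freq="D")

n_from_int = resolve_initial_train_size(index, 7)
n_from_date = resolve_initial_train_size(index, "2024-01-07")
```

Under this assumption, both calls describe the same training set: the first seven observations of the index.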

Attributes

Name Type Description
initial_train_size int Number of observations used for initial training.
window_size int Number of observations needed to generate the autoregressive predictors.
differentiation int Number of observations to use for differentiation. The last_window is extended by as many observations as the differentiation order.
return_all_indexes bool Whether to return all indexes or only the start and end indexes of each fold.
verbose bool Whether to print information about generated folds.

Methods

Name Description
split Split the time series data into train and test folds.

split

model_selection.split_one_step.OneStepAheadFold.split(
    X,
    as_pandas=False,
    externally_fitted=None,
)

Split the time series data into train and test folds.

Parameters

Name Type Description Default
X pd.Series | pd.DataFrame | pd.Index | dict Time series data or index to split. required
as_pandas bool If True, the folds are returned as a DataFrame. This is useful to visualize the folds in a more interpretable way. Defaults to False. False
externally_fitted Any This argument is not used in this class. It is included for API consistency. Defaults to None. None

Returns

Type Description
list | pd.DataFrame A list containing the positions (integer indices) of the fold, with the following elements:
- fold: fold number.
- [train_start, train_end]: list with the start and end positions of the training set.
- [test_start, test_end]: list with the start and end positions of the test set. These are the observations used to evaluate the forecaster.
- fit_forecaster: boolean indicating whether the forecaster should be fitted in this fold.

The returned values are the positions of the observations, not the actual values of the index, so they can be used to slice the data directly with iloc.

If as_pandas is True, the folds are returned as a DataFrame with the following columns: ‘fold’, ‘train_start’, ‘train_end’, ‘test_start’, ‘test_end’, ‘fit_forecaster’.

Following the Python convention, the start position is inclusive and the end position is exclusive, so the observation at the end position is not included in the slice.
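
The positional convention can be illustrated with plain pandas. This is a minimal sketch: the `fold` list below is written by hand to mimic the documented layout, it is not produced by calling `split`, and its values are illustrative assumptions.

```python
import pandas as pd

# Illustrative series with 10 daily observations.
y = pd.Series(range(10), index=pd.date_range("2024-01-01", periods=10, freq="D"))

# Hand-written fold mimicking the documented layout (assumed values):
# [fold, [train_start, train_end], [test_start, test_end], fit_forecaster]
fold = [0, [0, 7], [7, 10], True]

_, (train_start, train_end), (test_start, test_end), fit_forecaster = fold

# Positions are start-inclusive and end-exclusive, so they can be passed
# directly to iloc without any adjustment.
y_train = y.iloc[train_start:train_end]
y_test = y.iloc[test_start:test_end]
```

Because the end position is exclusive, `train_end` and `test_start` can share the same value without the two slices overlapping.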