splitter.split_one_step

One-step-ahead cross-validation splitting.

Classes

Name Description
OneStepAheadFold Class to split time series data into train and test folds for one-step-ahead forecasting.

OneStepAheadFold

splitter.split_one_step.OneStepAheadFold(
    initial_train_size,
    window_size=None,
    differentiation=None,
    return_all_indexes=False,
    verbose=True,
)

Class to split time series data into train and test folds for one-step-ahead forecasting.

Parameters

Name Type Description Default
initial_train_size int | str | pd.Timestamp Size of the initial training set. If an integer, it is the number of observations used for initial training. If a date string or pandas Timestamp, it is the last date included in the initial training set. required
window_size int Number of observations needed to generate the autoregressive predictors. Defaults to None. None
differentiation int Number of observations to use for differentiation. It extends the last_window by as many observations as the differentiation order. Defaults to None. None
return_all_indexes bool Whether to return all indexes or only the start and end indexes of each fold. Defaults to False. False
verbose bool Whether to print information about generated folds. Defaults to True. True

Attributes

Name Type Description
initial_train_size int Number of observations used for initial training.
window_size int Number of observations needed to generate the autoregressive predictors.
differentiation int Number of observations to use for differentiation. It extends the last_window by as many observations as the differentiation order.
return_all_indexes bool Whether to return all indexes or only the start and end indexes of each fold.
verbose bool Whether to print information about generated folds.

Examples

import numpy as np
import pandas as pd
from spotforecast2_safe.splitter.split_one_step import OneStepAheadFold

rng = np.random.default_rng(0)
idx = pd.date_range("2025-01-01", periods=120, freq="h", tz="UTC")
y = pd.Series(
    50 + 10 * np.sin(np.arange(120) / 12) + rng.normal(0, 1, 120),
    index=idx,
    name="load",
)

cv = OneStepAheadFold(initial_train_size=96, verbose=False)
fold = cv.split(y, as_pandas=True)
print(fold)
assert fold["train_end"].iloc[0] == 96
assert fold["test_start"].iloc[0] == 96
assert fold["test_end"].iloc[0] == 120
   fold  train_start  train_end  test_start  test_end  fit_forecaster
0     0            0         96          96       120            True
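
When initial_train_size is a date string or a pandas Timestamp, the parameter table states that it marks the last date included in the initial training set. The following is a minimal sketch of how such a date could be mapped to a positional training size using plain pandas. The helper name train_size_from_date is hypothetical, and this is not necessarily how OneStepAheadFold resolves the date internally.

```python
import pandas as pd


def train_size_from_date(index: pd.DatetimeIndex, last_train_date) -> int:
    """Hypothetical helper: count observations up to and including the date."""
    last = pd.Timestamp(last_train_date)
    # Localize a naive date to the index's time zone so the comparison is valid.
    if index.tz is not None and last.tzinfo is None:
        last = last.tz_localize(index.tz)
    # side="right" gives the position just past `last`, which equals the
    # number of observations whose timestamp is <= last.
    return int(index.searchsorted(last, side="right"))


idx = pd.date_range("2025-01-01", periods=120, freq="h", tz="UTC")
size = train_size_from_date(idx, "2025-01-04 23:00")
print(size)
```

With this hourly index, 2025-01-04 23:00 is the 96th observation, so the sketch yields the same training size as the integer form initial_train_size=96.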

Methods

Name Description
split Split the time series data into train and test folds.
split
splitter.split_one_step.OneStepAheadFold.split(
    X,
    as_pandas=False,
    externally_fitted=None,
)

Split the time series data into train and test folds.

Parameters
Name Type Description Default
X pd.Series | pd.DataFrame | pd.Index | dict Time series data or index to split. required
as_pandas bool If True, the folds are returned as a DataFrame. This is useful to visualize the folds in a more interpretable way. Defaults to False. False
externally_fitted Any This argument is not used in this class. It is included for API consistency. Defaults to None. None
Returns
Name Type Description
list | pd.DataFrame A list containing the indices (positions) of the fold. The list contains four elements:
- fold: fold number.
- [train_start, train_end]: list with the start and end positions of the training set.
- [test_start, test_end]: list with the start and end positions of the test set. These are the observations used to evaluate the forecaster.
- fit_forecaster: boolean indicating whether the forecaster should be fitted in this fold.
The returned values are the positions of the observations, not the actual values of the index, so they can be used to slice the data directly with iloc.
If as_pandas is True, the folds are returned as a DataFrame with the following columns: 'fold', 'train_start', 'train_end', 'test_start', 'test_end', 'fit_forecaster'.
Following the Python convention, the start index is inclusive and the end index is exclusive, so the last index is not included in the slice.
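
Because the returned values are positions with an exclusive end, they can be passed to iloc without any adjustment. The sketch below illustrates this with a fold written out by hand in the documented list form. The values are illustrative and stand in for the result of cv.split(y).

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2025-01-01", periods=100, freq="h", tz="UTC")
y = pd.Series(np.arange(100, dtype=float), index=idx, name="load")

# Fold in the documented list form, written by hand for illustration
# (normally this would come from cv.split(y)).
fold = [0, [0, 80], [80, 100], True]
_, (train_start, train_end), (test_start, test_end), fit_forecaster = fold

# End positions are exclusive, so they plug straight into iloc slices.
y_train = y.iloc[train_start:train_end]
y_test = y.iloc[test_start:test_end]

print(len(y_train), len(y_test))
print(y_train.index[-1], y_test.index[0])
```

The train and test slices do not overlap, and together they cover the whole series, since train_end and test_start are the same position.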
Examples
import numpy as np
import pandas as pd
from spotforecast2_safe.splitter.split_one_step import OneStepAheadFold

rng = np.random.default_rng(0)
idx = pd.date_range("2025-01-01", periods=100, freq="h", tz="UTC")
y = pd.Series(
    50 + 10 * np.sin(np.arange(100) / 12) + rng.normal(0, 1, 100),
    index=idx,
    name="load",
)

cv = OneStepAheadFold(initial_train_size=80, verbose=False)

# List form: [fold_id, [train_start, train_end], [test_start, test_end], fit]
fold_list = cv.split(y)
print("fold list:", fold_list)
assert fold_list[1] == [0, 80]
assert fold_list[2] == [80, 100]

# DataFrame form for human-readable inspection
fold_df = cv.split(y, as_pandas=True)
print(fold_df)
assert fold_df.shape == (1, 6)
assert int(fold_df["train_end"].iloc[0]) == 80
assert int(fold_df["test_end"].iloc[0]) == 100
fold list: [0, [0, 80], [80, 100], True]
   fold  train_start  train_end  test_start  test_end  fit_forecaster
0     0            0         80          80       100            True
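
The return_all_indexes option is described as returning all indexes instead of only the start and end of each fold. As a rough sketch of what that expansion amounts to, the half-open bounds can be turned into explicit positions with range. The helper expand_positions is hypothetical, and the exact container type the library returns with return_all_indexes=True is an assumption not confirmed by this page.

```python
# Fold in the documented list form, copied from the printed output above.
fold = [0, [0, 80], [80, 100], True]


def expand_positions(bounds):
    """Hypothetical helper: list every position in the half-open [start, end)."""
    start, end = bounds
    return list(range(start, end))


train_positions = expand_positions(fold[1])
test_positions = expand_positions(fold[2])

print(train_positions[:3], train_positions[-1])
print(test_positions[0], test_positions[-1], len(test_positions))
```

The last training position is train_end - 1 and the first test position is test_start, which follows from the inclusive-start, exclusive-end convention.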