splitter.split_one_step

One-step-ahead cross-validation splitting.

Classes

Name	Description
OneStepAheadFold	Class to split time series data into train and test folds for one-step-ahead forecasting.

OneStepAheadFold

splitter.split_one_step.OneStepAheadFold(
    initial_train_size,
    window_size=None,
    differentiation=None,
    return_all_indexes=False,
    verbose=True,
)

Class to split time series data into train and test folds for one-step-ahead forecasting.

Parameters

Name	Type	Description	Default
initial_train_size	int \| str \| pd.Timestamp	Size of the initial training set. If an integer, the number of observations used for initial training. If a date string or pandas Timestamp, the last date included in the initial training set.	required
window_size	int	Number of observations needed to generate the autoregressive predictors.	`None`
differentiation	int	Number of observations to use for differentiation. It extends `last_window` by as many observations as the differentiation order.	`None`
return_all_indexes	bool	Whether to return all indexes or only the start and end indexes of each fold.	`False`
verbose	bool	Whether to print information about generated folds.	`True`
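
When `initial_train_size` is given as a date, it marks the last date included in the initial training set. One way to picture the equivalent integer size is to count the observations up to and including that date. The following is a minimal sketch under that assumption, using only the standard library in place of a pandas index; `train_size_from_date` is a hypothetical helper, not part of the library, and the conversion the library performs may differ.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def train_size_from_date(index, last_train_date):
    """Hypothetical helper: count timestamps <= last_train_date.

    Assumes `index` is sorted in ascending order.
    """
    return bisect_right(index, last_train_date)


# Hourly index with 120 observations starting at 2025-01-01 00:00
start = datetime(2025, 1, 1)
index = [start + timedelta(hours=h) for h in range(120)]

# 2025-01-04 23:00 is the 96th observation, and it is included
size = train_size_from_date(index, datetime(2025, 1, 4, 23))
print(size)  # 96
```

Under this reading, a date-valued `initial_train_size` and the integer 96 would describe the same initial training set for this index.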

Attributes

Name	Type	Description
initial_train_size	int	Number of observations used for initial training.
window_size	int	Number of observations needed to generate the autoregressive predictors.
differentiation	int	Number of observations to use for differentiation. It extends `last_window` by as many observations as the differentiation order.
return_all_indexes	bool	Whether to return all indexes or only the start and end indexes of each fold.
verbose	bool	Whether to print information about generated folds.

Examples

import numpy as np
import pandas as pd
from spotforecast2_safe.splitter.split_one_step import OneStepAheadFold

rng = np.random.default_rng(0)
idx = pd.date_range("2025-01-01", periods=120, freq="h", tz="UTC")
y = pd.Series(
    50 + 10 * np.sin(np.arange(120) / 12) + rng.normal(0, 1, 120),
    index=idx,
    name="load",
)

cv = OneStepAheadFold(initial_train_size=96, verbose=False)
fold = cv.split(y, as_pandas=True)
print(fold)
assert fold["train_end"].iloc[0] == 96
assert fold["test_start"].iloc[0] == 96
assert fold["test_end"].iloc[0] == 120

   fold  train_start  train_end  test_start  test_end  fit_forecaster
0     0            0         96          96       120            True

Methods

Name	Description
set_params	Set the parameters of the Fold object. Before overwriting the current parameters, the input parameters are validated to ensure correctness.
split	Split the time series data into train and test folds.

set_params

splitter.split_one_step.OneStepAheadFold.set_params(params)

Set the parameters of the Fold object. Before overwriting the current parameters, the input parameters are validated to ensure correctness.

Parameters

Name	Type	Description	Default
params	dict	Dictionary with the parameters to set.	required
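
The validate-before-overwrite behaviour described above can be pictured with a small stand-alone class. This is only a sketch: `SketchFold`, its `_validate` rules, and the merge strategy are assumptions made for illustration and are not taken from the library's implementation.

```python
class SketchFold:
    """Hypothetical stand-in for a fold object; not the library class."""

    def __init__(self, initial_train_size, verbose=True):
        self.initial_train_size = initial_train_size
        self.verbose = verbose

    @staticmethod
    def _validate(params):
        # Assumed rules, chosen only for illustration.
        size = params["initial_train_size"]
        if isinstance(size, bool) or not isinstance(size, int) or size < 1:
            raise ValueError("initial_train_size must be a positive integer")
        if not isinstance(params["verbose"], bool):
            raise TypeError("verbose must be a bool")

    def set_params(self, params):
        # Merge the new values over the current ones, validate the merged
        # result, and only then overwrite the attributes.
        merged = {**vars(self), **params}
        self._validate(merged)
        for name, value in merged.items():
            setattr(self, name, value)


cv = SketchFold(initial_train_size=10)
cv.set_params({"initial_train_size": 20})

try:
    cv.set_params({"initial_train_size": -5, "verbose": False})
except ValueError:
    pass  # rejected before any attribute was overwritten

print(cv.initial_train_size, cv.verbose)  # 20 True
```

Validating the merged dictionary first means a rejected call leaves the object in its previous state instead of half-updated.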

Examples

from spotforecast2_safe.splitter.split_one_step import OneStepAheadFold

cv = OneStepAheadFold(initial_train_size=50)
# Keys are the constructor parameters documented for OneStepAheadFold
cv.set_params({
    "initial_train_size": 10,
    "window_size": 5,
    "differentiation": 1,
    "return_all_indexes": True,
    "verbose": False,
})
assert cv.initial_train_size == 10
assert cv.window_size == 5

split

splitter.split_one_step.OneStepAheadFold.split(
    X,
    as_pandas=False,
    externally_fitted=None,
)

Split the time series data into train and test folds.

Parameters

Name	Type	Description	Default
X	pd.Series \| pd.DataFrame \| pd.Index \| dict	Time series data or index to split.	required
as_pandas	bool	If True, the folds are returned as a DataFrame, which makes them easier to inspect.	`False`
externally_fitted	Any	Not used in this class; included for API consistency.	`None`

Returns

Name	Type	Description
	list \| pd.DataFrame	A list describing the fold by the positions of its observations, or a DataFrame if `as_pandas` is `True`.

The list contains four elements:

- fold: fold number.
- [train_start, train_end]: list with the start and end positions of the training set.
- [test_start, test_end]: list with the start and end positions of the test set. These are the observations used to evaluate the forecaster.
- fit_forecaster: boolean indicating whether the forecaster should be fitted in this fold.

The returned values are the positions of the observations, not the actual values of the index, so they can be used to slice the data directly with `iloc`.

If `as_pandas` is `True`, the folds are returned as a DataFrame with the columns `fold`, `train_start`, `train_end`, `test_start`, `test_end`, and `fit_forecaster`.

Following the Python convention, the start position is inclusive and the end position is exclusive, so the end position itself is not included in the slice.
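
Because the start position is inclusive and the end position is exclusive, the training and test slices of a fold do not overlap and sit directly next to each other. The sketch below illustrates this with a plain Python list standing in for the series; with pandas data the same positions would be passed to `iloc`. The fold literal copies the list layout from the example output on this page rather than calling the library.

```python
# Fold in the documented list form:
# [fold, [train_start, train_end], [test_start, test_end], fit_forecaster]
fold = [0, [0, 80], [80, 100], True]
data = list(range(100))  # stand-in for a series with 100 observations

fold_id, (train_start, train_end), (test_start, test_end), fit_forecaster = fold

# Start is inclusive, end is exclusive, as in ordinary Python slicing.
train = data[train_start:train_end]
test = data[test_start:test_end]

print(len(train), len(test))  # 80 20
print(train[-1], test[0])     # 79 80
```

The last training position is 79 and the first test position is 80, so no observation is shared between the two sets.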

Examples

import numpy as np
import pandas as pd
from spotforecast2_safe.splitter.split_one_step import OneStepAheadFold

rng = np.random.default_rng(0)
idx = pd.date_range("2025-01-01", periods=100, freq="h", tz="UTC")
y = pd.Series(
    50 + 10 * np.sin(np.arange(100) / 12) + rng.normal(0, 1, 100),
    index=idx,
    name="load",
)

cv = OneStepAheadFold(initial_train_size=80, verbose=False)

# List form: [fold_id, [train_start, train_end], [test_start, test_end], fit]
fold_list = cv.split(y)
print("fold list:", fold_list)
assert fold_list[1] == [0, 80]
assert fold_list[2] == [80, 100]

# DataFrame form for human-readable inspection
fold_df = cv.split(y, as_pandas=True)
print(fold_df)
assert fold_df.shape == (1, 6)
assert int(fold_df["train_end"].iloc[0]) == 80
assert int(fold_df["test_end"].iloc[0]) == 100

fold list: [0, [0, 80], [80, 100], True]
   fold  train_start  train_end  test_start  test_end  fit_forecaster
0     0            0         80          80       100            True