splitter.split.split_abs_train_val_test

splitter.split.split_abs_train_val_test(
    data,
    end_train,
    end_validation,
    verbose=False,
)

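Splits a time series DataFrame into training, validation, and test sets at two absolute timestamps, end_train and end_validation. In the example below, each boundary timestamp is included in both adjacent sets.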

Parameters

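| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | pd.DataFrame | The time series data with a DatetimeIndex. | required |
| end_train | pd.Timestamp | The end date for the training set. | required |
| end_validation | pd.Timestamp | The end date for the validation set. | required |
| verbose | bool | If True, print the overall date range and the date range and size of each split. | False |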

Returns

| Name | Type | Description |
|------|------|-------------|
| tuple | tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame] | A tuple (data_train, data_val, data_test) containing the training set, the validation set, and the test set, each a pd.DataFrame. |
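The parameter descriptions imply a few preconditions: a DatetimeIndex, and two boundaries that fall in order within the data's range. The sketch below shows one way a caller might check these before splitting. The helper `check_split_inputs` is hypothetical and not part of the library, and the exact conditions it enforces (sorted index, end_train strictly before end_validation, both inside the index range) are assumptions rather than documented requirements.

```python
import pandas as pd


def check_split_inputs(data, end_train, end_validation):
    """Hypothetical pre-flight check; not part of the library."""
    if not isinstance(data.index, pd.DatetimeIndex):
        raise TypeError("data must have a DatetimeIndex")
    if not data.index.is_monotonic_increasing:
        raise ValueError("index must be sorted in increasing order")
    # Assumed ordering: start <= end_train < end_validation <= end.
    if not (data.index.min() <= end_train < end_validation <= data.index.max()):
        raise ValueError("expected start <= end_train < end_validation <= end")


idx = pd.date_range("2022-01-01", periods=100, freq="D")
data = pd.DataFrame({"value": range(100)}, index=idx)

# Passes silently for boundaries inside the index range.
check_split_inputs(data, pd.Timestamp("2022-02-28"), pd.Timestamp("2022-03-31"))

# Swapped boundaries are rejected.
try:
    check_split_inputs(data, pd.Timestamp("2022-03-31"), pd.Timestamp("2022-02-28"))
    swapped_rejected = False
except ValueError:
    swapped_rejected = True
```

Running such a check first turns a silently empty or mis-sized split into an explicit error.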

Examples

import numpy as np
import pandas as pd

from spotforecast2_safe.splitter.split import split_abs_train_val_test

rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=100, freq="D")
data = pd.DataFrame({"value": rng.standard_normal(100)}, index=idx)

end_train = pd.Timestamp("2022-02-28")
end_validation = pd.Timestamp("2022-03-31")
data_train, data_val, data_test = split_abs_train_val_test(
    data,
    end_train=end_train,
    end_validation=end_validation,
    verbose=True,
)
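# Each set ends on its boundary timestamp, and the test set starts
# on end_validation, so that timestamp is shared by two sets.
assert data_train.index.max() == end_train
assert data_val.index.max() == end_validation
assert data_test.index.min() == end_validation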
print(f"Train: {len(data_train)}, Val: {len(data_val)}, Test: {len(data_test)}")

Start date: 2022-01-01 00:00:00
End date: 2022-04-10 00:00:00
Train: 2022-01-01 00:00:00 --- 2022-02-28 00:00:00  (n=59)
Val: 2022-02-28 00:00:00 --- 2022-03-31 00:00:00  (n=32)
Test: 2022-03-31 00:00:00 --- 2022-04-10 00:00:00  (n=11)
Train: 59, Val: 32, Test: 11
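
The counts printed above sum to 102 for a 100-row frame, because each boundary timestamp appears in two adjacent sets. The following sketch reproduces the same counts with plain pandas label slicing, which is inclusive on both ends, and then derives disjoint sets. It only illustrates the observed behavior and is not the library's implementation; the helper name `split_inclusive_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def split_inclusive_sketch(data, end_train, end_validation):
    """Hypothetical helper: split by absolute timestamps with inclusive label slices."""
    # .loc label slicing on a sorted DatetimeIndex includes both endpoints,
    # so end_train and end_validation each land in two adjacent sets.
    data_train = data.loc[:end_train]
    data_val = data.loc[end_train:end_validation]
    data_test = data.loc[end_validation:]
    return data_train, data_val, data_test


rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=100, freq="D")
data = pd.DataFrame({"value": rng.standard_normal(100)}, index=idx)

end_train = pd.Timestamp("2022-02-28")
end_validation = pd.Timestamp("2022-03-31")
train, val, test = split_inclusive_sketch(data, end_train, end_validation)
sizes = (len(train), len(val), len(test))

# Drop the shared first row of the later sets to obtain disjoint splits.
val_disjoint = val.iloc[1:]
test_disjoint = test.iloc[1:]
disjoint_sizes = (len(train), len(val_disjoint), len(test_disjoint))
print(sizes, disjoint_sizes)
```

If the shared boundary rows matter for an evaluation, for example to avoid scoring a model on a row it was trained on, trimming the first row of the validation and test sets as shown is one option.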