splitter.split.split_abs_train_val_test(
    data,
    end_train,
    end_validation,
    verbose=False,
)
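Splits a time series DataFrame into training, validation, and test sets based on absolute timestamps. Both boundaries are inclusive: as the example below shows, the row at end_train appears in both the training and validation sets, and the row at end_validation appears in both the validation and test sets.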
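A split on absolute timestamps can be sketched with pandas label-based slicing via `.loc`, which includes both endpoints of a slice on a sorted DatetimeIndex. Under that assumption the sketch reproduces the set sizes printed in the example below (59, 32, 11), including the shared boundary rows. `split_abs_sketch` is a hypothetical name; this is not the library's implementation, and it omits the `verbose` printing.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


def split_abs_sketch(
    data: pd.DataFrame,
    end_train: pd.Timestamp,
    end_validation: pd.Timestamp,
) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Hypothetical sketch of an absolute-timestamp split using .loc.

    .loc slices on a sorted DatetimeIndex include both endpoints, so the
    row at end_train lands in train and validation, and the row at
    end_validation lands in validation and test.
    """
    data_train = data.loc[:end_train]
    data_val = data.loc[end_train:end_validation]
    data_test = data.loc[end_validation:]
    return data_train, data_val, data_test


rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=100, freq="D")
data = pd.DataFrame({"value": rng.standard_normal(100)}, index=idx)

train, val, test = split_abs_sketch(
    data, pd.Timestamp("2022-02-28"), pd.Timestamp("2022-03-31")
)
print(len(train), len(val), len(test))  # 59 32 11
```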
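Parameters
- data (pd.DataFrame): The time series data with a DatetimeIndex. Required.
- end_train (pd.Timestamp): The last timestamp (inclusive) of the training set. Required.
- end_validation (pd.Timestamp): The last timestamp (inclusive) of the validation set. Required.
- verbose (bool): If True, print the start and end dates of the data and the date range and number of rows of each set. Default: False.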
Returns
tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: A tuple containing:
- data_train (pd.DataFrame): The training set.
- data_val (pd.DataFrame): The validation set.
- data_test (pd.DataFrame): The test set.
Examples
import numpy as np
import pandas as pd
from spotforecast2_safe.splitter.split import split_abs_train_val_test

# 100 daily observations from 2022-01-01 to 2022-04-10.
rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=100, freq="D")
data = pd.DataFrame({"value": rng.standard_normal(100)}, index=idx)

end_train = pd.Timestamp("2022-02-28")
end_validation = pd.Timestamp("2022-03-31")
data_train, data_val, data_test = split_abs_train_val_test(
    data,
    end_train=end_train,
    end_validation=end_validation,
    verbose=True,
)

# Boundaries are inclusive: end_validation is both the last
# validation row and the first test row.
assert data_train.index.max() == end_train
assert data_val.index.max() == end_validation
assert data_test.index.min() == end_validation
print(f"Train: {len(data_train)}, Val: {len(data_val)}, Test: {len(data_test)}")
Output:
Start date: 2022-01-01 00:00:00
End date: 2022-04-10 00:00:00
Train: 2022-01-01 00:00:00 --- 2022-02-28 00:00:00 (n=59)
Val: 2022-02-28 00:00:00 --- 2022-03-31 00:00:00 (n=32)
Test: 2022-03-31 00:00:00 --- 2022-04-10 00:00:00 (n=11)
Train: 59, Val: 32, Test: 11
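In the output above the three set sizes sum to 102 although data has 100 rows, because the rows at end_train and end_validation each belong to two sets. If strictly disjoint sets are needed (for instance, so that no timestamp used for training is also used for evaluation), the shared boundary rows can be dropped after splitting. The following is a minimal sketch: `make_disjoint` is a hypothetical helper, and the three input frames are built with inclusive `.loc` slices that stand in for the function's output, on the assumption that they match the overlap shown above.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


def make_disjoint(
    data_train: pd.DataFrame, data_val: pd.DataFrame, data_test: pd.DataFrame
) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: drop boundary rows shared with the preceding set."""
    # Read both boundaries before trimming anything.
    end_train = data_train.index.max()
    end_validation = data_val.index.max()
    # Keep only validation rows strictly after the last training timestamp.
    data_val = data_val.loc[data_val.index > end_train]
    # Keep only test rows strictly after the last validation timestamp.
    data_test = data_test.loc[data_test.index > end_validation]
    return data_train, data_val, data_test


rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=100, freq="D")
data = pd.DataFrame({"value": rng.standard_normal(100)}, index=idx)

# Stand-in for split_abs_train_val_test: inclusive .loc slices give the
# same overlapping sizes as the example output (59 + 32 + 11 = 102 rows).
end_train = pd.Timestamp("2022-02-28")
end_validation = pd.Timestamp("2022-03-31")
train, val, test = (
    data.loc[:end_train],
    data.loc[end_train:end_validation],
    data.loc[end_validation:],
)

train, val, test = make_disjoint(train, val, test)
print(len(train), len(val), len(test))  # 59 31 10
```

After trimming, the sizes sum to len(data), and each timestamp belongs to exactly one set.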