splitter.split.split_rel_train_val_test(
    data,
    perc_train,
    perc_val,
    verbose=False,
)
Splits a time series DataFrame into training, validation, and test sets by percentages.
The test fraction is not passed explicitly; it is the remainder, 1 - perc_train - perc_val. Split sizes are rounded to whole rows so that the three sets together contain every row of the input.
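The size arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: the helper name sketch_split_sizes is hypothetical, the rounding rule (round train and validation, give the remainder to test) is one plausible choice, and the rows are assumed to be sliced in chronological order.

```python
import numpy as np
import pandas as pd


def sketch_split_sizes(n_rows: int, perc_train: float, perc_val: float) -> tuple[int, int, int]:
    """Hypothetical helper: derive split sizes that always sum to n_rows."""
    if perc_train < 0 or perc_val < 0 or perc_train + perc_val > 1 + 1e-9:
        raise ValueError("perc_train and perc_val must be >= 0 and sum to at most 1")
    # Assumed rounding rule: round train and validation to whole rows ...
    n_train = int(round(n_rows * perc_train))
    n_val = int(round(n_rows * perc_val))
    # ... keep rounding from pushing train + val past the dataset size ...
    n_val = min(n_val, n_rows - n_train)
    # ... and let the test set absorb whatever is left over.
    n_test = n_rows - n_train - n_val
    return n_train, n_val, n_test


# 101 rows do not divide evenly into 70% / 20% / 10%.
idx = pd.date_range("2022-01-01", periods=101, freq="D")
data = pd.DataFrame({"value": np.arange(101.0)}, index=idx)

n_train, n_val, n_test = sketch_split_sizes(len(data), perc_train=0.7, perc_val=0.2)

# Assumed chronological slicing: train first, then validation, then test.
train = data.iloc[:n_train]
val = data.iloc[n_train:n_train + n_val]
test = data.iloc[n_train + n_val:]
```

Because the test size is defined as a remainder rather than rounded independently, the three sizes cannot drift away from the length of the input, whatever the fractions are.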
Parameters
data : pd.DataFrame, required
    The time series data with a DatetimeIndex.
perc_train : float, required
    Fraction of data used for training.
perc_val : float, required
    Fraction of data used for validation.
verbose : bool, default False
    Whether to print additional information.
Returns
tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]
    A tuple containing:
    - data_train (pd.DataFrame): The training set.
    - data_val (pd.DataFrame): The validation set.
    - data_test (pd.DataFrame): The test set.
Examples
import numpy as np
import pandas as pd
from spotforecast2_safe.splitter.split import split_rel_train_val_test

rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=100, freq="D")
data = pd.DataFrame({"value": rng.standard_normal(100)}, index=idx)

data_train, data_val, data_test = split_rel_train_val_test(
    data,
    perc_train=0.7,
    perc_val=0.2,
    verbose=True,
)

assert len(data_train) == 70
assert len(data_val) == 20
assert len(data_test) == 10
print(f"Train: {len(data_train)}, Val: {len(data_val)}, Test: {len(data_test)}")
Splitting data into train/val/test with percentages: 70.0000% / 20.0000% / 10.0000%
Train size: 70 (70.00%)
Val size: 20 (20.00%)
Test size: 10 (10.00%)
Train: 70, Val: 20, Test: 10