splitter.split.split_rel_train_val_test

splitter.split.split_rel_train_val_test(
    data,
    perc_train,
    perc_val,
    verbose=False,
)

Splits a time series DataFrame into training, validation, and test sets by relative size, given as fractions of the total number of rows.

The test fraction is computed as 1 - perc_train - perc_val. Split sizes are rounded to whole rows so that the three splits together contain every row of the dataset.
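
A minimal sketch of the size arithmetic described above, assuming the training and validation sizes are rounded to the nearest integer and the test set receives the remainder. The helper name `sketch_split_sizes` is hypothetical, and the library's actual rounding rule may differ.

```python
def sketch_split_sizes(n_rows: int, perc_train: float, perc_val: float) -> tuple[int, int, int]:
    """Hypothetical helper: turn fractions into row counts that sum to n_rows."""
    # Assumption: train and val sizes are rounded to the nearest integer.
    n_train = round(n_rows * perc_train)
    n_val = round(n_rows * perc_val)
    # Assumption: the test set takes the remainder, so the sizes sum to n_rows.
    n_test = n_rows - n_train - n_val
    return n_train, n_val, n_test


sizes = sketch_split_sizes(100, 0.7, 0.2)
print(sizes)  # (70, 20, 10)
```

Assigning the remainder to one split is one simple way to guarantee that no row is dropped or counted twice when the fractions do not divide the row count evenly.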

Parameters

Name        Type          Description                                  Default
data        pd.DataFrame  The time series data with a DatetimeIndex.   required
perc_train  float         Fraction of data used for training.          required
perc_val    float         Fraction of data used for validation.        required
verbose     bool          Whether to print additional information.     False
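
Before calling the function, it can help to check that the two fractions are consistent. The following is a small sketch, assuming both fractions should lie in [0, 1] and sum to at most 1. The helper `check_fractions` is hypothetical and not part of the library, which may or may not perform such checks itself.

```python
def check_fractions(perc_train: float, perc_val: float) -> float:
    """Hypothetical pre-check: return the implied test fraction or raise ValueError."""
    for name, value in (("perc_train", perc_train), ("perc_val", perc_val)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    perc_test = 1.0 - perc_train - perc_val
    # Tolerate tiny negative values caused by floating-point arithmetic.
    if perc_test < -1e-12:
        raise ValueError("perc_train + perc_val must not exceed 1")
    return max(perc_test, 0.0)


print(round(check_fractions(0.7, 0.2), 4))  # 0.1
```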

Returns

Name   Type                                             Description
tuple  tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]  A tuple containing:
                                                        - data_train (pd.DataFrame): The training set.
                                                        - data_val (pd.DataFrame): The validation set.
                                                        - data_test (pd.DataFrame): The test set.
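
How the three returned frames relate to the input can be sketched with positional slicing. This assumes the split is chronological and contiguous (training rows first, then validation, then test), which is typical for time series but is an assumption here. `sketch_chronological_split` is a hypothetical stand-in, not the library function.

```python
import pandas as pd


def sketch_chronological_split(data, n_train, n_val):
    """Hypothetical stand-in: slice rows in order, without shuffling."""
    data_train = data.iloc[:n_train]
    data_val = data.iloc[n_train:n_train + n_val]
    data_test = data.iloc[n_train + n_val:]
    return data_train, data_val, data_test


idx = pd.date_range("2022-01-01", periods=10, freq="D")
data = pd.DataFrame({"value": range(10)}, index=idx)
train, val, test = sketch_chronological_split(data, n_train=7, n_val=2)

# The parts cover the input exactly once and preserve time order.
assert len(train) + len(val) + len(test) == len(data)
assert train.index.max() < val.index.min() < test.index.min()
```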

Examples

import numpy as np
import pandas as pd

from spotforecast2_safe.splitter.split import split_rel_train_val_test

rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=100, freq="D")
data = pd.DataFrame({"value": rng.standard_normal(100)}, index=idx)

data_train, data_val, data_test = split_rel_train_val_test(
    data,
    perc_train=0.7,
    perc_val=0.2,
    verbose=True,
)
assert len(data_train) == 70
assert len(data_val) == 20
assert len(data_test) == 10
print(f"Train: {len(data_train)}, Val: {len(data_val)}, Test: {len(data_test)}")

Splitting data into train/val/test with percentages: 70.0000% / 20.0000% / 10.0000%
Train size: 70 (70.00%)
Val size: 20 (20.00%)
Test size: 10 (10.00%)
Train: 70, Val: 20, Test: 10