splitter.split.split_rel_train_val_test

splitter.split.split_rel_train_val_test(
    data,
    perc_train,
    perc_val,
    verbose=False,
)

Splits a time series DataFrame chronologically into training, validation, and test sets according to relative fractions.

The test fraction is computed as 1 - perc_train - perc_val. Split sizes are rounded so that the three splits together contain every row of the dataset.
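The size arithmetic described above can be illustrated with a minimal sketch. This is not the library's implementation: the helper names `sketch_split_sizes` and `sketch_split` are hypothetical, and the rounding rule (round the training and validation sizes, give the remainder to the test set) is an assumption chosen so that the three sizes always sum to the number of rows.

```python
import pandas as pd


def sketch_split_sizes(n_rows: int, perc_train: float, perc_val: float) -> tuple[int, int, int]:
    """Hypothetical helper: derive three split sizes that sum to n_rows."""
    if not (0.0 <= perc_train <= 1.0 and 0.0 <= perc_val <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    # Small tolerance so that pairs such as 0.9 + 0.1 are not rejected
    # because of floating-point noise.
    if perc_train + perc_val > 1.0 + 1e-9:
        raise ValueError("perc_train + perc_val must not exceed 1")

    # Assumed rounding rule: round train and val, test takes the remainder.
    n_train = int(round(n_rows * perc_train))
    n_val = int(round(n_rows * perc_val))
    # Clamp so rounding can never push train + val past the dataset size.
    n_val = min(n_val, n_rows - n_train)
    n_test = n_rows - n_train - n_val
    return n_train, n_val, n_test


def sketch_split(data: pd.DataFrame, perc_train: float, perc_val: float):
    """Hypothetical helper: slice the rows in their existing (time) order."""
    n_train, n_val, _ = sketch_split_sizes(len(data), perc_train, perc_val)
    data_train = data.iloc[:n_train]
    data_val = data.iloc[n_train:n_train + n_val]
    data_test = data.iloc[n_train + n_val:]
    return data_train, data_val, data_test


# 101 rows do not divide evenly into 80/10/10, so rounding matters here.
idx = pd.date_range("2020-01-01", periods=101, freq="D")
data = pd.DataFrame({"value": range(101)}, index=idx)

parts = sketch_split(data, perc_train=0.8, perc_val=0.1)
print([len(p) for p in parts])
```

Because the test set takes whatever rows remain, no row is dropped or duplicated in this sketch, even when the fractions do not produce whole numbers. The real function may distribute the rounding differently.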

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | pd.DataFrame | The time series data with a DatetimeIndex. | required |
| perc_train | float | Fraction of data used for training. | required |
| perc_val | float | Fraction of data used for validation. | required |
| verbose | bool | Whether to print additional information. | False |

Returns

| Name | Type | Description |
|------|------|-------------|
| tuple | tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame] | A tuple containing, in order: data_train (pd.DataFrame), the training set; data_val (pd.DataFrame), the validation set; data_test (pd.DataFrame), the test set. |

Examples

import numpy as np
import pandas as pd

from spotforecast2_safe.splitter.split import split_rel_train_val_test

rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=100, freq="h")
data = pd.DataFrame({"value": rng.standard_normal(100)}, index=idx)

data_train, data_val, data_test = split_rel_train_val_test(
    data,
    perc_train=0.8,
    perc_val=0.1,
    verbose=False,
)

# Sizes
assert len(data_train) == 80
assert len(data_val) == 10
assert len(data_test) == 10
print(f"Train: {len(data_train)}, Val: {len(data_val)}, Test: {len(data_test)}")

# Full index coverage: union of the three splits equals the original index
combined_index = data_train.index.append(data_val.index).append(data_test.index)
assert combined_index.equals(data.index), "Union of splits must equal original index"

# Temporal ordering: train ends before val, val ends before test
assert data_train.index.max() < data_val.index.min()
assert data_val.index.max() < data_test.index.min()
print(f"Train ends: {data_train.index.max()}")
print(f"Val  starts: {data_val.index.min()}, ends: {data_val.index.max()}")
print(f"Test starts: {data_test.index.min()}")

Train: 80, Val: 10, Test: 10
Train ends: 2020-01-04 07:00:00
Val  starts: 2020-01-04 08:00:00, ends: 2020-01-04 17:00:00
Test starts: 2020-01-04 18:00:00