preprocessing.split

Functions

Name Description
split_abs_train_val_test Splits a time series DataFrame into training, validation, and test sets based on absolute timestamps.
split_rel_train_val_test Splits a time series DataFrame into training, validation, and test sets by percentages.

split_abs_train_val_test

preprocessing.split.split_abs_train_val_test(
    data,
    end_train,
    end_validation,
    verbose=False,
)

Splits a time series DataFrame into training, validation, and test sets based on absolute timestamps.

Parameters

Name Type Description Default
data pd.DataFrame The time series data with a DateTimeIndex. required
end_train pd.Timestamp The end date for the training set. required
end_validation pd.Timestamp The end date for the validation set. required
verbose bool Whether to print additional information. False

Returns

Name Type Description
tuple tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame] A tuple (data_train, data_val, data_test) holding the training, validation, and test sets, in that order.

Examples

>>> import pandas as pd
>>> from spotforecast2_safe.data.fetch_data import fetch_data, get_data_home
>>> from spotforecast2.preprocessing.split import split_abs_train_val_test
>>> data = fetch_data(filename=get_data_home() / "data_in.csv")
>>> end_train = pd.Timestamp('2020-12-31 23:00:00')
>>> end_validation = pd.Timestamp('2021-06-30 23:00:00')
>>> data_train, data_val, data_test = split_abs_train_val_test(
...     data,
...     end_train=end_train,
...     end_validation=end_validation,
...     verbose=True
... )
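This page does not say which set a row lying exactly on a cut-off belongs to. The following is a minimal pandas sketch of an absolute-timestamp split, assuming both end timestamps are inclusive upper bounds (which fits the hourly cut-offs such as '2020-12-31 23:00:00' used in the example). The function name split_abs_sketch is hypothetical, verbose is omitted, and this is not the spotforecast2 implementation.

```python
import pandas as pd


def split_abs_sketch(
    data: pd.DataFrame,
    end_train: pd.Timestamp,
    end_validation: pd.Timestamp,
) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Hypothetical absolute-timestamp split, not the library source.

    Assumes end_train and end_validation are inclusive upper bounds.
    """
    if not isinstance(data.index, pd.DatetimeIndex):
        raise TypeError("data must have a DatetimeIndex")
    if end_train >= end_validation:
        raise ValueError("end_train must be earlier than end_validation")
    idx = data.index
    # Boolean masks instead of data.loc[start:end]: label slicing includes
    # both endpoints, so the end_train row would land in two sets.
    data_train = data.loc[idx <= end_train]
    data_val = data.loc[(idx > end_train) & (idx <= end_validation)]
    data_test = data.loc[idx > end_validation]
    return data_train, data_val, data_test


# Usage on a small synthetic hourly series (10 rows, 00:00 to 09:00).
index = pd.date_range("2020-01-01", periods=10, freq="60min")
df = pd.DataFrame({"load": range(10)}, index=index)
train, val, test = split_abs_sketch(
    df, pd.Timestamp("2020-01-01 05:00"), pd.Timestamp("2020-01-01 07:00")
)
print(len(train), len(val), len(test))  # 6 2 2
```

Masks also keep the three sets disjoint when the index is not sorted, whereas label slicing on a non-monotonic DatetimeIndex can raise a KeyError in recent pandas versions.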

split_rel_train_val_test

preprocessing.split.split_rel_train_val_test(
    data,
    perc_train,
    perc_val,
    verbose=False,
)

Splits a time series DataFrame into training, validation, and test sets by percentages.

The test fraction is computed as 1 - perc_train - perc_val. The set sizes are rounded so that the three splits always sum to the full dataset size.
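The exact rounding rule is not documented here. One way to guarantee that the sizes add up to the number of rows is to round the training and validation sizes and give the test set the remainder. The stdlib-only sketch below does that; the name split_sizes and the round-then-remainder rule are assumptions for illustration, not necessarily what split_rel_train_val_test does internally.

```python
def split_sizes(n: int, perc_train: float, perc_val: float) -> tuple[int, int, int]:
    """Hypothetical helper: return (n_train, n_val, n_test) summing to n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if not (0.0 <= perc_train <= 1.0 and 0.0 <= perc_val <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    # The test fraction is implied; it is only checked here, with a small
    # tolerance for float error such as 0.7 + 0.3 drifting past 1.
    perc_test = 1.0 - perc_train - perc_val
    if perc_test < -1e-9:
        raise ValueError("perc_train + perc_val must not exceed 1")
    # Python's round() uses round-half-to-even (round(1.5) == 2).
    n_train = round(n * perc_train)
    # Clamp so that rounding both sizes up can never overshoot n.
    n_val = min(round(n * perc_val), n - n_train)
    # The test set takes whatever is left, so the sizes sum to n exactly.
    n_test = n - n_train - n_val
    return n_train, n_val, n_test


print(split_sizes(10, 0.7, 0.2))  # (7, 2, 1)
print(split_sizes(3, 0.5, 0.5))   # (2, 1, 0)
```

Assuming the splits are contiguous and in time order, the three sets would then correspond to data.iloc[:n_train], data.iloc[n_train:n_train + n_val], and data.iloc[n_train + n_val:].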

Parameters

Name Type Description Default
data pd.DataFrame The time series data with a DateTimeIndex. required
perc_train float Fraction of data used for training. required
perc_val float Fraction of data used for validation. required
verbose bool Whether to print additional information. False

Returns

Name Type Description
tuple tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame] A tuple (data_train, data_val, data_test) holding the training, validation, and test sets, in that order.

Examples

>>> from spotforecast2_safe.data.fetch_data import fetch_data, get_data_home
>>> from spotforecast2.preprocessing.split import split_rel_train_val_test
>>> data = fetch_data(filename=get_data_home() / "data_in.csv")
>>> data_train, data_val, data_test = split_rel_train_val_test(
...     data,
...     perc_train=0.7,
...     perc_val=0.2,
...     verbose=True
... )