preprocessing.data_transform

Data transformation utilities for time series forecasting.

This module provides functions for normalizing and transforming data formats.

Functions

Name Description
date_to_index_position Transform a datetime string or pandas Timestamp to an integer. The integer represents the position of the datetime in the index.
expand_index Create a new index extending from the end of the original index.
input_to_frame Convert input data to a pandas DataFrame.
transform_dataframe Transform raw values of pandas DataFrame with a scikit-learn-like transformer, preprocessor or ColumnTransformer.

date_to_index_position

preprocessing.data_transform.date_to_index_position(
    index,
    date_input,
    method='prediction',
    date_literal='steps',
    kwargs_pd_to_datetime=None,
)

Transform a datetime string or pandas Timestamp to an integer. The integer represents the position of the datetime in the index.

Parameters

Name Type Description Default
index pd.Index Original datetime index (must be a pandas DatetimeIndex if date_input is not an int). required
date_input Union[int, str, pd.Timestamp] Datetime to transform to integer. - If int, returns the same integer. - If str or pandas Timestamp, it is converted to a datetime and mapped to its position relative to the index. required
method str Can be ‘prediction’ or ‘validation’. - If ‘prediction’, the date must be later than the last date in the index. - If ‘validation’, the date must be within the index range. 'prediction'
date_literal str Variable name used in error messages. Defaults to ‘steps’. 'steps'
kwargs_pd_to_datetime Optional[dict] Additional keyword arguments to pass to pd.to_datetime(). Defaults to None. None

Returns

Name Type Description
int int date_input transformed to integer position in the index. - If date_input is an integer, it returns the same integer. - If method is ‘prediction’, the number of steps to predict from the last date in the index. - If method is ‘validation’, the position of the date in the index plus one.

Raises

Name Type Description
ValueError If method is not ‘prediction’ or ‘validation’.
TypeError If index is not a DatetimeIndex when date_input is not an integer.
ValueError If date_input (as date) does not meet the method’s constraints.
TypeError If date_input is not an integer, string, or pandas Timestamp.
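The two methods are easiest to compare on a small daily index. The following is a minimal sketch that re-creates the documented behaviour with plain pandas; it is not the library's implementation. The helper name sketch_date_to_position is hypothetical, kwargs_pd_to_datetime is omitted, and the index is assumed to have its freq set.

```python
import pandas as pd


def sketch_date_to_position(index, date_input, method="prediction"):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    if method not in ("prediction", "validation"):
        raise ValueError("method must be 'prediction' or 'validation'")
    if isinstance(date_input, int):
        # Integers are returned unchanged.
        return date_input
    if not isinstance(date_input, (str, pd.Timestamp)):
        raise TypeError("date_input must be an int, str, or pandas Timestamp")
    if not isinstance(index, pd.DatetimeIndex):
        raise TypeError("index must be a DatetimeIndex for date inputs")

    target = pd.to_datetime(date_input)
    first, last = index[0], index[-1]

    if method == "prediction":
        if target <= last:
            raise ValueError("date must be later than the last date in the index")
        # Steps from the last observed date up to the target date.
        return len(pd.date_range(start=last, end=target, freq=index.freq)) - 1

    if not (first <= target <= last):
        raise ValueError("date must be within the index range")
    # Zero-based position of the date in the index, plus one.
    return len(pd.date_range(start=first, end=target, freq=index.freq))


index = pd.date_range("2023-01-01", periods=5, freq="D")  # last date: 2023-01-05

steps = sketch_date_to_position(index, "2023-01-08", method="prediction")
position = sketch_date_to_position(index, "2023-01-03", method="validation")
unchanged = sketch_date_to_position(index, 4)
print(steps, position, unchanged)
```

Under these assumptions, 2023-01-08 lies three daily steps past the end of the index, and 2023-01-03 sits at zero-based position 2, so the validation result is 3.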

expand_index

preprocessing.data_transform.expand_index(index, steps)

Create a new index extending from the end of the original index.

This function generates future indices for forecasting by extending the time series index by a specified number of steps. Handles both DatetimeIndex and RangeIndex appropriately.

Parameters

Name Type Description Default
index Union[pd.Index, None] Original pandas Index (DatetimeIndex or RangeIndex). If None, creates a RangeIndex starting from 0. required
steps int Number of future steps to generate. required

Returns

Name Type Description
pd.Index New pandas Index with steps future periods.

Raises

Name Type Description
TypeError If steps is not an integer, or if index is neither DatetimeIndex nor RangeIndex.

Examples

import pandas as pd
from spotforecast2_safe.preprocessing.data_transform import expand_index

# DatetimeIndex
dates = pd.date_range("2023-01-01", periods=5, freq="D")
new_index = expand_index(dates, 3)
print(new_index)
assert len(new_index) == 3
assert str(new_index[0].date()) == "2023-01-06"

# RangeIndex
range_idx = pd.RangeIndex(start=0, stop=10)
new_index = expand_index(range_idx, 5)
print(new_index)
assert new_index.equals(pd.RangeIndex(start=10, stop=15, step=1))

# None index (creates new RangeIndex)
new_index = expand_index(None, 3)
print(new_index)
assert new_index.equals(pd.RangeIndex(start=0, stop=3, step=1))

# Invalid: steps not an integer raises TypeError
try:
    expand_index(dates, 3.5)
except TypeError:
    print("Error: steps must be an integer")

DatetimeIndex(['2023-01-06', '2023-01-07', '2023-01-08'], dtype='datetime64[us]', freq='D')
RangeIndex(start=10, stop=15, step=1)
RangeIndex(start=0, stop=3, step=1)
Error: steps must be an integer
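
The extension logic described for expand_index can be approximated with plain pandas. This is a hedged sketch, not the library's code: the helper name sketch_expand_index is hypothetical, the DatetimeIndex is assumed to have its freq set, and the RangeIndex is assumed to use a step of 1.

```python
import pandas as pd


def sketch_expand_index(index, steps):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    if not isinstance(steps, int):
        raise TypeError("steps must be an integer")
    if index is None:
        # No original index: start a fresh RangeIndex at 0.
        return pd.RangeIndex(start=0, stop=steps)
    if isinstance(index, pd.DatetimeIndex):
        # Continue one period after the last date, at the same frequency.
        return pd.date_range(
            start=index[-1] + index.freq, periods=steps, freq=index.freq
        )
    if isinstance(index, pd.RangeIndex):
        # Continue right after the last integer position (step of 1 assumed).
        start = index[-1] + 1
        return pd.RangeIndex(start=start, stop=start + steps)
    raise TypeError("index must be a DatetimeIndex, a RangeIndex, or None")


future_dates = sketch_expand_index(pd.date_range("2023-01-01", periods=5, freq="D"), 3)
future_range = sketch_expand_index(pd.RangeIndex(start=0, stop=10), 5)
fresh_range = sketch_expand_index(None, 3)
```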

input_to_frame

preprocessing.data_transform.input_to_frame(data, input_name)

Convert input data to a pandas DataFrame.

This function ensures consistent DataFrame format for internal processing. If data is already a DataFrame, it’s returned as-is. If it’s a Series, it’s converted to a single-column DataFrame.

Parameters

Name Type Description Default
data Union[pd.Series, pd.DataFrame] Input data as pandas Series or DataFrame. required
input_name str Name of the input data type. Accepted values are: - ‘y’: Target time series - ‘last_window’: Last window for prediction - ‘exog’: Exogenous variables required

Returns

Name Type Description
pd.DataFrame DataFrame version of the input data. For Series input, uses the series name if available, otherwise uses a default name based on input_name.

Examples

import pandas as pd
from spotforecast2_safe.preprocessing.data_transform import input_to_frame

# Series with name
y = pd.Series([1, 2, 3], name="sales")
df = input_to_frame(y, input_name="y")
print(df.columns.tolist())
assert df.columns.tolist() == ["sales"]

# Series without name (uses default)
y_no_name = pd.Series([1, 2, 3])
df = input_to_frame(y_no_name, input_name="y")
print(df.columns.tolist())
assert df.columns.tolist() == ["y"]

# DataFrame (returned as-is)
df_input = pd.DataFrame({"temp": [20, 21], "humidity": [50, 55]})
df_output = input_to_frame(df_input, input_name="exog")
print(df_output.columns.tolist())
assert df_output.columns.tolist() == ["temp", "humidity"]

# Exog series without name
exog = pd.Series([10, 20, 30])
df_exog = input_to_frame(exog, input_name="exog")
print(df_exog.columns.tolist())
assert df_exog.columns.tolist() == ["exog"]

['sales']
['y']
['temp', 'humidity']
['exog']
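
The naming rule for Series input can be expressed in a few lines of plain pandas. This is only a sketch of the documented behaviour, with a hypothetical helper name (sketch_input_to_frame); the library's own implementation may differ.

```python
import pandas as pd


def sketch_input_to_frame(data, input_name):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    if isinstance(data, pd.DataFrame):
        # DataFrames pass through untouched.
        return data
    # Use the Series name when present, otherwise fall back to input_name.
    column = data.name if data.name is not None else input_name
    return data.to_frame(name=column)


named = sketch_input_to_frame(pd.Series([1, 2, 3], name="sales"), "y")
unnamed = sketch_input_to_frame(pd.Series([1, 2, 3]), "last_window")
frame = pd.DataFrame({"temp": [20, 21]})
same = sketch_input_to_frame(frame, "exog")
```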

transform_dataframe

preprocessing.data_transform.transform_dataframe(
    df,
    transformer,
    fit=False,
    inverse_transform=False,
)

Transform raw values of pandas DataFrame with a scikit-learn-like transformer, preprocessor or ColumnTransformer.

The transformer must implement fit, transform and fit_transform. It must also implement inverse_transform whenever inverse_transform=True; a ColumnTransformer therefore supports only the forward transformation, since it has no inverse_transform method.

Parameters

Name Type Description Default
df pd.DataFrame DataFrame to be transformed. required
transformer object Scikit-learn-like transformer, preprocessor, or ColumnTransformer. Must implement fit, transform and fit_transform; inverse_transform is required only when inverse_transform=True. required
fit bool Train the transformer before applying it. Defaults to False. False
inverse_transform bool Transform the data back to its original representation. Not available for scikit-learn ColumnTransformer instances. Defaults to False. False

Returns

Name Type Description
pd.DataFrame Transformed DataFrame.

Raises

Name Type Description
TypeError If df is not a pandas DataFrame.
ValueError If inverse_transform is requested for ColumnTransformer.
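
To make the required transformer interface concrete, here is a minimal sketch of a fit, transform and inverse round trip. Both names are hypothetical and exist only for illustration: MeanCenterer is a tiny scikit-learn-like transformer written for this example, and sketch_transform_dataframe approximates the documented behaviour without being the library's code. The sketch assumes the transformer keeps the number and order of columns, so the original column names can be reused.

```python
import pandas as pd


class MeanCenterer:
    """Hypothetical scikit-learn-like transformer that subtracts column means."""

    def fit(self, X, y=None):
        self.means_ = pd.DataFrame(X).mean().to_numpy()
        return self

    def transform(self, X):
        return pd.DataFrame(X).to_numpy() - self.means_

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)

    def inverse_transform(self, X):
        return pd.DataFrame(X).to_numpy() + self.means_


def sketch_transform_dataframe(df, transformer, fit=False, inverse_transform=False):
    """Illustrative stand-in for the documented behaviour (not the library code)."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")
    if inverse_transform:
        values = transformer.inverse_transform(df)
    elif fit:
        values = transformer.fit_transform(df)
    else:
        values = transformer.transform(df)
    # Assumption: columns are preserved, so index and names carry over.
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({"temp": [20.0, 22.0], "humidity": [50.0, 54.0]})
centerer = MeanCenterer()

scaled = sketch_transform_dataframe(df, centerer, fit=True)
restored = sketch_transform_dataframe(scaled, centerer, inverse_transform=True)
print(scaled)
print(restored)
```

The first call fits the transformer and applies it; the second reuses the fitted state to map the values back, which is the step a ColumnTransformer cannot perform.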