utils.data_transform

Data transformation utilities for time series forecasting.

This module provides functions for normalizing and transforming data formats.

Functions

Name Description
date_to_index_position Transform a datetime string or pandas Timestamp to an integer. The integer represents the position of the datetime in the index.
expand_index Create a new index extending from the end of the original index.
input_to_frame Convert input data to a pandas DataFrame.
transform_dataframe Transform raw values of a pandas DataFrame with a scikit-learn-like transformer, preprocessor, or ColumnTransformer.

date_to_index_position

utils.data_transform.date_to_index_position(
    index,
    date_input,
    method='prediction',
    date_literal='steps',
    kwargs_pd_to_datetime=None,
)

Transform a datetime string or pandas Timestamp to an integer. The integer represents the position of the datetime in the index.

Parameters

Name Type Description Default
index pd.Index Original datetime index (must be a pandas DatetimeIndex if date_input is not an int). required
date_input Union[int, str, pd.Timestamp] Date to convert to an integer position. - If int, it is returned unchanged. - If str or pandas Timestamp, it is converted to a datetime and resolved to a position relative to the index. required
method str Can be ‘prediction’ or ‘validation’. - If ‘prediction’, the date must be later than the last date in the index. - If ‘validation’, the date must be within the index range. 'prediction'
date_literal str Variable name used in error messages. Defaults to ‘steps’. 'steps'
kwargs_pd_to_datetime Optional[dict] Additional keyword arguments to pass to pd.to_datetime(). Defaults to None. None

Returns

Name Type Description
int int Integer position corresponding to date_input. - If date_input is an integer, the same integer is returned. - If method is ‘prediction’, the number of steps to predict, counted from the last date in the index. - If method is ‘validation’, the position of the date in the index plus one.

Raises

Name Type Description
ValueError If method is not ‘prediction’ or ‘validation’.
TypeError If index is not a DatetimeIndex when date_input is not an integer.
ValueError If date_input (as date) does not meet the method’s constraints.
TypeError If date_input is not an integer, string, or pandas Timestamp.
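
The difference between the two methods is easiest to see on a small index. The block below is a minimal sketch that restates the rules documented above using plain pandas. The helper name date_to_position_sketch is hypothetical and is not the library's implementation, and the sketch assumes the index carries a frequency (index.freq is not None).

```python
import pandas as pd


def date_to_position_sketch(index, date_input, method="prediction"):
    """Illustrative restatement of the documented rules (not the library code)."""
    if method not in ("prediction", "validation"):
        raise ValueError("method must be 'prediction' or 'validation'")
    if isinstance(date_input, int):
        return date_input  # integers pass through unchanged
    if not isinstance(date_input, (str, pd.Timestamp)):
        raise TypeError("date_input must be an int, str, or pandas Timestamp")
    if not isinstance(index, pd.DatetimeIndex):
        raise TypeError("index must be a DatetimeIndex for date inputs")

    target = pd.to_datetime(date_input)
    first, last = index[0], index[-1]
    if method == "prediction":
        if target <= last:
            raise ValueError("date must be later than the last date in the index")
        # Count the periods from the last observed date up to the target.
        # Assumption: index.freq is set; date_range needs it here.
        return len(pd.date_range(start=last, end=target, freq=index.freq)) - 1
    if target < first or target > last:
        raise ValueError("date must fall within the index range")
    # Zero-based position of the date in the index, plus one.
    return len(pd.date_range(start=first, end=target, freq=index.freq))


index = pd.date_range("2023-01-01", periods=5, freq="D")  # last date: 2023-01-05
steps = date_to_position_sketch(index, "2023-01-08", method="prediction")
position = date_to_position_sketch(index, "2023-01-02", method="validation")
print(steps, position)  # 3 2
```

With a daily index ending on 2023-01-05, the date 2023-01-08 lies three steps ahead, while 2023-01-02 sits at zero-based position 1, so the ‘validation’ result is 2.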

expand_index

utils.data_transform.expand_index(index, steps)

Create a new index extending from the end of the original index.

This function generates future indices for forecasting by extending the time series index by a specified number of steps. It handles both DatetimeIndex and RangeIndex.

Parameters

Name Type Description Default
index Union[pd.Index, None] Original pandas Index (DatetimeIndex or RangeIndex). If None, creates a RangeIndex starting from 0. required
steps int Number of future steps to generate. required

Returns

Name Type Description
pd.Index New pandas Index with steps future periods.

Raises

Name Type Description
TypeError If steps is not an integer, or if index is neither DatetimeIndex nor RangeIndex.

Examples

>>> import pandas as pd
>>> from spotforecast2_safe.utils.data_transform import expand_index
>>>
>>> # DatetimeIndex
>>> dates = pd.date_range("2023-01-01", periods=5, freq="D")
>>> new_index = expand_index(dates, 3)
>>> new_index
DatetimeIndex(['2023-01-06', '2023-01-07', '2023-01-08'], dtype='datetime64[ns]', freq='D')
>>>
>>> # RangeIndex
>>> range_idx = pd.RangeIndex(start=0, stop=10)
>>> new_index = expand_index(range_idx, 5)
>>> new_index
RangeIndex(start=10, stop=15, step=1)
>>>
>>> # None index (creates new RangeIndex)
>>> new_index = expand_index(None, 3)
>>> new_index
RangeIndex(start=0, stop=3, step=1)
>>>
>>> # Invalid: steps not an integer
>>> try:
...     expand_index(dates, 3.5)
... except TypeError as e:
...     print("Error: steps must be an integer")
Error: steps must be an integer

input_to_frame

utils.data_transform.input_to_frame(data, input_name)

Convert input data to a pandas DataFrame.

This function ensures consistent DataFrame format for internal processing. If data is already a DataFrame, it’s returned as-is. If it’s a Series, it’s converted to a single-column DataFrame.

Parameters

Name Type Description Default
data Union[pd.Series, pd.DataFrame] Input data as pandas Series or DataFrame. required
input_name str Name of the input data type. Accepted values are: - ‘y’: Target time series - ‘last_window’: Last window for prediction - ‘exog’: Exogenous variables required

Returns

Name Type Description
pd.DataFrame DataFrame version of the input data. For Series input, uses the series name if available, otherwise uses a default name based on input_name.

Examples

>>> import pandas as pd
>>> from spotforecast2_safe.utils.data_transform import input_to_frame
>>>
>>> # Series with name
>>> y = pd.Series([1, 2, 3], name="sales")
>>> df = input_to_frame(y, input_name="y")
>>> df.columns.tolist()
['sales']
>>>
>>> # Series without name (uses default)
>>> y_no_name = pd.Series([1, 2, 3])
>>> df = input_to_frame(y_no_name, input_name="y")
>>> df.columns.tolist()
['y']
>>>
>>> # DataFrame (returned as-is)
>>> df_input = pd.DataFrame({"temp": [20, 21], "humidity": [50, 55]})
>>> df_output = input_to_frame(df_input, input_name="exog")
>>> df_output.columns.tolist()
['temp', 'humidity']
>>>
>>> # Exog series without name
>>> exog = pd.Series([10, 20, 30])
>>> df_exog = input_to_frame(exog, input_name="exog")
>>> df_exog.columns.tolist()
['exog']

transform_dataframe

utils.data_transform.transform_dataframe(
    df,
    transformer,
    fit=False,
    inverse_transform=False,
)

Transform raw values of a pandas DataFrame with a scikit-learn-like transformer, preprocessor, or ColumnTransformer.

The transformer is expected to implement fit, transform, fit_transform, and inverse_transform. The exception is ColumnTransformer, which has no inverse_transform method: it can be applied in the forward direction, but requesting inverse_transform raises a ValueError.

Parameters

Name Type Description Default
df pd.DataFrame DataFrame to be transformed. required
transformer object Scikit-learn-like transformer, preprocessor, or ColumnTransformer. Must implement fit, transform, fit_transform, and inverse_transform (ColumnTransformer lacks inverse_transform). required
fit bool Fit the transformer before applying it. Defaults to False. False
inverse_transform bool Transform the data back to its original representation. Not available when the transformer is a scikit-learn ColumnTransformer. Defaults to False. False

Returns

Name Type Description
pd.DataFrame Transformed DataFrame.

Raises

Name Type Description
TypeError If df is not a pandas DataFrame.
ValueError If inverse_transform is requested for ColumnTransformer.
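
To make the transformer requirements concrete, the block below sketches a minimal object with the four expected methods and a round trip through the fit and inverse_transform flags. Both MeanCenterer and apply_transformer_sketch are hypothetical names introduced for illustration. The stand-in function only mirrors the documented flags, assumes the transformer keeps the number of columns unchanged, and leaves out the ColumnTransformer check so that it depends on pandas alone.

```python
import pandas as pd


class MeanCenterer:
    """Hypothetical minimal transformer exposing the four required methods."""

    def fit(self, X, y=None):
        # Learn one mean per column.
        self.mean_ = pd.DataFrame(X).mean(axis=0).to_numpy()
        return self

    def transform(self, X):
        return pd.DataFrame(X).to_numpy() - self.mean_

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)

    def inverse_transform(self, X):
        return pd.DataFrame(X).to_numpy() + self.mean_


def apply_transformer_sketch(df, transformer, fit=False, inverse_transform=False):
    """Stand-in mirroring the documented flags; not the library implementation."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")
    if inverse_transform:
        values = transformer.inverse_transform(df)
    elif fit:
        values = transformer.fit_transform(df)
    else:
        values = transformer.transform(df)
    # Rebuild a DataFrame that keeps the original index and column labels.
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({"load": [10.0, 20.0, 30.0], "temp": [1.0, 2.0, 3.0]})
scaler = MeanCenterer()
scaled = apply_transformer_sketch(df, scaler, fit=True)
restored = apply_transformer_sketch(scaled, scaler, inverse_transform=True)
print(scaled["load"].tolist())  # [-10.0, 0.0, 10.0]
print(restored.equals(df))      # True
```

Fitting once with fit=True and reusing the same transformer object for the inverse call is what allows the original values to be recovered.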