utils.data_transform

Data transformation utilities for time series forecasting.

This module provides helpers for normalizing input data formats: converting Series to DataFrames and extending time series indices into the future.

Functions

| Name | Description |
|---|---|
| expand_index | Create a new index extending from the end of the original index. |
| input_to_frame | Convert input data to a pandas DataFrame. |

expand_index

utils.data_transform.expand_index(index, steps)

Create a new index extending from the end of the original index.

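This function generates future indices for forecasting by extending a time series index by the specified number of steps. For a DatetimeIndex, the new index starts one period after the last timestamp and keeps the original frequency; for a RangeIndex, it starts immediately after the last value.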
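To make the index arithmetic concrete, here is a minimal sketch of how such an extension can be computed with plain pandas. It is not the spotforecast2 implementation: the name expand_index_sketch is hypothetical, and accepting NumPy integer types for steps and requiring the DatetimeIndex to carry a freq are assumptions.

```python
import numpy as np
import pandas as pd


def expand_index_sketch(index, steps):
    """Hypothetical illustration of expand_index; not the library's code."""
    # Assumption: Python ints and NumPy integers are both accepted as steps.
    if not isinstance(steps, (int, np.integer)):
        raise TypeError("steps must be an integer.")

    if index is None:
        # No index supplied: start a fresh RangeIndex at 0.
        return pd.RangeIndex(start=0, stop=steps)

    if isinstance(index, pd.DatetimeIndex):
        # Assumption: index.freq is set (e.g. built with pd.date_range).
        # Start one period after the last timestamp, at the same frequency.
        return pd.date_range(start=index[-1] + index.freq, periods=steps, freq=index.freq)

    if isinstance(index, pd.RangeIndex):
        # Continue immediately after the last value with a unit step.
        start = index[-1] + 1
        return pd.RangeIndex(start=start, stop=start + steps)

    raise TypeError("index must be a DatetimeIndex, a RangeIndex, or None.")
```

Branching on the concrete index type keeps the output type aligned with the input: datetime-indexed series get future timestamps, while integer-indexed series get a contiguous integer range.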

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| index | Union[pd.Index, None] | Original pandas Index (DatetimeIndex or RangeIndex). If None, a RangeIndex starting at 0 is created. | required |
| steps | int | Number of future steps to generate. | required |

Returns

| Name | Type | Description |
|---|---|---|
| | pd.Index | New pandas Index containing steps future periods. |

Raises

| Name | Type | Description |
|---|---|---|
| | TypeError | If steps is not an integer, or if index is not None and is neither a DatetimeIndex nor a RangeIndex. |

Examples

>>> import pandas as pd
>>> from spotforecast2.utils.data_transform import expand_index
>>>
>>> # DatetimeIndex
>>> dates = pd.date_range("2023-01-01", periods=5, freq="D")
>>> new_index = expand_index(dates, 3)
>>> new_index
DatetimeIndex(['2023-01-06', '2023-01-07', '2023-01-08'], dtype='datetime64[ns]', freq='D')
>>>
>>> # RangeIndex
>>> range_idx = pd.RangeIndex(start=0, stop=10)
>>> new_index = expand_index(range_idx, 5)
>>> new_index
RangeIndex(start=10, stop=15, step=1)
>>>
>>> # None index (creates new RangeIndex)
>>> new_index = expand_index(None, 3)
>>> new_index
RangeIndex(start=0, stop=3, step=1)
>>>
>>> # Invalid: steps not an integer
>>> try:
...     expand_index(dates, 3.5)
... except TypeError:
...     print("Error: steps must be an integer")
Error: steps must be an integer

input_to_frame

utils.data_transform.input_to_frame(data, input_name)

Convert input data to a pandas DataFrame.

This function ensures consistent DataFrame format for internal processing. If data is already a DataFrame, it’s returned as-is. If it’s a Series, it’s converted to a single-column DataFrame.
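A minimal sketch of the conversion, for illustration only. The name input_to_frame_sketch is hypothetical, and it assumes the fallback column name for an unnamed Series is input_name itself; that matches the 'y' and 'exog' examples on this page, but the default actually used for 'last_window' is not stated here and may differ.

```python
import pandas as pd


def input_to_frame_sketch(data, input_name):
    """Hypothetical illustration of input_to_frame; not the library's code."""
    if isinstance(data, pd.DataFrame):
        # DataFrames pass through unchanged (the same object, no copy).
        return data
    # Series: keep its own name when set, otherwise fall back to a default.
    # Assumption: the default column name is input_name itself.
    name = data.name if data.name is not None else input_name
    return data.to_frame(name=name)
```

Returning a DataFrame in every case lets downstream code index columns uniformly, whether the caller passed a single series or several exogenous variables.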

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| data | Union[pd.Series, pd.DataFrame] | Input data as a pandas Series or DataFrame. | required |
| input_name | str | Name of the input data type. Accepted values: 'y' (target time series), 'last_window' (last window for prediction), 'exog' (exogenous variables). | required |

Returns

| Name | Type | Description |
|---|---|---|
| | pd.DataFrame | DataFrame version of the input data. For Series input, the column takes the series name if it has one; otherwise a default name based on input_name is used (for example, 'y' or 'exog'). |

Examples

>>> import pandas as pd
>>> from spotforecast2.utils.data_transform import input_to_frame
>>>
>>> # Series with name
>>> y = pd.Series([1, 2, 3], name="sales")
>>> df = input_to_frame(y, input_name="y")
>>> df.columns.tolist()
['sales']
>>>
>>> # Series without name (uses default)
>>> y_no_name = pd.Series([1, 2, 3])
>>> df = input_to_frame(y_no_name, input_name="y")
>>> df.columns.tolist()
['y']
>>>
>>> # DataFrame (returned as-is)
>>> df_input = pd.DataFrame({"temp": [20, 21], "humidity": [50, 55]})
>>> df_output = input_to_frame(df_input, input_name="exog")
>>> df_output.columns.tolist()
['temp', 'humidity']
>>>
>>> # Exog series without name
>>> exog = pd.Series([10, 20, 30])
>>> df_exog = input_to_frame(exog, input_name="exog")
>>> df_exog.columns.tolist()
['exog']