utils.data_transform
Data transformation utilities for time series forecasting.
This module provides functions for normalizing and transforming data formats.
Functions
| Name | Description |
|---|---|
| expand_index | Create a new index extending from the end of the original index. |
| input_to_frame | Convert input data to a pandas DataFrame. |
expand_index
utils.data_transform.expand_index(index, steps)
Create a new index extending from the end of the original index.
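This function generates future indices for forecasting by extending the time series index by a specified number of steps. It handles both DatetimeIndex and RangeIndex: the returned index starts immediately after the last entry of the original index and contains only the steps new entries, not the original ones.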
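The behavior described above can be approximated in plain pandas. The following is a minimal sketch, not the library's actual implementation: the name `expand_index_sketch` is hypothetical, and the DatetimeIndex branch assumes the index carries a frequency (`index.freq` is not None).

```python
import pandas as pd


def expand_index_sketch(index, steps):
    """Hypothetical re-implementation for illustration only."""
    if not isinstance(steps, int):
        raise TypeError(f"`steps` must be an integer. Got {type(steps)}.")
    if index is None:
        # No original index: start a fresh RangeIndex at 0.
        return pd.RangeIndex(start=0, stop=steps)
    if isinstance(index, pd.DatetimeIndex):
        # Assumption: index.freq is set. Start one period after the
        # last timestamp and reuse the same frequency.
        return pd.date_range(
            start=index[-1] + index.freq, periods=steps, freq=index.freq
        )
    if isinstance(index, pd.RangeIndex):
        # Continue counting right after the last existing position.
        start = index[-1] + 1
        return pd.RangeIndex(start=start, stop=start + steps)
    raise TypeError("`index` must be a pandas DatetimeIndex or RangeIndex.")
```

A DatetimeIndex without a frequency (for example, one built from irregular timestamps) would fail in this sketch. How the library itself treats that case is not stated here.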
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| index | Union[pd.Index, None] | Original pandas Index (DatetimeIndex or RangeIndex). If None, creates a RangeIndex starting from 0. | required |
| steps | int | Number of future steps to generate. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | pd.Index | New pandas Index with steps future periods. |
Raises
| Name | Type | Description |
|---|---|---|
| | TypeError | If steps is not an integer, or if index is neither DatetimeIndex nor RangeIndex. |
Examples
>>> import pandas as pd
>>> from spotforecast2.utils.data_transform import expand_index
>>>
>>> # DatetimeIndex
>>> dates = pd.date_range("2023-01-01", periods=5, freq="D")
>>> new_index = expand_index(dates, 3)
>>> new_index
DatetimeIndex(['2023-01-06', '2023-01-07', '2023-01-08'], dtype='datetime64[ns]', freq='D')
>>>
>>> # RangeIndex
>>> range_idx = pd.RangeIndex(start=0, stop=10)
>>> new_index = expand_index(range_idx, 5)
>>> new_index
RangeIndex(start=10, stop=15, step=1)
>>>
>>> # None index (creates new RangeIndex)
>>> new_index = expand_index(None, 3)
>>> new_index
RangeIndex(start=0, stop=3, step=1)
>>>
>>> # Invalid: steps not an integer
>>> try:
...     expand_index(dates, 3.5)
... except TypeError:
...     print("Error: steps must be an integer")
Error: steps must be an integer
input_to_frame
utils.data_transform.input_to_frame(data, input_name)
Convert input data to a pandas DataFrame.
This function ensures consistent DataFrame format for internal processing. If data is already a DataFrame, it’s returned as-is. If it’s a Series, it’s converted to a single-column DataFrame.
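The conversion rule can be sketched in a few lines of pandas. This is an illustrative approximation, not the library's code: `input_to_frame_sketch` is a hypothetical name, and falling back to `input_name` as the column name is an assumption drawn from the 'y' and 'exog' examples on this page (the default used for 'last_window' is not shown).

```python
import pandas as pd


def input_to_frame_sketch(data, input_name):
    """Hypothetical re-implementation for illustration only."""
    if isinstance(data, pd.DataFrame):
        # Already a DataFrame: hand back the same object, no copy.
        return data
    # Assumption: an unnamed Series takes input_name as its column name.
    name = data.name if data.name is not None else input_name
    # Series.to_frame builds a single-column DataFrame.
    return data.to_frame(name=name)
```

Because a DataFrame is passed through without copying in this sketch, any in-place change to the result would also affect the caller's object.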
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| data | Union[pd.Series, pd.DataFrame] | Input data as pandas Series or DataFrame. | required |
| input_name | str | Name of the input data type. Accepted values are 'y' (target time series), 'last_window' (last window for prediction), and 'exog' (exogenous variables). | required |
Returns
| Name | Type | Description |
|---|---|---|
| | pd.DataFrame | DataFrame version of the input data. For Series input, uses the series name if available, otherwise uses a default name based on input_name. |
Examples
>>> import pandas as pd
>>> from spotforecast2.utils.data_transform import input_to_frame
>>>
>>> # Series with name
>>> y = pd.Series([1, 2, 3], name="sales")
>>> df = input_to_frame(y, input_name="y")
>>> df.columns.tolist()
['sales']
>>>
>>> # Series without name (uses default)
>>> y_no_name = pd.Series([1, 2, 3])
>>> df = input_to_frame(y_no_name, input_name="y")
>>> df.columns.tolist()
['y']
>>>
>>> # DataFrame (returned as-is)
>>> df_input = pd.DataFrame({"temp": [20, 21], "humidity": [50, 55]})
>>> df_output = input_to_frame(df_input, input_name="exog")
>>> df_output.columns.tolist()
['temp', 'humidity']
>>>
>>> # Exog series without name
>>> exog = pd.Series([10, 20, 30])
>>> df_exog = input_to_frame(exog, input_name="exog")
>>> df_exog.columns.tolist()
['exog']