utils.data_transform
Data transformation utilities for time series forecasting.
This module provides functions for normalizing and transforming data formats.
Functions
| Name | Description |
|---|---|
| date_to_index_position | Transform a datetime string or pandas Timestamp to an integer. The integer represents the position of the datetime in the index. |
| expand_index | Create a new index extending from the end of the original index. |
| input_to_frame | Convert input data to a pandas DataFrame. |
| transform_dataframe | Transform raw values of a pandas DataFrame with a scikit-learn-like transformer, preprocessor or ColumnTransformer. |
date_to_index_position
utils.data_transform.date_to_index_position(
index,
date_input,
method='prediction',
date_literal='steps',
kwargs_pd_to_datetime=None,
)
Transform a datetime string or pandas Timestamp to an integer. The integer represents the position of the datetime in the index.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| index | pd.Index | Original datetime index (must be a pandas DatetimeIndex if date_input is not an int). | required |
| date_input | Union[int, str, pd.Timestamp] | Datetime to transform to integer. - If int, returns the same integer. - If str or pandas Timestamp, it is converted and expanded into the index. | required |
| method | str | Can be ‘prediction’ or ‘validation’. - If ‘prediction’, the date must be later than the last date in the index. - If ‘validation’, the date must be within the index range. | 'prediction' |
| date_literal | str | Variable name used in error messages. Defaults to ‘steps’. | 'steps' |
| kwargs_pd_to_datetime | Optional[dict] | Additional keyword arguments to pass to pd.to_datetime(). Defaults to None. | None |
Returns
| Name | Type | Description |
|---|---|---|
| int | int | date_input transformed to an integer position in the index. - If date_input is an integer, the same integer is returned. - If method is ‘prediction’, the number of steps to predict from the last date in the index. - If method is ‘validation’, the position of the date in the index plus one. |
Raises
| Name | Type | Description |
|---|---|---|
| | ValueError | If method is not ‘prediction’ or ‘validation’. |
| | TypeError | If index is not a DatetimeIndex when date_input is not an integer. |
| | ValueError | If date_input (as date) does not meet the method’s constraints. |
| | TypeError | If date_input is not an integer, string, or pandas Timestamp. |
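Examples
The return values described above can be illustrated with a small standard-library sketch. It is not the library's implementation: `sketch_date_to_position` is a hypothetical stand-in that assumes a contiguous daily index held as a list of `datetime.date` objects, and it does not parse strings or use pandas. It only mirrors the documented rules: integers pass through, 'prediction' counts steps past the last date, and 'validation' returns the position plus one.

```python
from datetime import date, timedelta


def sketch_date_to_position(index, date_input, method="prediction"):
    """Hypothetical stand-in for date_to_index_position (daily index only)."""
    if method not in ("prediction", "validation"):
        raise ValueError("method must be 'prediction' or 'validation'")
    # Integers are returned unchanged, as documented.
    if isinstance(date_input, int):
        return date_input
    if not isinstance(date_input, date):
        raise TypeError("date_input must be an int or a date in this sketch")

    first, last = index[0], index[-1]
    if method == "prediction":
        # The date must lie strictly after the end of the index.
        if date_input <= last:
            raise ValueError("date must be later than the last index date")
        # Number of daily steps from the last date to the target date.
        return (date_input - last).days
    # 'validation': the date must fall inside the index range.
    if not first <= date_input <= last:
        raise ValueError("date must be within the index range")
    # Zero-based position plus one.
    return index.index(date_input) + 1


# Five daily dates: 2023-01-01 .. 2023-01-05
daily_index = [date(2023, 1, 1) + timedelta(days=i) for i in range(5)]

steps_ahead = sketch_date_to_position(daily_index, date(2023, 1, 8))
position = sketch_date_to_position(
    daily_index, date(2023, 1, 3), method="validation"
)
unchanged = sketch_date_to_position(daily_index, 7)
```

Under these assumptions, a date three days past the end of the index maps to 3 steps, and the third date in the index maps to position 3 in 'validation' mode.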
expand_index
utils.data_transform.expand_index(index, steps)
Create a new index extending from the end of the original index.
This function generates future indices for forecasting by extending the time series index by a specified number of steps. It handles both DatetimeIndex and RangeIndex.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| index | Union[pd.Index, None] | Original pandas Index (DatetimeIndex or RangeIndex). If None, creates a RangeIndex starting from 0. | required |
| steps | int | Number of future steps to generate. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | pd.Index | New pandas Index with steps future periods. |
Raises
| Name | Type | Description |
|---|---|---|
| | TypeError | If steps is not an integer, or if index is neither DatetimeIndex nor RangeIndex. |
Examples
>>> import pandas as pd
>>> from spotforecast2_safe.utils.data_transform import expand_index
>>>
>>> # DatetimeIndex
>>> dates = pd.date_range("2023-01-01", periods=5, freq="D")
>>> new_index = expand_index(dates, 3)
>>> new_index
DatetimeIndex(['2023-01-06', '2023-01-07', '2023-01-08'], dtype='datetime64[ns]', freq='D')
>>>
>>> # RangeIndex
>>> range_idx = pd.RangeIndex(start=0, stop=10)
>>> new_index = expand_index(range_idx, 5)
>>> new_index
RangeIndex(start=10, stop=15, step=1)
>>>
>>> # None index (creates new RangeIndex)
>>> new_index = expand_index(None, 3)
>>> new_index
RangeIndex(start=0, stop=3, step=1)
>>>
>>> # Invalid: steps not an integer
>>> try:
...     expand_index(dates, 3.5)
... except TypeError as e:
...     print("Error: steps must be an integer")
Error: steps must be an integer
input_to_frame
utils.data_transform.input_to_frame(data, input_name)
Convert input data to a pandas DataFrame.
This function ensures consistent DataFrame format for internal processing. If data is already a DataFrame, it’s returned as-is. If it’s a Series, it’s converted to a single-column DataFrame.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| data | Union[pd.Series, pd.DataFrame] | Input data as pandas Series or DataFrame. | required |
| input_name | str | Name of the input data type. Accepted values are: - ‘y’: Target time series - ‘last_window’: Last window for prediction - ‘exog’: Exogenous variables | required |
Returns
| Name | Type | Description |
|---|---|---|
| | pd.DataFrame | DataFrame version of the input data. For Series input, uses the series name if available, otherwise uses a default name based on input_name. |
Examples
>>> import pandas as pd
>>> from spotforecast2_safe.utils.data_transform import input_to_frame
>>>
>>> # Series with name
>>> y = pd.Series([1, 2, 3], name="sales")
>>> df = input_to_frame(y, input_name="y")
>>> df.columns.tolist()
['sales']
>>>
>>> # Series without name (uses default)
>>> y_no_name = pd.Series([1, 2, 3])
>>> df = input_to_frame(y_no_name, input_name="y")
>>> df.columns.tolist()
['y']
>>>
>>> # DataFrame (returned as-is)
>>> df_input = pd.DataFrame({"temp": [20, 21], "humidity": [50, 55]})
>>> df_output = input_to_frame(df_input, input_name="exog")
>>> df_output.columns.tolist()
['temp', 'humidity']
>>>
>>> # Exog series without name
>>> exog = pd.Series([10, 20, 30])
>>> df_exog = input_to_frame(exog, input_name="exog")
>>> df_exog.columns.tolist()
['exog']
transform_dataframe
utils.data_transform.transform_dataframe(
df,
transformer,
fit=False,
inverse_transform=False,
)
Transform raw values of a pandas DataFrame with a scikit-learn-like transformer, preprocessor or ColumnTransformer.
The transformer is expected to implement the following methods: fit, transform, fit_transform and inverse_transform. ColumnTransformer is the exception: it has no inverse_transform method, so it cannot be used when inverse_transform=True.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| df | pd.DataFrame | DataFrame to be transformed. | required |
| transformer | object | Scikit-learn-like transformer, preprocessor, or ColumnTransformer. Must implement fit, transform, fit_transform and inverse_transform. | required |
| fit | bool | Train the transformer before applying it. Defaults to False. | False |
| inverse_transform | bool | Transform the data back to its original representation. Not available for scikit-learn ColumnTransformer instances. Defaults to False. | False |
Returns
| Name | Type | Description |
|---|---|---|
| | pd.DataFrame | Transformed DataFrame. |
Raises
| Name | Type | Description |
|---|---|---|
| | TypeError | If df is not a pandas DataFrame. |
| | ValueError | If inverse_transform is requested for ColumnTransformer. |
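Examples
The interaction of the fit and inverse_transform flags can be sketched without pandas or scikit-learn. Everything below is hypothetical and for illustration only: `MinMaxSketch` is a toy transformer exposing the four methods listed above, and `apply_transformer` is an assumed dispatch over plain lists, not the body of transform_dataframe. In particular, letting inverse_transform take precedence over fit is an assumption of this sketch.

```python
class MinMaxSketch:
    """Hypothetical scaler with fit, transform, fit_transform, inverse_transform."""

    def fit(self, values):
        self.low = min(values)
        # Guard against a zero range so transform never divides by zero.
        self.span = (max(values) - self.low) or 1.0
        return self

    def transform(self, values):
        return [(v - self.low) / self.span for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)

    def inverse_transform(self, values):
        return [v * self.span + self.low for v in values]


def apply_transformer(values, transformer, fit=False, inverse_transform=False):
    """Assumed flag handling: inverse first, then fit_transform or transform."""
    if inverse_transform:
        # Objects without inverse_transform (ColumnTransformer, per the text
        # above) cannot be reversed, so a ValueError is raised here.
        if not hasattr(transformer, "inverse_transform"):
            raise ValueError("transformer does not support inverse_transform")
        return transformer.inverse_transform(values)
    if fit:
        return transformer.fit_transform(values)
    return transformer.transform(values)


scaler = MinMaxSketch()
raw = [10.0, 20.0, 30.0]

scaled = apply_transformer(raw, scaler, fit=True)        # fit, then transform
rescaled = apply_transformer([15.0], scaler)             # reuse fitted state
restored = apply_transformer(scaled, scaler, inverse_transform=True)
```

In this sketch, fit=True is used once on the training values, later calls reuse the fitted state, and inverse_transform=True maps scaled values back to the original scale.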