data.data

Data structures for input and processed data.

Classes

| Name | Description |
|---|---|
| Data | Container for input time series data. |
| Period | Class abstraction for the information required to encode a period. |

Data

data.data.Data(data)

Container for input time series data.

Attributes

| Name | Type | Description |
|---|---|---|
| data | pd.DataFrame | pandas DataFrame containing the input time series data. |

Methods

| Name | Description |
|---|---|
| from_csv | Load data from a CSV file. |
| from_dataframe | Create a new Data instance from an existing DataFrame. |

from_csv
data.data.Data.from_csv(
    csv_path,
    timezone,
    columns=None,
    parse_dates=True,
    index_col=0,
    **kwargs,
)

Load data from a CSV file.

The CSV must contain a datetime column that becomes the DataFrame index. The index is localized to the provided timezone if it is naive, and then converted to UTC.
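
The localize-then-convert step described above can be sketched in plain pandas. This is not the library's implementation, only an illustration under stated assumptions: the inline CSV, the column names `timestamp` and `other_col`, and the timezone `Europe/Berlin` are made up, and mapping `columns` onto `usecols` is a guess.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real call would read from csv_path instead.
csv_text = (
    "timestamp,target_col,other_col\n"
    "2024-01-01 00:00:00,1.0,10\n"
    "2024-01-01 01:00:00,2.0,20\n"
)

# Read only the index column and one data column (assumed equivalent of
# columns=["target_col"]); parse_dates=True asks pandas to parse the index.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["timestamp", "target_col"],
    parse_dates=True,
    index_col="timestamp",
)

# The parsed index carries no timezone, so localize it first ...
assert df.index.tz is None
df.index = df.index.tz_localize("Europe/Berlin")

# ... and then convert to UTC, as the description states.
df.index = df.index.tz_convert("UTC")

print(df.index[0])  # midnight in Berlin (CET, UTC+1) is 23:00 UTC the day before
```

Localizing before converting matters: a naive timestamp has no defined offset, so pandas cannot convert it to UTC until a timezone has been assigned.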

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| csv_path | Path | Path to the CSV file. | required |
| timezone | Optional[str] | Timezone to assign if the index is timezone-naive. Must be provided in that case. | required |
| columns | Optional[List[str]] | Column names to include. If provided, only these columns are loaded from the CSV, which speeds up reading. If None, all columns are loaded. | None |
| parse_dates | bool or list | Passed to pd.read_csv. | True |
| index_col | int or str | Column to use as the index. | 0 |
| **kwargs | Any | Additional keyword arguments forwarded to pd.read_csv. | {} |

Returns

| Name | Type | Description |
|---|---|---|
| Data | Data | Instance containing the loaded DataFrame. |

Raises

| Type | Description |
|---|---|
| ValueError | If the CSV does not yield a DatetimeIndex. |
| ValueError | If the index is timezone-naive and no timezone is provided. |

Examples
>>> from pathlib import Path
>>> from spotforecast2_safe.data import Data
>>> data = Data.from_csv(
...     Path("data.csv"),
...     timezone="UTC",
...     columns=["target_col"]
... )

from_dataframe
data.data.Data.from_dataframe(df, timezone, columns=None)

Create a new Data instance from an existing DataFrame.

The DataFrame must have a datetime index. The index is localized to the provided timezone if it is naive, and then converted to UTC.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| df | pd.DataFrame | Input DataFrame containing data. | required |
| timezone | Optional[str] | Timezone to assign if the index is timezone-naive. Must be provided in that case. | required |
| columns | Optional[List[str]] | Column names to include. If provided, only these columns are selected from the DataFrame. If None, all columns are used. | None |

Returns

| Name | Type | Description |
|---|---|---|
| Data | Data | Instance containing the provided DataFrame. |

Raises

| Type | Description |
|---|---|
| ValueError | If the DataFrame index is not a DatetimeIndex. |
| ValueError | If the index is timezone-naive and no timezone is provided. |
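
The documented contract (index check, optional column selection, localization, UTC conversion, and the two `ValueError` cases) can be mirrored with a small pandas helper. `normalize_to_utc` is a hypothetical stand-in written for illustration; it is not part of the library and may differ from what `from_dataframe` does internally. The sample values and the timezone `America/New_York` are likewise assumptions.

```python
from typing import List, Optional

import pandas as pd


def normalize_to_utc(
    df: pd.DataFrame,
    timezone: Optional[str],
    columns: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical stand-in for the documented from_dataframe contract."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise ValueError("DataFrame index must be a DatetimeIndex.")

    # Optional column selection; copy so the caller's frame is left untouched.
    out = (df[columns] if columns is not None else df).copy()

    if out.index.tz is None:
        if timezone is None:
            raise ValueError("Index is timezone-naive and no timezone was given.")
        # Naive index: assign the caller-supplied timezone.
        out.index = out.index.tz_localize(timezone)

    # Aware index (either originally or after localization): convert to UTC.
    out.index = out.index.tz_convert("UTC")
    return out


# Made-up sample data with a timezone-naive DatetimeIndex.
naive = pd.DataFrame(
    {"target_col": [1.0, 2.0], "other_col": [10, 20]},
    index=pd.to_datetime(["2024-06-01 00:00", "2024-06-01 01:00"]),
)
utc = normalize_to_utc(naive, timezone="America/New_York", columns=["target_col"])
```

In this sketch an index that is already timezone-aware skips localization and is only converted, which is why `timezone` can be `None` in that case.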

Period

data.data.Period(name, n_periods, column, input_range)

Class abstraction for the information required to encode a period.

Attributes

| Name | Type | Description |
|---|---|---|
| name | str | Name of the period (e.g., 'hour', 'day'). |
| n_periods | int | Number of periods to encode (e.g., 24 for hours). |
| column | str | Name of the column in the DataFrame containing the period information. |
| input_range | Tuple[int, int] | Tuple of (min, max) values for the period (e.g., (0, 23) for hours). |

Examples

>>> from spotforecast2_safe.data import Period
>>> period = Period(name="hour", n_periods=24, column="hour", input_range=(0, 23))
>>> period.name
'hour'
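
The reference does not say how these fields are consumed. One common use of such a description is sine/cosine (cyclical) encoding, sketched below with the standard library only. `PeriodSketch` and `cyclical_encode` are hypothetical names, and the formula is an assumption, not the library's method.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PeriodSketch:
    """Hypothetical stand-in with the same four fields as Period."""

    name: str
    n_periods: int
    column: str
    input_range: Tuple[int, int]


def cyclical_encode(value: int, period: PeriodSketch) -> Tuple[float, float]:
    """Map a value in input_range onto the unit circle (assumed scheme)."""
    low, high = period.input_range
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {period.input_range}")
    # Shift so the range starts at 0, then scale one full cycle to 2*pi.
    angle = 2.0 * math.pi * (value - low) / period.n_periods
    return math.sin(angle), math.cos(angle)


hour = PeriodSketch(name="hour", n_periods=24, column="hour", input_range=(0, 23))
sin_0, cos_0 = cyclical_encode(0, hour)    # start of the cycle
sin_6, cos_6 = cyclical_encode(6, hour)    # a quarter of the way around
sin_23, cos_23 = cyclical_encode(23, hour)  # one step before wrapping to 0
```

Under this scheme hour 23 lands next to hour 0 on the circle, which a plain integer column would not express.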