preprocessing.curate_data.agg_and_resample_data

preprocessing.curate_data.agg_and_resample_data(
    data,
    rule='h',
    closed='left',
    label='left',
    by='mean',
    verbose=False,
)

Resamples the data to a regular frequency (e.g., hourly) and aggregates the values within each time bin using the specified method (e.g., the mean of each hour).

Parameters

Name	Type	Description	Default
data	pd.DataFrame	The dataset with a datetime index.	required
rule	str	The resample rule (e.g., `'h'` for hourly, `'D'` for daily). The default `'h'` creates an hourly grid.	`'h'`
closed	str	Which side of the bin interval is closed. With `closed="left", label="left"`, a time interval (e.g., 10:00 to 11:00) is labeled with its start timestamp (10:00). For consumption data, `closed="left", label="right"` is more common: the interval is labeled with its end timestamp (11:00), because consumption is typically reported once the hour has elapsed.	`'left'`
label	str	Which bin edge to use as the label. See the `closed` parameter for details on labeling behavior.	`'left'`
by	str or callable	Aggregation method to apply (e.g., `'mean'`, `'sum'`, `'median'`). Aggregating makes the result robust: if the data is more finely resolved (e.g., quarter-hourly), `asfreq` would pick only one value per hour (sampling), whereas `.agg("mean")` computes the average over the whole hour. If the data is already hourly, `.agg` leaves the values unchanged but ensures that no duplicate timestamps remain.	`'mean'`
verbose	bool	Whether to print additional information.	`False`
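
The effect of `label` described for the `closed` parameter can be illustrated with plain pandas. This is a minimal sketch that calls `DataFrame.resample` directly rather than `agg_and_resample_data`; the column name `kwh` and the sample readings are hypothetical.

```python
import pandas as pd

# Hypothetical quarter-hourly consumption readings from 10:00 to 11:45.
idx = pd.date_range("2023-01-01 10:00", periods=8, freq="15min")
consumption = pd.DataFrame({"kwh": [1.0] * 8}, index=idx)

# Interval [10:00, 11:00) labeled with its start timestamp.
start_labeled = consumption.resample("h", closed="left", label="left").sum()

# The same interval labeled with its end timestamp.
end_labeled = consumption.resample("h", closed="left", label="right").sum()

print(start_labeled.index[0])  # 2023-01-01 10:00:00
print(end_labeled.index[0])    # 2023-01-01 11:00:00
```

Both results contain the same hourly sums; only the timestamps attached to the bins differ.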

Returns

Name	Type	Description
	pd.DataFrame	Resampled and aggregated dataframe.

Notes

`resample(rule="h")`: Creates an hourly grid.
`closed`/`label`: Control how time intervals are labeled.
`.agg({…: by})`: Aggregates the values within each time bin.
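
The three steps above can be sketched in plain pandas and compared with `asfreq`. This is an illustration under assumptions, not necessarily how the function is implemented internally; the column name `value` and the mapping `{"value": "mean"}` are chosen for this example only.

```python
import pandas as pd

# Quarter-hourly sample data covering two hours (hypothetical values).
idx = pd.date_range("2023-01-01", periods=8, freq="15min")
data = pd.DataFrame({"value": [0, 1, 2, 3, 4, 5, 6, 7]}, index=idx)

# Sampling: asfreq keeps only the value found at each full hour.
sampled = data.asfreq("h")

# Aggregation: hourly grid, left-closed and left-labeled bins, mean per bin.
aggregated = data.resample("h", closed="left", label="left").agg({"value": "mean"})

print(sampled["value"].tolist())     # [0, 4]
print(aggregated["value"].tolist())  # [1.5, 5.5]
```

Sampling discards three of the four readings in each hour, while the aggregation uses all of them.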

Examples

import pandas as pd
from spotforecast2_safe.preprocessing.curate_data import agg_and_resample_data

date_rng = pd.date_range(start="2023-01-01", end="2023-01-02", freq="15min")
data = pd.DataFrame({"value": range(len(date_rng))}, index=date_rng)
resampled_data = agg_and_resample_data(data, rule="h", by="mean")
print(resampled_data.head())
assert resampled_data.shape == (25, 1)

                     value
2023-01-01 00:00:00    1.5
2023-01-01 01:00:00    5.5
2023-01-01 02:00:00    9.5
2023-01-01 03:00:00   13.5
2023-01-01 04:00:00   17.5
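
The `by` parameter description notes that aggregation also removes duplicate timestamps from data that is already hourly. A minimal sketch of that behavior, using pandas directly instead of `agg_and_resample_data` and a hypothetical input with one repeated timestamp:

```python
import pandas as pd

# Hypothetical hourly data in which 01:00 appears twice.
idx = pd.to_datetime(
    ["2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 01:00", "2023-01-01 02:00"]
)
data = pd.DataFrame({"value": [10.0, 20.0, 30.0, 40.0]}, index=idx)

# Each hourly bin yields exactly one row; duplicates are averaged.
hourly = data.resample("h", closed="left", label="left").agg("mean")

print(hourly.index.is_unique)    # True
print(hourly["value"].tolist())  # [10.0, 25.0, 40.0]
```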