manager.features.merge_data_and_covariates

manager.features.merge_data_and_covariates(
    data,
    exogenous_features,
    target_columns,
    exog_features,
    start,
    end,
    cov_end,
    forecast_horizon,
    cast_dtype='float32',
)

Merge target data with exogenous features and split into train/predict slices.

Performs an inner join of the selected target_columns from data with the selected exog_features from exogenous_features over the training window [start, end]. A separate prediction covariate slice covering [end + 1 h, cov_end] (that is, starting one hour after end and running through cov_end) is also returned for use during inference.
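The windowing described above can be sketched with plain pandas. This is an illustration of the documented behavior under stated assumptions (hourly data, label-based slicing, an index join), not the library's actual implementation; the variable names merged_sketch and pred_exog are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy hourly data: 72 hours of one target and one exogenous feature.
idx = pd.date_range("2024-01-01", periods=72, freq="h", tz="UTC")
data = pd.DataFrame({"load": np.arange(72, dtype="float64")}, index=idx)
exog = pd.DataFrame({"hour": np.asarray(idx.hour, dtype="float64")}, index=idx)

start = pd.Timestamp("2024-01-01 00:00", tz="UTC")
end = pd.Timestamp("2024-01-02 23:00", tz="UTC")
cov_end = pd.Timestamp("2024-01-03 23:00", tz="UTC")

# Training window: .loc label slicing includes both endpoints.
train_targets = data.loc[start:end, ["load"]]
train_exog = exog.loc[start:end, ["hour"]]

# Inner join on the index keeps only timestamps present in both frames.
merged_sketch = train_targets.join(train_exog, how="inner")

# Prediction window: starts one hour after `end`, runs through `cov_end`.
pred_exog = exog.loc[end + pd.Timedelta(hours=1):cov_end]

print(merged_sketch.shape, pred_exog.shape)
```

Because both endpoints are included, a 48-hour training window followed by a 24-hour prediction window yields 48 and 24 rows respectively, with no timestamp shared between the two slices.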

String timestamps are converted to UTC-aware pandas.Timestamp objects automatically.
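As a rough illustration of what UTC-aware parsing means, the snippet below uses pandas.to_datetime with utc=True. That this is the exact call the function makes internally is an assumption; the page only states that strings are parsed with utc=True.

```python
import pandas as pd

# A naive string is interpreted as UTC when utc=True is passed.
ts_naive = pd.to_datetime("2024-01-01 00:00", utc=True)

# A string carrying an offset is converted to the same instant in UTC.
ts_offset = pd.to_datetime("2024-01-01 01:00+01:00", utc=True)

print(ts_naive, ts_offset)
```

Both values refer to the same instant, so either string form would select the same window boundary on a UTC-indexed DataFrame.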

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| data | pd.DataFrame | DataFrame containing one or more target time series with a tz-aware pandas.DatetimeIndex. | required |
| exogenous_features | pd.DataFrame | DataFrame with all exogenous feature columns, covering at least the window [start, cov_end]. | required |
| target_columns | List[str] | Column names of the target variables to keep from data. | required |
| exog_features | List[str] | Column names of the exogenous features to include in the merged output and the prediction slice. | required |
| start | Union[str, pd.Timestamp] | Inclusive start of the training window. String values are parsed with utc=True. | required |
| end | Union[str, pd.Timestamp] | Inclusive end of the training window. String values are parsed with utc=True. | required |
| cov_end | Union[str, pd.Timestamp] | Inclusive end of the covariate (forecast) window. String values are parsed with utc=True. | required |
| forecast_horizon | int | Number of forecast steps ahead (informational; used by calling code to validate slice length). | required |
| cast_dtype | Optional[str] | NumPy dtype string applied to the merged training DataFrame via pandas.DataFrame.astype. Pass None to skip casting. | 'float32' |
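The effect of cast_dtype can be sketched with DataFrame.astype directly. The helper cast_if_requested below is hypothetical and only mirrors the documented contract (cast when a dtype string is given, skip when None); it is not taken from the library.

```python
import pandas as pd

def cast_if_requested(df, cast_dtype):
    """Hypothetical helper: cast every column, or return df unchanged for None."""
    if cast_dtype is None:
        return df
    return df.astype(cast_dtype)

# pandas stores Python floats as float64 by default.
merged_demo = pd.DataFrame({"load": [100.5, 101.25], "hour_sin": [0.0, 0.5]})

as_float32 = cast_if_requested(merged_demo, "float32")
unchanged = cast_if_requested(merged_demo, None)

print(as_float32.dtypes.tolist(), unchanged.dtypes.tolist())
```

Casting to float32 halves the memory of the training frame at the cost of precision, which is a common trade-off for model inputs.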

Returns

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: A three-tuple (data_with_exog, exo_tmp, exo_pred) where:

- data_with_exog: training-window DataFrame with target and exogenous columns merged (inner join on index).
- exo_tmp: full exogenous slice over [start, end] (all columns, not just exog_features).
- exo_pred: forecast-window exogenous slice over [end + 1 h, cov_end] (all columns).
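Since forecast_horizon is described as informational, calling code may want to confirm that the prediction slice has one row per forecast step. A minimal sketch of such a check follows; check_prediction_slice is a hypothetical helper, and exo_pred_demo is a stand-in for the returned exo_pred.

```python
import pandas as pd

def check_prediction_slice(exo_pred, forecast_horizon):
    """Hypothetical check: the slice must have one row per forecast step."""
    n_rows = len(exo_pred)
    if n_rows != forecast_horizon:
        raise ValueError(
            f"expected {forecast_horizon} covariate rows, got {n_rows}"
        )
    return n_rows

# Stand-in for the returned exo_pred: 24 hourly rows.
pred_idx = pd.date_range("2024-01-05 00:00", periods=24, freq="h", tz="UTC")
exo_pred_demo = pd.DataFrame({"hour_sin": 0.0, "hour_cos": 1.0}, index=pred_idx)

n_checked = check_prediction_slice(exo_pred_demo, forecast_horizon=24)
print("rows checked:", n_checked)
```

A mismatch usually means cov_end does not lie exactly forecast_horizon steps after end, or that the exogenous DataFrame has gaps in the forecast window.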

Examples

Merge a toy target series with calendar features over a 4-day (96-hour) training window, followed by a 24-hour forecast window:

import numpy as np
import pandas as pd
from spotforecast2_safe.manager.features import merge_data_and_covariates

idx = pd.date_range("2024-01-01", periods=120, freq="h", tz="UTC")
data = pd.DataFrame({"load": np.random.default_rng(42).normal(100, 10, 120)}, index=idx)
exog = pd.DataFrame(
    {"hour_sin": np.sin(2 * np.pi * idx.hour / 24),
     "hour_cos": np.cos(2 * np.pi * idx.hour / 24)},
    index=idx,
)

start = pd.Timestamp("2024-01-01 00:00", tz="UTC")
end   = pd.Timestamp("2024-01-04 23:00", tz="UTC")  # 96 h training
cov_end = pd.Timestamp("2024-01-05 23:00", tz="UTC")  # 24 h forecast

merged, exo_train, exo_pred = merge_data_and_covariates(
    data=data,
    exogenous_features=exog,
    target_columns=["load"],
    exog_features=["hour_sin", "hour_cos"],
    start=start,
    end=end,
    cov_end=cov_end,
    forecast_horizon=24,
)
print("merged shape:   ", merged.shape)
print("exo_train shape:", exo_train.shape)
print("exo_pred shape: ", exo_pred.shape)
# Output:
# merged shape:    (96, 3)
# exo_train shape: (96, 2)
# exo_pred shape:  (24, 2)