manager.features.get_target_data

manager.features.get_target_data(
    target,
    df_pipeline,
    config,
    *,
    data_with_exog=None,
    exog_feature_names=None,
    exo_pred=None,
    start_train_ts,
    end_train_ts,
)

Extract the training series and exogenous slices for one target column.

Clips the target column of df_pipeline to the training window defined by start_train_ts and end_train_ts. When exogenous features are enabled (config.use_exogenous_features is True) and data_with_exog is provided, the matching exogenous training slice and forecast-horizon slice are also returned; otherwise both are None.
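The clipping step can be approximated with plain pandas label slicing. This is a minimal sketch rather than the library's implementation: the frame, the column values, and the window below are made-up stand-ins, and it assumes only that both window bounds are inclusive, as documented for start_train_ts and end_train_ts.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pipeline output: 10 days of hourly data.
idx = pd.date_range("2024-01-01", periods=240, freq="h", tz="UTC")
df_pipeline = pd.DataFrame({"load": np.arange(240, dtype="float64")}, index=idx)

# Illustrative training window (tz-aware, matching the index timezone).
start_train_ts = pd.Timestamp("2024-01-02 00:00", tz="UTC")
end_train_ts = pd.Timestamp("2024-01-03 23:00", tz="UTC")

# Label-based slicing on a DatetimeIndex includes both endpoints, and
# selecting a single column label yields a 1-D Series.
y_train = df_pipeline.loc[start_train_ts:end_train_ts, "load"]

print(len(y_train))  # 48 hourly rows: two full days
```

Because both endpoints are included, a window from 00:00 on one day to 23:00 on the next yields 48 hourly rows, not 47.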

This function is the canonical way to extract per-target data from the shared pipeline state so that outlier removal, imputation, and feature engineering are applied consistently across all forecasting tasks.

The training-window timestamps are supplied as explicit parameters so that this helper stays decoupled from RunState (ADR adr-multitask-configmulti-merge, step 5). Both parameters are required; passing None raises ValueError.
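The required-timestamp contract can be pictured as a small guard that runs before any slicing. The helper name require_train_window and its error message are hypothetical and used only for illustration; the real function may perform this check differently.

```python
import pandas as pd


def require_train_window(start_train_ts, end_train_ts):
    """Hypothetical guard: reject a missing training-window bound."""
    if start_train_ts is None or end_train_ts is None:
        raise ValueError("start_train_ts and end_train_ts must both be provided")
    return start_train_ts, end_train_ts


# A caller that has not prepared the pipeline yet has no window to pass on.
try:
    require_train_window(None, pd.Timestamp("2024-01-07 23:00", tz="UTC"))
except ValueError as err:
    print(f"rejected: {err}")
```

Failing early here gives the caller an explicit error, rather than a slice that silently covers the wrong range.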

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| target | str | Name of the target column to extract from df_pipeline. | required |
| df_pipeline | pd.DataFrame | DataFrame with a tz-aware DatetimeIndex containing all target columns produced by the preprocessing pipeline. | required |
| config | ConfigMulti | Pipeline configuration object. use_exogenous_features must be set. | required |
| data_with_exog | Optional[pd.DataFrame] | Merged DataFrame of target and exogenous columns covering at least the training window. Required when config.use_exogenous_features is True. Pass None (default) to skip exogenous slicing. | None |
| exog_feature_names | Optional[List[str]] | Column names to select from data_with_exog and exo_pred. Required when data_with_exog is not None. Pass None (default) when exogenous features are disabled. | None |
| exo_pred | Optional[pd.DataFrame] | Exogenous feature DataFrame covering the forecast horizon. Required when data_with_exog is not None. Pass None (default) when exogenous features are disabled. | None |
| start_train_ts | pd.Timestamp | Inclusive start of the training window (tz-aware pd.Timestamp). Keyword-only and required: pass task.run_state.start_train_ts after the pipeline has been prepared. Passing None raises ValueError. | required |
| end_train_ts | pd.Timestamp | Inclusive end of the training window (tz-aware pd.Timestamp). Keyword-only and required: pass task.run_state.end_train_ts after the pipeline has been prepared. Passing None raises ValueError. | required |

Returns

Type: Tuple[pd.Series, Optional[pd.DataFrame], Optional[pd.DataFrame]]

A three-tuple (y_train, exog_train, exog_future) where:

- y_train: the target values over the training window, squeezed to a 1-D Series.
- exog_train: DataFrame of selected exogenous features over the training window, cast to float32. None when exogenous features are disabled or data_with_exog is None.
- exog_future: DataFrame of selected exogenous features covering the forecast horizon, cast to float32. None when exogenous features are disabled or exo_pred is None.
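The exogenous part of the return value can be sketched with plain pandas as a column selection followed by a float32 cast. This is an illustration under assumptions, not the library's code: the helper hour_features, the feature names, and the index ranges are invented for the example.

```python
import numpy as np
import pandas as pd

idx_train = pd.date_range("2024-01-01", periods=48, freq="h", tz="UTC")
idx_future = pd.date_range("2024-01-03", periods=24, freq="h", tz="UTC")
names = ["hour_sin", "hour_cos"]


def hour_features(index):
    """Hypothetical helper: cyclical hour-of-day features for an index."""
    angle = 2 * np.pi * index.hour.to_numpy() / 24
    return pd.DataFrame(
        {"hour_sin": np.sin(angle), "hour_cos": np.cos(angle)}, index=index
    )


# Merged frame of target plus exogenous columns, and the horizon frame.
data_with_exog = hour_features(idx_train).assign(load=100.0)
exo_pred = hour_features(idx_future)

start, end = idx_train[0], idx_train[-1]

# Keep only the requested feature columns and cast them to float32.
exog_train = data_with_exog.loc[start:end, names].astype("float32")
exog_future = exo_pred[names].astype("float32")

print(exog_train.shape, exog_future.shape)
print(exog_train.dtypes.unique())
```

Selecting by name drops the target column from the training slice, so exog_train and exog_future end up with the same columns in the same order.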

Examples

Extract training data for a single target without exogenous features:

import pandas as pd
import numpy as np
from spotforecast2_safe.manager.features import get_target_data
from spotforecast2_safe.configurator.config_multi import ConfigMulti

idx = pd.date_range("2024-01-01", periods=168, freq="h", tz="UTC")
df_pipeline = pd.DataFrame({"load": np.random.default_rng(0).normal(100, 10, 168)}, index=idx)

config = ConfigMulti(
    targets=["load"],
    use_exogenous_features=False,
)
start_ts = pd.Timestamp("2024-01-01 00:00", tz="UTC")
end_ts   = pd.Timestamp("2024-01-07 23:00", tz="UTC")

y_train, exog_train, exog_future = get_target_data(
    target="load",
    df_pipeline=df_pipeline,
    config=config,
    start_train_ts=start_ts,
    end_train_ts=end_ts,
)
print(f"y_train length: {len(y_train)}")
print(f"exog_train:     {exog_train}")
print(f"exog_future:    {exog_future}")

y_train length: 168
exog_train:     None
exog_future:    None

Extract training data with exogenous features enabled:

import pandas as pd
import numpy as np
from spotforecast2_safe.manager.features import get_target_data
from spotforecast2_safe.configurator.config_multi import ConfigMulti

rng = np.random.default_rng(1)
idx_train = pd.date_range("2024-01-01", periods=168, freq="h", tz="UTC")
idx_future = pd.date_range("2024-01-08", periods=24, freq="h", tz="UTC")

df_pipeline = pd.DataFrame({"load": rng.normal(100, 10, 168)}, index=idx_train)

data_with_exog = pd.DataFrame(
    {
        "load": df_pipeline["load"],
        "hour_sin": np.sin(2 * np.pi * idx_train.hour / 24),
        "hour_cos": np.cos(2 * np.pi * idx_train.hour / 24),
    },
    index=idx_train,
)
exo_pred = pd.DataFrame(
    {
        "hour_sin": np.sin(2 * np.pi * idx_future.hour / 24),
        "hour_cos": np.cos(2 * np.pi * idx_future.hour / 24),
    },
    index=idx_future,
)

start_ts = pd.Timestamp("2024-01-01 00:00", tz="UTC")
end_ts   = pd.Timestamp("2024-01-07 23:00", tz="UTC")
config = ConfigMulti(targets=["load"], use_exogenous_features=True)

y_train, exog_train, exog_future = get_target_data(
    target="load",
    df_pipeline=df_pipeline,
    config=config,
    data_with_exog=data_with_exog,
    exog_feature_names=["hour_sin", "hour_cos"],
    exo_pred=exo_pred,
    start_train_ts=start_ts,
    end_train_ts=end_ts,
)
print(f"y_train length:     {len(y_train)}")
print(f"exog_train shape:   {exog_train.shape}")
print(f"exog_future shape:  {exog_future.shape}")
print(f"exog_train dtype:   {exog_train.dtypes.iloc[0]}")

y_train length:     168
exog_train shape:   (168, 2)
exog_future shape:  (24, 2)
exog_train dtype:   float32