manager.features.select_exogenous_features

manager.features.select_exogenous_features(
    exogenous_features,
    weather_aligned,
    cyclical_regex='_sin$|_cos$',
    include_weather_windows=False,
    include_holiday_features=False,
    include_holiday_adjacency_features=False,
    include_school_holiday_features=False,
    poly_features_degree=1,
)

Select and deduplicate exogenous feature columns for model training.

Builds a prioritised, deduplicated list of column names from exogenous_features suitable for passing as exog to a recursive forecaster. The selection order is:

  1. Cyclical sine/cosine columns (always included).
  2. Weather rolling-window columns (optional, include_weather_windows).
  3. Raw weather columns shared with weather_aligned.
  4. Holiday-related columns: is_holiday plus any column starting with "holiday" (optional, include_holiday_features).
  5. Holiday-adjacency columns: is_brueckentag, is_before_holiday, is_after_holiday (optional, include_holiday_adjacency_features).
  6. School-holiday column: is_school_holiday (optional, include_school_holiday_features).
  7. Polynomial interaction columns starting with "poly_" (included when poly_features_degree >= 2).

Duplicates are removed while preserving insertion order.
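The selection order above can be sketched in plain Python. This is a hypothetical re-implementation for illustration, not the library's code. The name select_columns_sketch is invented. It works on lists of column names rather than DataFrames. It also assumes two things: each step scans candidate columns in their original order, and the adjacency columns in step 5 follow the fixed order listed there.

```python
import re
from typing import List, Sequence


def select_columns_sketch(
    candidate_columns: Sequence[str],
    weather_columns: Sequence[str],
    cyclical_regex: str = "_sin$|_cos$",
    include_weather_windows: bool = False,
    include_holiday_features: bool = False,
    include_holiday_adjacency_features: bool = False,
    include_school_holiday_features: bool = False,
    poly_features_degree: int = 1,
) -> List[str]:
    """Hypothetical sketch of the documented selection order (not library code)."""
    cols = list(candidate_columns)
    selected: List[str] = []

    # 1. Cyclical sine/cosine columns are always included.
    pattern = re.compile(cyclical_regex)
    selected += [c for c in cols if pattern.search(c)]

    # 2. Rolling-window weather columns: "_window_" plus "_mean"/"_min"/"_max".
    if include_weather_windows:
        selected += [
            c for c in cols
            if "_window_" in c and any(s in c for s in ("_mean", "_min", "_max"))
        ]

    # 3. Raw weather columns that also appear among the weather column names.
    weather = set(weather_columns)
    selected += [c for c in cols if c in weather]

    # 4. Holiday columns: is_holiday and any name starting with "holiday".
    if include_holiday_features:
        selected += [c for c in cols if c == "is_holiday" or c.startswith("holiday")]

    # 5. Holiday-adjacency columns, only those actually present.
    if include_holiday_adjacency_features:
        adjacency = ("is_brueckentag", "is_before_holiday", "is_after_holiday")
        selected += [c for c in adjacency if c in cols]

    # 6. School-holiday column, only if present.
    if include_school_holiday_features and "is_school_holiday" in cols:
        selected.append("is_school_holiday")

    # 7. Polynomial interaction columns exist only for degree >= 2.
    if poly_features_degree >= 2:
        selected += [c for c in cols if c.startswith("poly_")]

    # Deduplicate while keeping the first occurrence of each name.
    return list(dict.fromkeys(selected))


# Hypothetical column names for a quick demonstration.
demo_columns = ["hour_sin", "hour_cos", "temp", "temp_window_24_mean",
                "is_holiday", "poly_temp_wind"]
print(select_columns_sketch(demo_columns, ["temp"],
                            include_weather_windows=True,
                            include_holiday_features=True))
# ['hour_sin', 'hour_cos', 'temp_window_24_mean', 'temp', 'is_holiday']
```

dict.fromkeys keeps the first occurrence of each key, which gives the documented order-preserving deduplication. Whether the library uses the same idiom is an assumption.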

Parameters

exogenous_features (pd.DataFrame, required): DataFrame containing the full set of candidate feature columns.

weather_aligned (pd.DataFrame, required): DataFrame whose column names identify the raw (non-window, non-polynomial) weather variables.

cyclical_regex (str): Regular expression matched against column names to detect cyclical sine/cosine features. Defaults to '_sin$|_cos$'.

include_weather_windows (bool): If True, include rolling-window weather columns (names containing "_window_" together with "_mean", "_min", or "_max"). Defaults to False.

include_holiday_features (bool): If True, include the is_holiday column and any column whose name starts with "holiday". Defaults to False.

include_holiday_adjacency_features (bool): If True, include the three adjacency columns is_brueckentag, is_before_holiday, and is_after_holiday when they are present in exogenous_features. Defaults to False.

include_school_holiday_features (bool): If True, include the is_school_holiday column when it is present in exogenous_features. Defaults to False.

poly_features_degree (int): Polynomial-interaction degree. Interaction columns (names starting with "poly_") are included only when this is >= 2; at degree 1 no interactions exist. Defaults to 1.
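The default cyclical_regex is an ordinary Python alternation, so a quick re.search check shows which names it accepts. The sketch assumes the pattern is applied with search semantics, as pandas' DataFrame.filter(regex=...) does. The column names and the custom pattern are hypothetical.

```python
import re

# Default value of cyclical_regex, copied from the signature.
DEFAULT_CYCLICAL_REGEX = "_sin$|_cos$"

# Hypothetical column names chosen for illustration.
names = ["hour_sin", "dow_cos", "hour_sin_lag1", "sinusoid", "month_cos"]

# Assumption: search semantics, so "$" anchors each alternative at the end
# of the name and suffixed columns such as "hour_sin_lag1" are not matched.
matched = [n for n in names if re.search(DEFAULT_CYCLICAL_REGEX, n)]
print(matched)  # ['hour_sin', 'dow_cos', 'month_cos']

# A hypothetical custom pattern that also accepts numbered harmonics.
custom_regex = r"_(sin|cos)(_\d+)?$"
harmonics = ["hour_sin", "hour_sin_2", "hour_cos_3", "hour_sin_lag1"]
custom_matched = [n for n in harmonics if re.search(custom_regex, n)]
print(custom_matched)  # ['hour_sin', 'hour_sin_2', 'hour_cos_3']
```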

Returns

List[str]: Deduplicated list of selected column names, in priority order.

Examples

Select cyclical and raw weather columns from a feature matrix:

import numpy as np
import pandas as pd
from spotforecast2_safe.manager.features import select_exogenous_features

rng = np.random.default_rng(1)
idx = pd.date_range("2024-01-01", periods=24, freq="h", tz="UTC")

weather = pd.DataFrame({"wind_speed": rng.uniform(0, 10, 24)}, index=idx)
exog = pd.DataFrame(
    {
        "hour_sin": np.sin(2 * np.pi * idx.hour / 24),
        "hour_cos": np.cos(2 * np.pi * idx.hour / 24),
        "wind_speed": weather["wind_speed"],
        "holiday_flag": 0,
    },
    index=idx,
)

selected = select_exogenous_features(
    exogenous_features=exog,
    weather_aligned=weather,
    include_holiday_features=False,
)
print("selected:", selected)
# selected: ['hour_sin', 'hour_cos', 'wind_speed']