configurator.config_multi.ConfigMulti

configurator.config_multi.ConfigMulti(
    country_code='DE',
    periods=default_periods(),
    lags_consider=(lambda: list(range(1, 24)))(),
    train_size=(lambda: pd.Timedelta(days=(3 * 365)))(),
    end_train_default='2025-12-31 00:00+00:00',
    delta_val=(lambda: pd.Timedelta(hours=(24 * 7 * 10)))(),
    predict_size=24,
    cv_block_size=None,
    refit_size=7,
    random_state=314159,
    n_hyperparameters_trials=20,
    data_filename='interim/energy_load.csv',
    targets=None,
    use_outlier_detection=True,
    contamination=0.01,
    imputation_method='weighted',
    window_size=72,
    imputation_window_size=None,
    use_exogenous_features=True,
    latitude=51.5136,
    longitude=7.4653,
    timezone='UTC',
    state='NW',
    include_weather_windows=False,
    include_holiday_features=False,
    include_holiday_adjacency_features=False,
    use_population_weighted_weather=False,
    include_degree_hours=False,
    include_apparent_temperature=False,
    degree_hours_base_heating=15.0,
    degree_hours_base_cooling=22.0,
    include_ephemeris_features=False,
    include_day_type_features=False,
    include_school_holiday_features=False,
    poly_features_degree=1,
    max_poly_features=10,
    poly_mi_n_jobs=-1,
    poly_mi_sample_size=4000,
    include_covid_infection_rate=False,
    include_entsoe_forecast_load=False,
    include_entsoe_renewable_forecast=False,
    include_entsoe_net_load=False,
    include_entsoe_day_ahead_price=False,
    include_football_match_window=False,
    include_energy_saving_window=False,
    index_name='DateTime',
    bounds=None,
    verbose=False,
    cache_home=None,
    n_trials_optuna=15,
    n_trials_spotoptim=10,
    n_initial_spotoptim=5,
    max_time_spotoptim=None,
    warm_start_lags=(lambda: list(DEFAULT_WARM_START_LAGS))(),
    task='lazy',
    agg_weights=None,
    forecaster_factory=None,
    data_loader=None,
    test_data_loader=None,
    auto_save_models=True,
    data_frame_name='default',
    number_folds=10,
    on_weather_failure='raise',
    on_exog_provider_failure='raise',
    exog_max_gap_hours=0,
    exog_max_tail_gap_hours=0,
    exog_provider_window='full',
    target_qc_range_mw=None,
    target_qc_step_mw=None,
    target_qc_window_days=None,
    target_corruption_policy='abort',
    target_max_heal_hours=0,
    target_anchor_zone_hours=168,
    target_qc_deviation_mw=None,
    target_qc_deviation_ref=None,
    target_qc_deviation_slots=2,
)

Configuration for the multi-input forecasting pipeline.

This class manages all configuration parameters for the multi-input task, including training/prediction intervals, data sources, and feature engineering specifications. All parameters can be customized during initialization or used with sensible defaults.

country_code serves as the single ISO country code used for both API queries and holiday feature generation.

Parameters

Name Type Description Default
country_code str ISO 3166-1 alpha-2 country code (e.g. "DE"). Used for both API queries and holiday feature generation. 'DE'
periods Optional[List[Period]] List of Period objects defining cyclical feature encodings. default_periods()
lags_consider Optional[List[int]] List of lag values to consider for feature selection. The default covers lags 1 to 23 (range(1, 24) excludes 24). (lambda: list(range(1, 24)))()
train_size Optional[pd.Timedelta] Time window for training data (3 * 365 = 1095 days by default). (lambda: pd.Timedelta(days=(3 * 365)))()
end_train_default str Default end date for training period (ISO format with timezone). '2025-12-31 00:00+00:00'
delta_val Optional[pd.Timedelta] Validation window size (24 * 7 * 10 = 1680 hours, i.e. 10 weeks, by default). (lambda: pd.Timedelta(hours=(24 * 7 * 10)))()
predict_size int Number of hours to predict ahead. 24
cv_block_size int | None Cross-validation test-block width in hours. Defaults to None, meaning the CV uses predict_size. Set to a fixed value (e.g. 24) to decouple the cross-validation horizon from a render-dependent live predict_size. None
refit_size int Number of days between model refits. 7
random_state int Random seed for reproducibility. 314159
n_hyperparameters_trials int Number of trials for hyperparameter optimization. 20
data_filename str Path to the interim merged data file. 'interim/energy_load.csv'
targets Optional[List[str]] List of target column names to train models for. When None (default), no targets are pre-selected; set this attribute after loading the dataset (e.g. config.targets = df.columns.tolist()). Replaces standalone TARGETS and target_columns variables in pipeline scripts, providing a single source of truth for the active target set. None
use_outlier_detection bool If True, apply IsolationForest-based outlier removal. True
contamination float Proportion of outliers for IsolationForest (0 < contamination < 0.5). 0.01
imputation_method str Gap-filling strategy — "weighted" (n2n-style rolling weights) or "linear" (linear interpolation). 'weighted'
window_size int Rolling window size in hours for gap detection (weighted imputation). 72
use_exogenous_features bool If True, build weather/calendar/day-night/holiday features. True
latitude float Latitude of the target location in decimal degrees. 51.5136
longitude float Longitude of the target location in decimal degrees. 7.4653
timezone str IANA timezone string for the target location (e.g. "Europe/Berlin"). 'UTC'
state str ISO 3166-2 subdivision code for regional holidays (e.g. "NW"). 'NW'
include_weather_windows bool If True, include rolling weather-window features. False
include_holiday_features bool If True, include public-holiday indicator features. False
include_holiday_adjacency_features bool If True, include Brückentag and before/after-holiday indicators (is_brueckentag, is_before_holiday, is_after_holiday). Defaults to False. False
include_ephemeris_features bool If True, include solar-elevation and daylight-duration features. Defaults to False. False
include_day_type_features bool If True, include working-day and day-type class features (is_workday, day_type). Defaults to False. False
include_school_holiday_features bool Append the is_school_holiday binary indicator from the bundled OpenHolidays API dataset (ODbL-1.0). Coverage 2022-01-01 to 2027-12-31 for all 16 German Bundesländer. Only country_code="DE" is supported. Defaults to False. False
poly_features_degree int Polynomial-interaction degree. 1 (default) generates no interactions; 2 adds pairwise bilinear terms; 3+ higher order. 1
max_poly_features int Cap on polynomial interaction columns; only the top max_poly_features ranked by mutual information with the target are kept (<= 0 disables). Defaults to 10. 10
poly_mi_n_jobs Optional[int] Parallel jobs for the mutual-information ranking that enforces max_poly_features. -1 (default) uses all cores; None runs single-threaded. Parallelism does not change the selection. -1
poly_mi_sample_size Optional[int] Row cap for that ranking; longer series are scored on a reproducible random subsample of this size (seeded by random_state), which can change which borderline columns make the top K. None scores every row (the pre-15.8 behaviour). Defaults to 4000. 4000
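include_covid_infection_rate bool If True, append the bundled RKI German national COVID-19 7-day incidence as an exogenous regressor. False
include_entsoe_forecast_load bool If True, append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior. False
include_entsoe_renewable_forecast bool If True, append the ENTSO-E day-ahead wind/solar generation forecast. False
include_entsoe_net_load bool If True, append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast). False
include_entsoe_day_ahead_price bool If True, append the ENTSO-E day-ahead spot price (DE/LU). False
include_football_match_window bool If True, append the bundled German football-match event-window feature (1.0 during configured match windows, 0.0 otherwise). False
include_energy_saving_window bool If True, append the bundled German energy-saving regulatory window feature (1.0 during the EnSikuMaV and EU Regulation 2022/1854 periods, 0.0 otherwise). False
index_name str Name assigned to the datetime column when the index is reset. Defaults to "DateTime". 'DateTime'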
bounds Optional[List[tuple]] Per-column outlier bounds as a list of (lower, upper) tuples, one entry per target column. None until set. None
verbose bool If True, enable verbose output for pipeline steps. Defaults to False. False
cache_home Optional[Any] Path to the cache directory. None means the library default (~/spotforecast2_cache/) is used. None
n_trials_optuna int Number of Optuna Bayesian-search trials for hyperparameter optimization (task 3). Defaults to 15. 15
n_trials_spotoptim int Number of SpotOptim surrogate-search trials (task 4). Defaults to 10. 10
n_initial_spotoptim int Number of initial random evaluations for SpotOptim (task 4). Defaults to 5. 5
max_time_spotoptim Optional[float] Wall-clock budget for the SpotOptim search in minutes (task 4). The search stops when either n_trials_spotoptim evaluations or this time limit is reached, whichever comes first. None (the default) disables the limit. None
warm_start_lags Optional[List[int]] Lag set the SpotOptim task injects as a search-space candidate and uses to seed the optimizer’s first evaluation. Defaults to DEFAULT_WARM_START_LAGS ([1, 2, 3, 23, 24, 25, 47, 48, 167, 168, 169, 336]). None or an empty list disables the warm start. (lambda: list(DEFAULT_WARM_START_LAGS))()
task str Active prediction task — one of "lazy", "training", "optuna", or "spotoptim". Defaults to "lazy". 'lazy'
agg_weights Optional[List[float]] Per-target aggregation weights used when combining individual target forecasts into a single weighted sum. The list must contain one weight per entry in targets (in the same order). Positive values add the target’s contribution; negative values invert it. Slice the list to agg_weights[:len(targets)] when only a subset of targets is active. Defaults to None (no weights pre-defined; set after loading the dataset). None
auto_save_models bool Whether BaseTask._run_strategy should persist fitted forecasters to <cache_home>/models/ after every training run. Defaults to True so that saved models are immediately available for PredictTask without an explicit save_models() call. True
data_frame_name str Identifier for the active dataset. Used by BaseTask to name cache subdirectories, model files, and the per-dataset log file. Defaults to "default". 'default'
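number_folds int Cross-validation fold count for tuning tasks. 10
on_weather_failure Literal['raise', 'skip'] Policy for handling Open-Meteo fetch failures inside BaseTask.build_exogenous_features. "raise" (default) aborts the pipeline with a WeatherFetchError and preserves the safety-critical fail-safe semantics. "skip" logs a warning and continues with empty weather features so the rest of the pipeline can run without the Open-Meteo dependency. 'raise'
on_exog_provider_failure Literal['raise', 'skip'] Exogenous-provider failure policy in ExogBuilder.build: "raise" (default) propagates the ExogProviderError; "skip" logs the failure and omits the failing provider's columns. 'raise'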
exog_max_gap_hours int Maximum length, in hours, of a contiguous run of missing exogenous-provider values healed before the provider is rejected. Interior gaps are time-interpolated; leading/trailing edge gaps are back-/forward-filled. 0 (default) keeps the strict fail-safe (any gap raises). Healed runs are logged with count and span. Only already-published day-ahead vintages are involved, so healing is leakage-clean (CR-3). 0
exog_max_tail_gap_hours int Extended healing budget, in hours, applied exclusively to the trailing-edge NaN run (the run containing the last index timestamp). The effective tail budget is max(exog_max_gap_hours, exog_max_tail_gap_hours). The canonical use case is the ENTSO-E day-ahead publication frontier: the last published vintage is zero-order-held forward to the forecast horizon without touching interior gaps (CR-3-clean). When exog_max_tail_gap_hours <= exog_max_gap_hours the parameter is inert (the interior budget already covers the tail) and a warning is logged. Defaults to 0. 0
exog_provider_window Literal['full', 'train'] Span the exogenous providers are validated against. "full" (default) requires coverage of the entire [data_start, cov_end] request, matching prior behaviour. "train" validates only the consumed window [start_train_ts, cov_end], tolerating missing values before the training window. Honoured by the MultiTask pipeline; the forecaster-wrapper path currently always validates the full span. 'full'
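The healing rules for exog_max_gap_hours and exog_max_tail_gap_hours can be illustrated with a minimal sketch. It assumes an hourly series held in a plain Python list with None marking missing values; heal_hourly_gaps and GapTooLongError are hypothetical names, not the library's implementation. Interior runs are linearly interpolated (standing in for time interpolation on a regular hourly grid), the leading run is back-filled, the trailing run is forward-filled, and any run longer than its budget raises. The trailing run alone uses max(exog_max_gap_hours, exog_max_tail_gap_hours).

```python
from typing import List, Optional


class GapTooLongError(ValueError):
    """Hypothetical error: a run of missing hours exceeds its healing budget."""


def heal_hourly_gaps(
    values: List[Optional[float]],
    max_gap_hours: int = 0,
    max_tail_gap_hours: int = 0,
) -> List[float]:
    """Heal runs of None in an hourly series, raising when a run is too long.

    max_gap_hours stands in for exog_max_gap_hours and max_tail_gap_hours
    for exog_max_tail_gap_hours (hypothetical sketch, not the library code).
    """
    n = len(values)
    if all(v is None for v in values):
        raise GapTooLongError("series has no observed values")
    # Only the trailing run gets the larger of the two budgets.
    tail_budget = max(max_gap_hours, max_tail_gap_hours)
    out = list(values)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # the missing run covers indices start .. end - 1
        length = end - start
        is_tail = end == n
        budget = tail_budget if is_tail else max_gap_hours
        if length > budget:
            raise GapTooLongError(
                f"{length} missing hour(s) at index {start} exceed the {budget} h budget"
            )
        if start == 0:
            # Leading edge: back-fill from the first observed value.
            fill = [out[end]] * length
        elif is_tail:
            # Trailing edge: hold the last observed value forward (zero-order hold).
            fill = [out[start - 1]] * length
        else:
            # Interior: linear interpolation between the neighbouring observations.
            left, right = out[start - 1], out[end]
            step = (right - left) / (length + 1)
            fill = [left + step * (k + 1) for k in range(length)]
        out[start:end] = fill
    return out  # type: ignore[return-value]
```

With both budgets at 0, any gap raises, which mirrors the strict fail-safe default. When max_tail_gap_hours is not larger than max_gap_hours, the tail budget collapses to the interior budget, which is why the extra parameter is described as inert in that case.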

Attributes

Name Type Description
country_code str ISO country code for API queries and holiday generation.
periods List[Period] Cyclical feature encoding specifications.
lags_consider List[int] Lag values for autoregressive features.
train_size pd.Timedelta Training data window.
end_train_default str Default training end date.
delta_val pd.Timedelta Validation window.
predict_size int Prediction horizon in hours.
cv_block_size int | None Cross-validation test-block width in hours (None = use predict_size).
refit_size int Refit interval in days.
random_state int Random seed.
n_hyperparameters_trials int Hyperparameter tuning trials.
data_filename str Path to the interim merged data file.
targets Optional[List[str]] Active target column names. None until explicitly set from the loaded dataset.
use_outlier_detection bool IsolationForest outlier removal toggle.
contamination float IsolationForest contamination fraction.
imputation_method str Gap-filling strategy ("weighted" or "linear").
window_size int Rolling window size for weighted imputation.
use_exogenous_features bool Exogenous feature construction toggle.
latitude float Location latitude.
longitude float Location longitude.
timezone str IANA timezone string.
state str Subdivision code for regional holidays.
include_weather_windows bool Weather-window feature toggle.
include_holiday_features bool Holiday feature toggle.
include_holiday_adjacency_features bool Brückentag and before/after-holiday indicator toggle. Defaults to False.
include_ephemeris_features bool Solar-elevation and daylight-duration feature toggle. Defaults to False.
include_day_type_features bool Working-day / day-type class feature toggle. Defaults to False.
include_school_holiday_features bool Per-Bundesland school-holiday indicator toggle. Defaults to False.
poly_features_degree int Polynomial-interaction degree (1 = off).
max_poly_features int Cap on kept poly_* columns (top-K by MI).
poly_mi_n_jobs Optional[int] Parallel jobs for the MI ranking (-1 = all cores; selection-invariant).
poly_mi_sample_size Optional[int] Row cap for the MI ranking (None = score every row).
include_covid_infection_rate bool Append the bundled RKI German national COVID-19 7-day incidence as an exogenous regressor.
include_entsoe_forecast_load bool Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior.
include_entsoe_renewable_forecast bool Append the ENTSO-E day-ahead wind/solar generation forecast.
include_entsoe_net_load bool Append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast).
include_entsoe_day_ahead_price bool Append the ENTSO-E day-ahead spot price (DE/LU).
include_football_match_window bool Append the bundled German football-match event-window feature (1.0 during configured match windows, 0.0 otherwise). Covers German national-team matches and tournament finals from UEFA Euro 2016 through FIFA World Cup 2026.
include_energy_saving_window bool Append the bundled German energy-saving regulatory window feature (1.0 during the EnSikuMaV and EU Regulation 2022/1854 periods, 0.0 otherwise).
index_name str Datetime column name used when resetting the index.
bounds Optional[List[tuple]] Per-column outlier bounds (lower, upper).
verbose bool Verbose output toggle.
cache_home Optional[Any] Path to the cache directory.
n_trials_optuna int Number of Optuna hyperparameter-search trials.
n_trials_spotoptim int Number of SpotOptim search trials.
n_initial_spotoptim int Number of initial SpotOptim evaluations.
max_time_spotoptim Optional[float] Wall-clock budget for the SpotOptim search in minutes; None disables the limit.
warm_start_lags Optional[List[int]] Seed lag set for the SpotOptim search; None or empty disables the warm start.
task str Active prediction task ("lazy", "training", "optuna", or "spotoptim").
agg_weights Optional[List[float]] Per-target aggregation weights. One weight per entry in targets; positive values add, negative values invert the target’s contribution. None until set.
auto_save_models bool Whether to auto-persist fitted forecasters after each training run.
data_frame_name str Active-dataset identifier used for cache and log-file naming.
number_folds int Cross-validation fold count for tuning tasks.
on_weather_failure Literal['raise', 'skip'] Open-Meteo fetch-failure policy: "raise" aborts, "skip" continues without weather.
on_exog_provider_failure Literal['raise', 'skip'] Exog-provider failure policy in ExogBuilder.build: "raise" (default) propagates the ExogProviderError; "skip" logs and omits the failing provider’s columns.
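exog_max_gap_hours int Maximum contiguous gap in hours that providers will heal before raising (0 = strict fail-safe).
exog_max_tail_gap_hours int Extended healing budget in hours for the trailing-edge gap only; the effective tail budget is max(exog_max_gap_hours, exog_max_tail_gap_hours).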
exog_provider_window Literal['full', 'train'] Validation window for exog providers: "full" (default) or "train".
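The agg_weights rule (one weight per entry in targets, positive weights add a target's contribution, negative weights invert it, and only agg_weights[:len(targets)] is used) amounts to a horizon-wise weighted sum. The sketch below assumes per-target forecasts are available as equal-length lists keyed by target name; aggregate_forecasts is a hypothetical helper, and the library's own combination step may be structured differently.

```python
from typing import Dict, List, Sequence


def aggregate_forecasts(
    forecasts: Dict[str, List[float]],
    targets: Sequence[str],
    agg_weights: Sequence[float],
) -> List[float]:
    """Combine per-target forecasts into one weighted sum, step by step."""
    # Use only the first len(targets) weights, mirroring the slicing rule.
    weights = list(agg_weights)[: len(targets)]
    if len(weights) != len(targets):
        raise ValueError("agg_weights needs one weight per target")
    horizon = len(forecasts[targets[0]])
    total = [0.0] * horizon
    for name, w in zip(targets, weights):
        series = forecasts[name]
        if len(series) != horizon:
            raise ValueError(f"forecast for {name!r} has a different horizon")
        for h, value in enumerate(series):
            # Positive weights add the target; negative weights subtract it.
            total[h] += w * value
    return total


forecasts = {"A": [10.0, 20.0], "B": [1.0, 2.0], "C": [5.0, 5.0]}
# Only two targets are active, so the third weight is sliced away.
print(aggregate_forecasts(forecasts, ["A", "B"], [1.0, -1.0, 1.0]))
```

Keeping the weights in target order is what makes the slice safe: dropping trailing targets also drops exactly their trailing weights.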

Notes

The default period configurations use specific n_periods values to balance resolution against smoothing:
- Daily: n_periods=12 over a 24 h range gives ~2 h resolution, smoothing hourly noise and halving the dimensionality compared with one feature per hour.
- Weekly: n_periods typically matches the range (1:1) so that individual day-of-week patterns remain distinguishable.
- Yearly: n_periods=12 over a 365 d range gives ~1 month resolution, capturing broad seasonal trends without overfitting.

See docs/PERIOD_CONFIGURATION_RATIONALE.md for a detailed analysis.
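The resolution figures in these Notes follow from dividing a period's range by n_periods. The sketch below reproduces that arithmetic and, to show why fewer basis functions smooth the signal, encodes a cyclical value with n_periods evenly spaced Gaussian bumps that wrap around the cycle. The Gaussian repeating-basis-function encoding is an assumption chosen for illustration; the library's Period objects may encode differently, and encoding_resolution and periodic_rbf_features are hypothetical names.

```python
import math
from typing import List


def encoding_resolution(period_range: float, n_periods: int) -> float:
    """Approximate width covered by one basis function: range / n_periods."""
    if n_periods <= 0:
        raise ValueError("n_periods must be positive")
    return period_range / n_periods


def periodic_rbf_features(
    value: float, period_range: float, n_periods: int, width: float = 1.0
) -> List[float]:
    """Encode a cyclical value with n_periods Gaussian bumps spaced evenly."""
    spacing = period_range / n_periods
    features = []
    for k in range(n_periods):
        center = k * spacing
        # Circular distance, so the end of the cycle wraps to its start.
        d = abs(value - center) % period_range
        d = min(d, period_range - d)
        features.append(math.exp(-0.5 * (d / (width * spacing)) ** 2))
    return features


daily = encoding_resolution(24, 12)    # 2.0 hours per basis function
yearly = encoding_resolution(365, 12)  # about 30.4 days, roughly one month
print(daily, round(yearly, 1))
print(len(periodic_rbf_features(13.0, 24, 12)))  # 12 daily features instead of 24
```

Because neighbouring bumps overlap, adjacent hours share most of their encoding, which is the smoothing effect the Notes describe; raising n_periods narrows the spacing and sharpens the resolution at the cost of more columns.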

Examples

```python
import pandas as pd
from spotforecast2_safe.configurator.config_multi import ConfigMulti

config = ConfigMulti()
print(f"country_code: {config.country_code}")
print(f"Predict size: {config.predict_size}")
print(f"Random state: {config.random_state}")
print(f"Targets (default): {config.targets}")
print(f"agg_weights (default): {config.agg_weights}")
print(f"index_name: {config.index_name}")
print(f"bounds: {config.bounds}")

# Set targets and bounds (user input that stays on the config);
# bounds holds one (lower, upper) tuple per target column
config.targets = ["A", "B"]
config.bounds = [(-2500, 4500), (-10, 3000)]
print(f"Targets (after setting): {config.targets}")
print(f"bounds: {config.bounds}")

# Create a custom configuration; country_code serves both API and holiday purposes
custom_config = ConfigMulti(
    country_code="FR",
    predict_size=48,
    random_state=42,
    targets=["A", "B"],
    index_name="DateTime",
)
print(f"country_code: {custom_config.country_code}")
print(f"Predict size: {custom_config.predict_size}")
print(f"Random state: {custom_config.random_state}")
print(f"Targets: {custom_config.targets}")

# Verify the default training window (3 * 365 days)
print(f"Training window: {config.train_size == pd.Timedelta(days=3 * 365)}")

# Check default periods
print(f"Number of periods: {len(config.periods)}")
print(f"First period name: {config.periods[0].name}")
```

```
country_code: DE
Predict size: 24
Random state: 314159
Targets (default): None
agg_weights (default): None
index_name: DateTime
bounds: None
Targets (after setting): ['A', 'B']
bounds: [(-2500, 4500), (-10, 3000)]
country_code: FR
Predict size: 48
Random state: 42
Targets: ['A', 'B']
Training window: True
Number of periods: 5
First period name: daily
```
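The default time spans can be checked with plain arithmetic. The sketch below uses stdlib timedelta as a stand-in for pd.Timedelta so it runs without pandas. It assumes, as a hypothetical simplification, that the training window is the train_size span ending at end_train_default; where the pipeline actually places the validation span is not stated here. training_window and cv_block_hours are hypothetical helpers; the latter only encodes the documented fallback of cv_block_size to predict_size.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Stdlib stand-ins for the documented pandas defaults.
END_TRAIN_DEFAULT = "2025-12-31 00:00+00:00"
TRAIN_SIZE = timedelta(days=3 * 365)      # pd.Timedelta(days=3 * 365)
DELTA_VAL = timedelta(hours=24 * 7 * 10)  # pd.Timedelta(hours=24 * 7 * 10)


def training_window(end_train: str, train_size: timedelta) -> Tuple[datetime, datetime]:
    """Hypothetical: the window of length train_size that ends at end_train."""
    end = datetime.fromisoformat(end_train)
    return end - train_size, end


def cv_block_hours(predict_size: int, cv_block_size: Optional[int] = None) -> int:
    """CV test-block width in hours: cv_block_size if set, else predict_size."""
    return predict_size if cv_block_size is None else cv_block_size


start, end = training_window(END_TRAIN_DEFAULT, TRAIN_SIZE)
print(start.isoformat(), end.isoformat())
print(DELTA_VAL == timedelta(weeks=10))  # the validation window is 10 weeks
print(cv_block_hours(24), cv_block_hours(48, cv_block_size=24))
```

Note that 3 * 365 days is not exactly three calendar years: because 2024 is a leap year, a 1095-day window ending on 2025-12-31 starts on 2023-01-01.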

Methods

Name Description
get_params Get parameters for this configuration object.
set_params Set the parameters of this configuration object.

get_params

configurator.config_multi.ConfigMulti.get_params(deep=True)

Get parameters for this configuration object.

Parameters

Name Type Description Default
deep bool If True, also return the parameters of contained sub-objects that are estimators. True

Returns

Name Type Description
params Dict[str, object] Dictionary of parameter names mapped to their values.

Examples

```python
from spotforecast2_safe.configurator.config_multi import ConfigMulti

config = ConfigMulti(country_code="FR")
p = config.get_params()
print(f"country_code: {p['country_code']}")
print(f"Predict size: {p['predict_size']}")
print(f"Random state: {p['random_state']}")
print(f"index_name: {p['index_name']}")
print(f"bounds: {p['bounds']}")
print(f"agg_weights: {p['agg_weights']}")
```

```
country_code: FR
Predict size: 24
Random state: 314159
index_name: DateTime
bounds: None
agg_weights: None
```
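The deep flag follows the scikit-learn get_params convention. Whether ConfigMulti exposes Period fields this way under deep=True is not stated here, so the sketch below is a hypothetical illustration: it assumes a dataclass-based config and uses made-up MiniConfig and MiniPeriod classes to show how a deep call could flatten nested Period objects into the same periods__<name>__<param> keys that set_params accepts.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class MiniPeriod:
    """Hypothetical stand-in for a Period: a named cyclical encoding."""
    name: str
    n_periods: int


@dataclass
class MiniConfig:
    """Hypothetical config used only to illustrate get_params(deep=...)."""
    country_code: str = "DE"
    predict_size: int = 24
    periods: List[MiniPeriod] = field(
        default_factory=lambda: [MiniPeriod("daily", 12)]
    )

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # Shallow view: one entry per top-level field.
        params = {f.name: getattr(self, f.name) for f in fields(self)}
        if deep:
            # Deep view: add each Period's fields as periods__<name>__<param>.
            for period in self.periods:
                for f in fields(period):
                    if f.name != "name":  # the name is already part of the key
                        key = f"periods__{period.name}__{f.name}"
                        params[key] = getattr(period, f.name)
        return params


print(MiniConfig(country_code="FR").get_params()["periods__daily__n_periods"])
```

Emitting the nested keys in the same notation that set_params consumes keeps the two methods symmetric, so a parameter dictionary read from one configuration can be fed back into another.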

set_params

configurator.config_multi.ConfigMulti.set_params(params=None, **kwargs)

Set the parameters of this configuration object.

Parameters

Name Type Description Default
params Dict[str, object] Optional dictionary of parameter names mapped to their new values. None
**kwargs object Additional parameter names mapped to their new values. Nested Period objects can be configured with the periods__<name>__<param> notation (e.g. periods__daily__n_periods=24). {}

Returns

Name Type Description
ConfigMulti ConfigMulti The configuration instance with updated parameters (supports method chaining).

Examples

```python
from spotforecast2_safe.configurator.config_multi import ConfigMulti

config = ConfigMulti()
_ = config.set_params(country_code="FR", predict_size=48)
print(f"country_code: {config.country_code}")
print(f"Predict size: {config.predict_size}")
print(f"Random state: {config.random_state}")

# Deep parameter setting
_ = config.set_params(periods__daily__n_periods=24)
print(next(p.n_periods for p in config.periods if p.name == "daily"))
```

```
country_code: FR
Predict size: 48
Random state: 314159
24
```
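The periods__<name>__<param> routing accepted by set_params can be sketched as follows. MiniConfig and MiniPeriod are hypothetical classes, the dataclass structure is an assumption, and the rule that keyword arguments override entries in params on conflict is a choice made for this sketch rather than documented library behaviour. Only the notation itself and the method-chaining return value come from the reference above.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class MiniPeriod:
    """Hypothetical stand-in for a Period object."""
    name: str
    n_periods: int


@dataclass
class MiniConfig:
    """Hypothetical config showing periods__<name>__<param> routing."""
    country_code: str = "DE"
    predict_size: int = 24
    periods: List[MiniPeriod] = field(
        default_factory=lambda: [MiniPeriod("daily", 12), MiniPeriod("yearly", 12)]
    )

    def set_params(
        self, params: Optional[Dict[str, Any]] = None, **kwargs: Any
    ) -> "MiniConfig":
        # Merge the dict form and the keyword form; keywords win on conflicts.
        updates = {**(params or {}), **kwargs}
        top_level = {f.name for f in fields(self)}
        for key, value in updates.items():
            parts = key.split("__", 2)
            if len(parts) == 3 and parts[0] == "periods":
                _, period_name, attr = parts
                period = next((p for p in self.periods if p.name == period_name), None)
                if period is None or attr not in {f.name for f in fields(period)}:
                    raise ValueError(f"invalid nested parameter {key!r}")
                setattr(period, attr, value)
            elif key in top_level:
                setattr(self, key, value)
            else:
                raise ValueError(f"invalid parameter {key!r}")
        return self  # returning self enables method chaining


cfg = MiniConfig().set_params(country_code="FR").set_params(periods__daily__n_periods=24)
print(cfg.country_code, cfg.periods[0].n_periods, cfg.periods[1].n_periods)
```

Using field(default_factory=...) gives every instance its own periods list, so a nested update on one configuration cannot leak into another through a shared mutable default.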