configurator.config_multi.ConfigMulti

configurator.config_multi.ConfigMulti(
    country_code='DE',
    periods=default_periods(),
    lags_consider=(lambda: list(range(1, 24)))(),
    train_size=(lambda: pd.Timedelta(days=(3 * 365)))(),
    end_train_default='2025-12-31 00:00+00:00',
    delta_val=(lambda: pd.Timedelta(hours=(24 * 7 * 10)))(),
    predict_size=24,
    cv_block_size=None,
    refit_size=7,
    random_state=314159,
    n_hyperparameters_trials=20,
    data_filename='interim/energy_load.csv',
    targets=None,
    use_outlier_detection=True,
    contamination=0.01,
    imputation_method='weighted',
    window_size=72,
    imputation_window_size=None,
    use_exogenous_features=True,
    latitude=51.5136,
    longitude=7.4653,
    timezone='UTC',
    state='NW',
    include_weather_windows=False,
    include_holiday_features=False,
    include_holiday_adjacency_features=False,
    use_population_weighted_weather=False,
    per_zone_weather=False,
    zone_weather_locations=None,
    include_degree_hours=False,
    include_apparent_temperature=False,
    degree_hours_base_heating=15.0,
    degree_hours_base_cooling=22.0,
    include_ephemeris_features=False,
    include_day_type_features=False,
    include_school_holiday_features=False,
    poly_features_degree=1,
    max_poly_features=10,
    poly_mi_n_jobs=-1,
    poly_mi_sample_size=4000,
    include_covid_infection_rate=False,
    include_entsoe_forecast_load=False,
    include_entsoe_renewable_forecast=False,
    include_entsoe_net_load=False,
    include_entsoe_day_ahead_price=False,
    include_football_match_window=False,
    include_energy_saving_window=False,
    index_name='DateTime',
    bounds=None,
    verbose=False,
    cache_home=None,
    n_trials_optuna=15,
    n_trials_spotoptim=10,
    n_initial_spotoptim=5,
    max_time_spotoptim=None,
    warm_start_lags=(lambda: list(DEFAULT_WARM_START_LAGS))(),
    task='lazy',
    agg_weights=None,
    forecaster_factory=None,
    lgbm_n_jobs=1,
    data_loader=None,
    test_data_loader=None,
    auto_save_models=True,
    data_frame_name='default',
    number_folds=10,
    on_weather_failure='raise',
    on_exog_provider_failure='raise',
    exog_max_gap_hours=0,
    exog_max_tail_gap_hours=0,
    exog_provider_window='full',
    target_qc_range_mw=None,
    target_qc_step_mw=None,
    target_qc_window_days=None,
    target_corruption_policy='abort',
    target_max_heal_hours=0,
    target_anchor_zone_hours=168,
    target_qc_deviation_mw=None,
    target_qc_deviation_ref=None,
    target_qc_deviation_slots=2,
)

Configuration for the multi-input forecasting pipeline.

This class manages all configuration parameters for the multi-input task, including training/prediction intervals, data sources, and feature engineering specifications. All parameters can be customized during initialization or used with sensible defaults.

`country_code` is the single ISO country code used for both API queries and holiday feature generation.

Parameters

Name	Type	Description	Default
country_code	str	ISO 3166-1 alpha-2 country code (e.g. `"DE"`). Used for both API queries and holiday feature generation.	`'DE'`
periods	Optional[List[`Period`]]	List of Period objects defining cyclical feature encodings.	`default_periods()`
lags_consider	Optional[List[int]]	List of lag values to consider for feature selection.	`(lambda: list(range(1, 24)))()`
train_size	Optional[pd.Timedelta]	Time window for training data.	`(lambda: pd.Timedelta(days=(3 * 365)))()`
end_train_default	str	Default end date for training period (ISO format with timezone).	`'2025-12-31 00:00+00:00'`
delta_val	Optional[pd.Timedelta]	Validation window size.	`(lambda: pd.Timedelta(hours=(24 * 7 * 10)))()`
predict_size	int	Number of hours to predict ahead.	`24`
cv_block_size	int \| None	Cross-validation test-block width in hours. Defaults to `None`, meaning the CV uses `predict_size`. Set to a fixed value (e.g. `24`) to decouple the cross-validation horizon from a render-dependent live `predict_size`.	`None`
refit_size	int	Number of days between model refits.	`7`
random_state	int	Random seed for reproducibility.	`314159`
n_hyperparameters_trials	int	Number of trials for hyperparameter optimization.	`20`
data_filename	str	Path to the interim merged data file.	`'interim/energy_load.csv'`
targets	Optional[List[str]]	List of target column names to train models for. When `None` (default), no targets are pre-selected; set this attribute after loading the dataset (e.g. `config.targets = df.columns.tolist()`). Replaces standalone `TARGETS` and `target_columns` variables in pipeline scripts, providing a single source of truth for the active target set.	`None`
use_outlier_detection	bool	If True, apply IsolationForest-based outlier removal.	`True`
contamination	float	Proportion of outliers for IsolationForest (0 < contamination < 0.5).	`0.01`
imputation_method	str	Gap-filling strategy — `"weighted"` (n2n-style rolling weights) or `"linear"` (linear interpolation).	`'weighted'`
window_size	int	Rolling window size in hours for gap detection (weighted imputation).	`72`
use_exogenous_features	bool	If True, build weather/calendar/day-night/holiday features.	`True`
latitude	float	Latitude of the target location in decimal degrees.	`51.5136`
longitude	float	Longitude of the target location in decimal degrees.	`7.4653`
timezone	str	IANA timezone string for the target location (e.g. `"Europe/Berlin"`).	`'UTC'`
state	str	ISO 3166-2 subdivision code for regional holidays (e.g. `"NW"`).	`'NW'`
include_weather_windows	bool	If True, include rolling weather-window features.	`False`
include_holiday_features	bool	If True, include public-holiday indicator features.	`False`
include_holiday_adjacency_features	bool	If True, include Brückentag and before/after-holiday indicators (`is_brueckentag`, `is_before_holiday`, `is_after_holiday`). Defaults to `False`.	`False`
include_ephemeris_features	bool	If True, include solar-elevation and daylight-duration features. Defaults to `False`.	`False`
include_day_type_features	bool	If True, include working-day and day-type class features (`is_workday`, `day_type`). Defaults to `False`.	`False`
include_school_holiday_features	bool	Append the `is_school_holiday` binary indicator from the bundled OpenHolidays API dataset (ODbL-1.0). Coverage 2022-01-01 to 2027-12-31 for all 16 German Bundesländer. Only `country_code="DE"` is supported. Defaults to `False`.	`False`
per_zone_weather	bool	When True, each target is treated as a German TSO control zone and receives weather from its own regional cities via `weather.locations.locations_for_zone`. Mutually exclusive with `use_population_weighted_weather`; requires `use_exogenous_features=True`; not compatible with `poly_features_degree >= 2`. The default `False` keeps the output byte-identical to the shared-weather baseline.	`False`
zone_weather_locations	Optional[Dict[str, Any]]	Optional override mapping from zone key (e.g. `"load_50hertz"`) to a list of `WeatherLocation` objects. `None` (default) uses the built-in registry partition from `GERMAN_TSO_ZONE_CITIES`.	`None`
poly_features_degree	int	Polynomial-interaction degree. `1` (default) generates no interactions; `2` adds pairwise bilinear terms; `3+` higher order.	`1`
max_poly_features	int	Cap on polynomial interaction columns; only the top `max_poly_features` ranked by mutual information with the target are kept (`<= 0` disables). Defaults to `10`.	`10`
poly_mi_n_jobs	Optional[int]	Parallel jobs for the mutual-information ranking that enforces `max_poly_features`. `-1` (default) uses all cores; `None` runs single-threaded. Parallelism does not change the selection.	`-1`
lgbm_n_jobs	int	Thread count for the LightGBM estimators built by the lgbm forecaster factories (`LGBMRegressor(n_jobs=...)`). Defaults to `1` so the backtester parallelises CV folds across processes instead of relying on LightGBM’s in-model OpenMP, which anti-scales on heterogeneous-core CPUs (e.g. Apple Silicon). Set `-1` / a larger value on many-core homogeneous machines (e.g. Linux Xeon).	`1`
poly_mi_sample_size	Optional[int]	Row cap for the mutual-information ranking that enforces `max_poly_features`; longer series are scored on a reproducible random subsample of this size (seeded by `random_state`), which can change which borderline columns make the top K. `None` scores every row (the pre-15.8 behaviour). Defaults to `4000`.	`4000`
index_name	str	Name assigned to the datetime column when the index is reset. Defaults to `"DateTime"`.	`'DateTime'`
bounds	Optional[List[tuple]]	Per-column outlier bounds as a list of `(lower, upper)` tuples, one entry per target column. `None` until set.	`None`
verbose	bool	If `True`, enable verbose output for pipeline steps. Defaults to `False`.	`False`
cache_home	Optional[Any]	Path to the cache directory. `None` means the library default (`~/spotforecast2_cache/`) is used.	`None`
n_trials_optuna	int	Number of Optuna Bayesian-search trials for hyperparameter optimization (task 3). Defaults to `15`.	`15`
n_trials_spotoptim	int	Number of SpotOptim surrogate-search trials (task 4). Defaults to `10`.	`10`
n_initial_spotoptim	int	Number of initial random evaluations for SpotOptim (task 4). Defaults to `5`.	`5`
max_time_spotoptim	Optional[float]	Wall-clock budget for the SpotOptim search in minutes (task 4). The search stops when either `n_trials_spotoptim` evaluations or this time limit is reached, whichever comes first. `None` (the default) disables the limit.	`None`
warm_start_lags	Optional[List[int]]	Lag set the SpotOptim task injects as a search-space candidate and uses to seed the optimizer’s first evaluation. Defaults to `DEFAULT_WARM_START_LAGS` (`[1, 2, 3, 23, 24, 25, 47, 48, 167, 168, 169, 336]`). `None` or an empty list disables the warm start.	`(lambda: list(DEFAULT_WARM_START_LAGS))()`
task	str	Active prediction task — one of `"lazy"`, `"training"`, `"optuna"`, or `"spotoptim"`. Defaults to `"lazy"`.	`'lazy'`
agg_weights	Optional[List[float]]	Per-target aggregation weights used when combining individual target forecasts into a single weighted sum. The list must contain one weight per entry in `targets` (in the same order). Positive values add the target’s contribution; negative values invert it. Slice the list to `agg_weights[:len(targets)]` when only a subset of targets is active. Defaults to `None` (no weights pre-defined; set after loading the dataset).	`None`
auto_save_models	bool	Whether `BaseTask._run_strategy` should persist fitted forecasters to `<cache_home>/models/` after every training run. Defaults to `True` so that saved models are immediately available for `PredictTask` without an explicit `save_models()` call.	`True`
data_frame_name	str	Identifier for the active dataset. Used by `BaseTask` to name cache subdirectories, model files, and the per-dataset log file. Defaults to `"default"`.	`'default'`
on_weather_failure	Literal['raise', 'skip']	Policy for handling Open-Meteo fetch failures inside `BaseTask.build_exogenous_features`. `"raise"` (default) aborts the pipeline with a `WeatherFetchError` and preserves the safety-critical fail-safe semantics. `"skip"` logs a warning and continues with empty weather features so the rest of the pipeline can run without the Open-Meteo dependency.	`'raise'`
exog_max_gap_hours	int	Maximum length, in hours, of a contiguous run of missing exogenous-provider values healed before the provider is rejected. Interior gaps are time-interpolated; leading/trailing edge gaps are back-/forward-filled. `0` (default) keeps the strict fail-safe (any gap raises). Healed runs are logged with count and span. Only already-published day-ahead vintages are involved, so healing is leakage-clean (CR-3).	`0`
exog_max_tail_gap_hours	int	Extended healing budget, in hours, applied exclusively to the trailing-edge NaN run (the run containing the last index timestamp). The effective tail budget is `max(exog_max_gap_hours, exog_max_tail_gap_hours)`. The canonical use case is the ENTSO-E day-ahead publication frontier: the last published vintage is zero-order-held forward to the forecast horizon without touching interior gaps (CR-3-clean). When `exog_max_tail_gap_hours <= exog_max_gap_hours` the parameter is inert (the interior budget already covers the tail) and a warning is logged. Defaults to `0`.	`0`
exog_provider_window	Literal['full', 'train']	Span the exogenous providers are validated against. `"full"` (default) requires coverage of the entire `data_start`→`cov_end` request, matching prior behaviour. `"train"` validates only the consumed window `[start_train_ts, cov_end]`, tolerating missing values before the training window. Honoured by the MultiTask pipeline; the forecaster-wrapper path currently always validates the full span.	`'full'`
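
The interplay of `exog_max_gap_hours` and `exog_max_tail_gap_hours` described in the table above can be illustrated with a small, library-independent sketch. The helpers `nan_runs` and `gaps_within_budget` are hypothetical names, not part of the package; the sketch assumes an hourly series, only compares run lengths against the two budgets, and does not perform the interpolation or fill itself.

```python
import math
from typing import List, Sequence, Tuple


def nan_runs(values: Sequence[float]) -> List[Tuple[int, int]]:
    """Return (start, length) for every contiguous run of NaN values."""
    runs = []
    start = None
    for i, v in enumerate(values):
        if math.isnan(v):
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(values) - start))
    return runs


def gaps_within_budget(values, max_gap_hours=0, max_tail_gap_hours=0):
    """Check each NaN run of an hourly series against its healing budget."""
    # The trailing run gets the larger of the two budgets (as documented).
    tail_budget = max(max_gap_hours, max_tail_gap_hours)
    for start, length in nan_runs(values):
        is_tail = start + length == len(values)
        budget = tail_budget if is_tail else max_gap_hours
        if length > budget:
            return False
    return True


nan = float("nan")
series = [1.0, nan, 2.0, 3.0, nan, nan, nan]  # 1 h interior gap, 3 h tail gap

strict = gaps_within_budget(series)
interior_only = gaps_within_budget(series, max_gap_hours=1)
with_tail = gaps_within_budget(series, max_gap_hours=1, max_tail_gap_hours=3)
print(strict, interior_only, with_tail)
```

With both budgets at `0` any gap fails the check; raising only the interior budget still rejects the three-hour tail, while the separate tail budget accepts it without loosening the interior limit.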

Attributes

Name	Type	Description
country_code	str	ISO country code for API queries and holiday generation.
periods	List[`Period`]	Cyclical feature encoding specifications.
lags_consider	List[int]	Lag values for autoregressive features.
train_size	pd.Timedelta	Training data window.
end_train_default	str	Default training end date.
delta_val	pd.Timedelta	Validation window.
predict_size	int	Prediction horizon in hours.
refit_size	int	Refit interval in days.
random_state	int	Random seed.
n_hyperparameters_trials	int	Hyperparameter tuning trials.
targets	Optional[List[str]]	Active target column names. `None` until explicitly set from the loaded dataset.
use_outlier_detection	bool	IsolationForest outlier removal toggle.
contamination	float	IsolationForest contamination fraction.
imputation_method	str	Gap-filling strategy (`"weighted"` or `"linear"`).
window_size	int	Rolling window size for weighted imputation.
use_exogenous_features	bool	Exogenous feature construction toggle.
latitude	float	Location latitude.
longitude	float	Location longitude.
timezone	str	IANA timezone string.
state	str	Subdivision code for regional holidays.
include_weather_windows	bool	Weather-window feature toggle.
include_holiday_features	bool	Holiday feature toggle.
include_holiday_adjacency_features	bool	Brückentag and before/after-holiday indicator toggle. Defaults to `False`.
include_ephemeris_features	bool	Solar-elevation and daylight-duration feature toggle. Defaults to `False`.
include_day_type_features	bool	Working-day / day-type class feature toggle. Defaults to `False`.
include_school_holiday_features	bool	Per-Bundesland school-holiday indicator toggle. Defaults to `False`.
per_zone_weather	bool	When True, each target is a TSO control zone that fetches its own regional weather via `weather.locations.locations_for_zone`. Mutually exclusive with `use_population_weighted_weather`; requires `use_exogenous_features=True`; not compatible with `poly_features_degree >= 2`. Default `False`.
zone_weather_locations	Optional[Dict[str, Any]]	Override mapping from zone key to a list of `WeatherLocation` objects. `None` uses the built-in `GERMAN_TSO_ZONE_CITIES` partition.
poly_features_degree	int	Polynomial-interaction degree (1 = off).
max_poly_features	int	Cap on kept `poly_*` columns (top-K by MI).
poly_mi_n_jobs	Optional[int]	Parallel jobs for the MI ranking (`-1` = all cores; selection-invariant).
lgbm_n_jobs	int	LightGBM estimator thread count for the lgbm forecaster factories (default `1`; favours per-fold process parallelism over in-model OpenMP, which anti-scales on Apple Silicon). Raise on many-core homogeneous CPUs.
poly_mi_sample_size	Optional[int]	Row cap for the MI ranking (`None` = score every row).
include_covid_infection_rate	bool	Append the bundled RKI German national COVID-19 7-day incidence as an exogenous regressor.
include_entsoe_forecast_load	bool	Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior.
include_entsoe_renewable_forecast	bool	Append the ENTSO-E day-ahead wind/solar generation forecast.
include_entsoe_net_load	bool	Append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast).
include_entsoe_day_ahead_price	bool	Append the ENTSO-E day-ahead spot price (DE/LU).
include_football_match_window	bool	Append the bundled German football-match event-window feature (1.0 during configured match windows, 0.0 otherwise). Covers German national-team matches and tournament finals from UEFA Euro 2016 through FIFA World Cup 2026.
include_energy_saving_window	bool	Append the bundled German energy-saving regulatory window feature (1.0 during the EnSikuMaV and EU Regulation 2022/1854 periods, 0.0 otherwise).
index_name	str	Datetime column name used when resetting the index.
bounds	Optional[List[tuple]]	Per-column outlier bounds `(lower, upper)`.
verbose	bool	Verbose output toggle.
cache_home	Optional[Any]	Path to the cache directory.
n_trials_optuna	int	Number of Optuna hyperparameter-search trials.
n_trials_spotoptim	int	Number of SpotOptim search trials.
n_initial_spotoptim	int	Number of initial SpotOptim evaluations.
max_time_spotoptim	Optional[float]	Wall-clock budget for the SpotOptim search in minutes; `None` disables the limit.
warm_start_lags	Optional[List[int]]	Seed lag set for the SpotOptim search; `None` or empty disables the warm start.
task	str	Active prediction task (`"lazy"`, `"training"`, `"optuna"`, or `"spotoptim"`).
agg_weights	Optional[List[float]]	Per-target aggregation weights. One weight per entry in `targets`; positive values add, negative values invert the target’s contribution. `None` until set.
auto_save_models	bool	Whether to auto-persist fitted forecasters after each training run.
data_frame_name	str	Active-dataset identifier used for cache and log-file naming.
number_folds	int	Cross-validation fold count for tuning tasks.
on_weather_failure	Literal['raise', 'skip']	Open-Meteo fetch-failure policy: `"raise"` aborts, `"skip"` continues without weather.
on_exog_provider_failure	Literal['raise', 'skip']	Exog-provider failure policy in `ExogBuilder.build`: `"raise"` (default) propagates the `ExogProviderError`; `"skip"` logs and omits the failing provider’s columns.
exog_max_gap_hours	int	Maximum contiguous gap in hours that providers will heal before raising (0 = strict fail-safe).
exog_provider_window	Literal['full', 'train']	Validation window for exog providers: `"full"` (default) or `"train"`.
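
As an illustration of the `agg_weights` semantics listed above (one weight per target, negative weights invert a contribution, slicing for a subset of targets), the following sketch combines per-target forecasts into a weighted sum. The function `weighted_sum` and the sample numbers are hypothetical; how the library performs the aggregation internally is not shown here.

```python
from typing import Dict, List, Sequence


def weighted_sum(
    forecasts: Dict[str, Sequence[float]],
    targets: List[str],
    agg_weights: List[float],
) -> List[float]:
    """Combine per-target forecasts into one series, one weight per target."""
    # Slice so that a longer weight list still works for a target subset.
    weights = agg_weights[: len(targets)]
    if len(weights) != len(targets):
        raise ValueError("need one weight per active target")
    horizon = len(forecasts[targets[0]])
    combined = [0.0] * horizon
    for name, weight in zip(targets, weights):
        for step, value in enumerate(forecasts[name]):
            combined[step] += weight * value
    return combined


# Illustrative forecasts for three targets over a two-step horizon.
forecasts = {"A": [10.0, 12.0], "B": [4.0, 5.0], "C": [1.0, 1.0]}
all_weights = [1.0, -1.0, 1.0]

full = weighted_sum(forecasts, ["A", "B", "C"], all_weights)
subset = weighted_sum(forecasts, ["A", "B"], all_weights)
print(full)    # A - B + C per forecast step
print(subset)  # A - B per forecast step
```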

Notes

The default period configurations use specific n_periods to balance resolution and smoothing:

- Daily: n_periods=12 (24h) provides ~2h resolution, smoothing hourly noise and halving dimensionality.
- Weekly: n_periods typically matches range (1:1) to distinguish day-of-week patterns.
- Yearly: n_periods=12 (365d) provides ~1 month resolution, capturing broad seasonal trends without overfitting.

See docs/PERIOD_CONFIGURATION_RATIONALE.md for a detailed analysis.
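
The resolution figures in the notes above follow from dividing the period length by `n_periods`. The short sketch below reproduces that arithmetic; the helper `resolution` is a hypothetical name, and reading "resolution" as period length divided by `n_periods` is an interpretation of the notes rather than a documented formula.

```python
def resolution(period_length: float, n_periods: int) -> float:
    """Width of one encoding bin, in the same unit as period_length."""
    return period_length / n_periods


daily_hours = resolution(24, 12)   # hours per bin for the daily period
yearly_days = resolution(365, 12)  # days per bin for the yearly period

print(f"daily: {daily_hours:.1f} h per bin")
print(f"yearly: {yearly_days:.1f} d per bin")
```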

Examples

import pandas as pd
from spotforecast2_safe.configurator.config_multi import ConfigMulti
config = ConfigMulti()
print(f"country_code: {config.country_code}")
print(f"Predict size: {config.predict_size}")
print(f"Random state: {config.random_state}")
print(f"Targets (default): {config.targets}")
print(f"agg_weights (default): {config.agg_weights}")
print(f"index_name: {config.index_name}")
print(f"bounds: {config.bounds}")

# Set targets and bounds (user input that stays on the config)
config.targets = ["A", "B", "C"]
config.bounds = [(-2500, 4500), (-10, 3000)]
print(f"Targets (after setting): {config.targets}")
print(f"bounds: {config.bounds}")

# Create custom configuration — country_code serves both API and holiday purposes
custom_config = ConfigMulti(
    country_code='FR',
    predict_size=48,
    random_state=42,
    targets=["A", "B"],
    index_name="DateTime",
)
print(f"country_code: {custom_config.country_code}")
print(f"Predict size: {custom_config.predict_size}")
print(f"Random state: {custom_config.random_state}")
print(f"Targets: {custom_config.targets}")

# Verify training window
print(f"Training window: {config.train_size == pd.Timedelta(days=3 * 365)}")

# Check default periods
print(f"Number of periods: {len(config.periods)}")
print(f"First period name: {config.periods[0].name}")

country_code: DE
Predict size: 24
Random state: 314159
Targets (default): None
agg_weights (default): None
index_name: DateTime
bounds: None
Targets (after setting): ['A', 'B', 'C']
bounds: [(-2500, 4500), (-10, 3000)]
country_code: FR
Predict size: 48
Random state: 42
Targets: ['A', 'B']
Training window: True
Number of periods: 5
First period name: daily

Methods

Name	Description
get_params	Get parameters for this configuration object.
set_params	Set the parameters of this configuration object.

get_params

configurator.config_multi.ConfigMulti.get_params(deep=True)

Get parameters for this configuration object.

Parameters

Name	Type	Description	Default
deep	bool	If True, will return the parameters for this configuration and contained sub-objects that are estimators.	`True`

Returns

Name	Type	Description
params	Dict[str, object]	Dictionary of parameter names mapped to their values.

Examples

from spotforecast2_safe.configurator.config_multi import ConfigMulti
config = ConfigMulti(country_code="FR")
p = config.get_params()
print(f"country_code: {p['country_code']}")
print(f"Predict size: {p['predict_size']}")
print(f"Random state: {p['random_state']}")
print(f"index_name: {p['index_name']}")
print(f"bounds: {p['bounds']}")
print(f"agg_weights: {p['agg_weights']}")

country_code: FR
Predict size: 24
Random state: 314159
index_name: DateTime
bounds: None
agg_weights: None

set_params

configurator.config_multi.ConfigMulti.set_params(params=None, **kwargs)

Set the parameters of this configuration object.

Parameters

Name	Type	Description	Default
params	Dict[str, object]	Optional dictionary of parameter names mapped to their new values.	`None`
**kwargs	object	Additional parameter names mapped to their new values. Supports configuring nested `Period` objects using the `periods__<name>__<param>` notation.	`{}`

Returns

Name	Type	Description
ConfigMulti	ConfigMulti	The configuration instance with updated parameters (supports method chaining).

Examples

from spotforecast2_safe.configurator.config_multi import ConfigMulti
config = ConfigMulti()
_ = config.set_params(country_code="FR", predict_size=48)
print(f"country_code: {config.country_code}")
print(f"Predict size: {config.predict_size}")
print(f"Random state: {config.random_state}")

# Deep parameter setting
_ = config.set_params(periods__daily__n_periods=24)
print(next(p.n_periods for p in config.periods if p.name == "daily"))

country_code: FR
Predict size: 48
Random state: 314159
24
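
To make the `periods__<name>__<param>` notation and the method-chaining return value more concrete, here is a minimal sketch of how such routing could work. `MiniConfig` and the `Period` dataclass below are hypothetical stand-ins written for this example, not the library's classes, and the actual `ConfigMulti.set_params` implementation may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Period:
    """Stand-in for the library's Period; only the fields used here."""
    name: str
    n_periods: int


@dataclass
class MiniConfig:
    """Hypothetical, stripped-down stand-in for ConfigMulti."""
    predict_size: int = 24
    periods: List[Period] = field(
        default_factory=lambda: [Period("daily", 12), Period("yearly", 12)]
    )

    def set_params(self, params=None, **kwargs):
        # Merge the optional dict with keyword arguments (kwargs win on clash).
        updates = {**(params or {}), **kwargs}
        for key, value in updates.items():
            if key.startswith("periods__"):
                # periods__<name>__<param> -> update one field of one Period
                _, period_name, attr = key.split("__", 2)
                period = next(p for p in self.periods if p.name == period_name)
                setattr(period, attr, value)
            elif key in vars(self):
                setattr(self, key, value)
            else:
                raise ValueError(f"unknown parameter: {key}")
        return self  # returning self is what enables method chaining


config = MiniConfig().set_params({"predict_size": 48}, periods__daily__n_periods=24)
daily = next(p for p in config.periods if p.name == "daily")
print(config.predict_size, daily.n_periods)
```

Splitting on the double underscore keeps top-level and nested parameters in one flat keyword namespace, which is the same convention scikit-learn estimators use for nested parameters.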