processing.n2n_predict_with_covariates.n2n_predict_with_covariates

processing.n2n_predict_with_covariates.n2n_predict_with_covariates(
    data=None,
    forecast_horizon=24,
    contamination=0.01,
    window_size=72,
    lags=24,
    train_ratio=0.8,
    latitude=51.5136,
    longitude=7.4653,
    timezone='UTC',
    country_code='DE',
    state='NW',
    estimator=None,
    include_weather_windows=False,
    include_holiday_features=False,
    include_holiday_adjacency_features=False,
    poly_features_degree=1,
    max_poly_features=10,
    force_train=True,
    model_dir=None,
    verbose=True,
    show_progress=False,
    on_weather_failure='raise',
)

End-to-end recursive forecasting with exogenous covariates.
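"Recursive" here refers to the multi-step strategy: a single one-step model is applied repeatedly, and each prediction is fed back in as the newest lag. The sketch below illustrates only that strategy. It uses scikit-learn's LinearRegression instead of LGBMRegressor to keep dependencies light, omits exogenous covariates, and the helpers make_lagged and recursive_forecast are hypothetical. It is not the library's ForecasterRecursive implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_lagged(y: np.ndarray, lags: int):
    """Build a design matrix whose row t holds y[t-lags:t]; the target is y[t]."""
    X = np.array([y[t - lags:t] for t in range(lags, len(y))])
    return X, y[lags:]


def recursive_forecast(model, history: np.ndarray, lags: int, horizon: int) -> np.ndarray:
    """Forecast `horizon` steps, feeding each prediction back in as the newest lag."""
    window = list(history[-lags:])
    preds = []
    for _ in range(horizon):
        x = np.asarray(window[-lags:], dtype=float).reshape(1, -1)
        y_hat = float(model.predict(x)[0])
        preds.append(y_hat)
        window.append(y_hat)  # the forecast becomes an input for the next step
    return np.array(preds)


# Toy series with a linear trend, so a linear lag model can extrapolate it.
y = np.arange(50, dtype=float)
X, target = make_lagged(y, lags=4)
model = LinearRegression().fit(X, target)
preds = recursive_forecast(model, y, lags=4, horizon=3)
print(preds)  # values very close to 50, 51, 52
```

Because every step after the first consumes earlier forecasts, errors can accumulate over the horizon. This is the usual trade-off of the recursive strategy compared with training one model per step.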

This function implements a complete forecasting pipeline that:

1. Loads and validates the target data.
2. Detects and removes outliers.
3. Imputes missing values and down-weights observations near gaps.
4. Creates exogenous features (weather, holidays, calendar, day/night).
5. Performs feature engineering (cyclical encoding, interactions).
6. Merges the target and exogenous data.
7. Splits the data into train/validation/test sets.
8. Trains or loads recursive forecasters with sample weighting.
9. Generates multi-step-ahead predictions.
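This page does not name the outlier detector behind step 2. Its contamination parameter, however, matches the parameter of scikit-learn's IsolationForest, so the sketch below assumes an IsolationForest-style detector. The helper mask_outliers is hypothetical. It replaces flagged points with NaN so that the imputation in step 3 can fill them.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def mask_outliers(series: pd.Series, contamination: float = 0.01) -> pd.Series:
    """Replace points flagged as outliers with NaN so a later imputation step can fill them."""
    # Assumption: an IsolationForest-style detector; the library's detector may differ.
    detector = IsolationForest(contamination=contamination, random_state=0)
    labels = detector.fit_predict(series.to_frame().to_numpy())  # -1 = outlier, 1 = inlier
    return series.mask(labels == -1)


rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=200, freq="h", tz="UTC")
load = pd.Series(100 + rng.normal(0, 1, size=200), index=idx)
load.iloc[50] = 1_000.0  # inject an obvious spike

cleaned = mask_outliers(load, contamination=0.01)
print(int(cleaned.isna().sum()), pd.isna(cleaned.iloc[50]))
```

With contamination=0.01, roughly 1% of the points are flagged. The injected spike is among them.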

Trained models are persisted to disk with joblib, following scikit-learn conventions. By default, models are retrained on every call (force_train=True). Set force_train=False to reuse existing cached models.
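The caching behaviour can be sketched with joblib.dump and joblib.load. The directory resolution follows the model_dir description in this page: an explicit directory is used first, then the SPOTFORECAST2_CACHE environment variable, then ~/spotforecast2_cache. The helper names resolve_model_dir and load_or_train and the "<name>.joblib" file layout are hypothetical. They are not the library's actual on-disk format.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional, Union

import joblib
from sklearn.linear_model import LinearRegression


def resolve_model_dir(model_dir: Optional[Union[str, Path]] = None) -> Path:
    """Use model_dir if given, else $SPOTFORECAST2_CACHE, else ~/spotforecast2_cache."""
    if model_dir is None:
        model_dir = os.environ.get("SPOTFORECAST2_CACHE", Path.home() / "spotforecast2_cache")
    path = Path(model_dir).expanduser()
    path.mkdir(parents=True, exist_ok=True)  # created automatically if missing
    return path


def load_or_train(name: str, train_fn: Callable[[], object], model_dir=None, force_train: bool = True):
    """Return a cached model unless force_train is set or no cache file exists yet."""
    path = resolve_model_dir(model_dir) / f"{name}.joblib"  # hypothetical file layout
    if not force_train and path.exists():
        return joblib.load(path)
    model = train_fn()
    joblib.dump(model, path)
    return model


calls = []


def train():
    calls.append(1)  # count how often training actually runs
    return LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0])


cache = tempfile.mkdtemp()
m1 = load_or_train("target_a", train, model_dir=cache, force_train=True)   # trains and saves
m2 = load_or_train("target_a", train, model_dir=cache, force_train=False)  # loads from disk
print(len(calls))  # 1
```

The second call skips training entirely, which is why force_train=False speeds up repeated calls with the same configuration.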

Parameters

data : Optional[pd.DataFrame], default None
    Target time series data. If None, the data is fetched automatically.
forecast_horizon : int, default 24
    Number of time steps to forecast ahead.
contamination : float, default 0.01
    Contamination parameter for outlier detection.
window_size : int, default 72
    Rolling window size for gap detection.
lags : int, default 24
    Number of lags used by the recursive forecaster.
train_ratio : float, default 0.8
    Fraction of the data used for training.
latitude : float, default 51.5136
    Location latitude (default: Dortmund).
longitude : float, default 7.4653
    Location longitude (default: Dortmund).
timezone : str, default 'UTC'
    Timezone of the data.
country_code : str, default 'DE'
    Country code used for holidays.
state : str, default 'NW'
    State code used for holidays.
estimator : Optional[object], default None
    Base estimator for the recursive forecaster. If None, an LGBMRegressor is used.
include_weather_windows : bool, default False
    Include weather window features.
include_holiday_features : bool, default False
    Include holiday features.
include_holiday_adjacency_features : bool, default False
    Include Brückentag (bridge day) and before/after-holiday binary indicators (is_brueckentag, is_before_holiday, is_after_holiday). When False (the default), no indicators are added and the output is byte-identical to the output produced before this option was introduced.
poly_features_degree : int, default 1
    Polynomial-interaction degree: 1 = no interactions; 2 = pairwise (bilinear) interactions; 3 or more = higher-order interactions.
max_poly_features : int, default 10
    Maximum number of polynomial interaction columns to keep. Only the top-K columns ranked by mutual information with the target are retained. A value <= 0 disables the cap.
force_train : bool, default True
    Retrain all models, ignoring any cached models.
model_dir : Optional[Union[str, Path]], default None
    Directory for saving and loading trained models. If None, the spotforecast2 cache directory is used: the SPOTFORECAST2_CACHE environment variable if set, otherwise ~/spotforecast2_cache.
verbose : bool, default True
    Print progress messages.
show_progress : bool, default False
    Show a progress bar during training.
on_weather_failure : Literal['raise', 'skip'], default 'raise'
    Policy for handling Open-Meteo fetch failures. "raise" propagates WeatherFetchError and aborts the pipeline, preserving the same safety-critical fail-safe semantics as ConfigEntsoe.on_weather_failure. "skip" logs a warning and continues with empty weather features, so the rest of the pipeline (calendar, holidays, day/night) can run without the Open-Meteo dependency. Use "skip" in offline or CI environments and in docstring examples that must not depend on the network.

Returns

Tuple[pd.DataFrame, Dict, Dict]
    A tuple (predictions, metadata, forecasters) where:
    predictions (pd.DataFrame): forecast values for each target variable.
    metadata (Dict): forecast metadata (index, shapes, etc.).
    forecasters (Dict): trained ForecasterRecursive objects keyed by target.

Raises

ValueError
    If data validation fails or required data cannot be retrieved.
ImportError
    If required dependencies are not installed.
OSError
    If models cannot be saved to disk.
WeatherFetchError
    If the Open-Meteo weather fetch fails and on_weather_failure="raise".
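The on_weather_failure policy can be sketched as a thin wrapper around the weather fetch. WeatherFetchError below is a local stand-in, because this page does not give the import path of the library's exception. The helper fetch_weather_with_policy is hypothetical. "Empty weather features" is modelled as a DataFrame with no columns, aligned to the target index.

```python
import logging
from typing import Callable, Literal

import pandas as pd

logger = logging.getLogger(__name__)


class WeatherFetchError(RuntimeError):
    """Local stand-in for the library's weather fetch exception."""


def fetch_weather_with_policy(
    fetch: Callable[[], pd.DataFrame],
    index: pd.DatetimeIndex,
    on_weather_failure: Literal["raise", "skip"] = "raise",
) -> pd.DataFrame:
    if on_weather_failure not in ("raise", "skip"):
        raise ValueError(f"invalid on_weather_failure: {on_weather_failure!r}")
    try:
        return fetch()
    except WeatherFetchError:
        if on_weather_failure == "raise":
            raise  # fail-safe: abort the whole pipeline
        logger.warning("Weather fetch failed; continuing without weather features.")
        return pd.DataFrame(index=index)  # zero weather columns, same index as the target


def offline_fetch() -> pd.DataFrame:
    raise WeatherFetchError("Open-Meteo unreachable")


idx = pd.date_range("2024-01-01", periods=3, freq="h", tz="UTC")
weather = fetch_weather_with_policy(offline_fetch, idx, on_weather_failure="skip")
print(weather.shape)  # (3, 0)
```

With the default "raise", the same call would re-raise WeatherFetchError instead of returning.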

Examples

Forecast two steps ahead, skipping weather features if Open-Meteo is unreachable:

import tempfile

from spotforecast2_safe.processing.n2n_predict_with_covariates import (
    n2n_predict_with_covariates,
)

predictions, metadata, forecasters = n2n_predict_with_covariates(
    forecast_horizon=2,
    lags=4,
    window_size=8,
    force_train=True,
    model_dir=tempfile.mkdtemp(),
    verbose=False,
    on_weather_failure="skip",
)
print(predictions.shape)

Output:

(2, 11)

Forecast three steps ahead for a different location (Berlin):

import tempfile

from spotforecast2_safe.processing.n2n_predict_with_covariates import (
    n2n_predict_with_covariates,
)

predictions, metadata, forecasters = n2n_predict_with_covariates(
    forecast_horizon=3,
    lags=4,
    window_size=8,
    latitude=52.5200,
    longitude=13.4050,
    force_train=True,
    model_dir=tempfile.mkdtemp(),
    verbose=False,
    on_weather_failure="skip",
)
print(predictions.shape)
print(metadata["forecast_horizon"])

Output:

(3, 11)
3

Notes

  • The function uses cached weather data when available.
  • Missing values are filled by forward/backward filling, and observations near gaps are down-weighted.
  • These sample weights are passed to the forecaster so that observations near missing data have less influence on training.
  • Train/validation splits are temporal (80/20 by default, set by train_ratio).
  • All features are cast to float32 for memory efficiency.
  • Trained models are saved to disk using joblib for fast reuse.
  • When force_train=False, existing models are loaded and prediction proceeds without retraining. This avoids the training cost on repeated calls with the same configuration.
  • The model_dir directory is created automatically if it doesn’t exist.
  • By default, models are cached in ~/spotforecast2_cache, which can be customized via the SPOTFORECAST2_CACHE environment variable.
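The gap handling, sample weighting, float32 cast, and temporal split described in these notes can be sketched together in pandas. This page does not give the exact weighting rule, so the sketch makes an assumption: an observation's weight is one minus the share of missing points in a centred window of window_size, floored at a hypothetical min_weight. The helpers gap_weights and temporal_split are likewise hypothetical.

```python
import numpy as np
import pandas as pd


def gap_weights(y: pd.Series, window_size: int = 72, min_weight: float = 0.1) -> pd.Series:
    """Down-weight observations whose neighbourhood contains missing values (assumed rule)."""
    is_gap = y.isna().astype(float)
    # Share of missing points in a centred window around each timestamp.
    gap_share = is_gap.rolling(window_size, center=True, min_periods=1).mean()
    weights = (1.0 - gap_share).clip(lower=min_weight)
    weights.loc[y.isna()] = min_weight  # imputed points themselves get the minimum weight
    return weights


def temporal_split(df: pd.DataFrame, train_ratio: float = 0.8):
    """Chronological split without shuffling: validation strictly follows training."""
    cut = int(len(df) * train_ratio)
    return df.iloc[:cut], df.iloc[cut:]


idx = pd.date_range("2024-01-01", periods=100, freq="h", tz="UTC")
y = pd.Series(np.arange(100, dtype=float), index=idx)
y.iloc[40:45] = np.nan  # a five-hour gap

w = gap_weights(y, window_size=8)
y_filled = y.ffill().bfill()  # forward/backward fill
frame = pd.DataFrame({"y": y_filled, "weight": w}).astype("float32")
train, val = temporal_split(frame, train_ratio=0.8)
print(len(train), len(val), train.index.max() < val.index.min())  # 80 20 True
```

Keeping the split chronological prevents the validation period from leaking into training. The weight column would then be handed to the forecaster as sample weights.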

Performance Notes

  • First run (no cached models): full training.
  • Subsequent runs with force_train=False: model loading only.
  • force_train=True (the default): full training on every call.