processing.n2n_predict_with_covariates.n2n_predict_with_covariates

processing.n2n_predict_with_covariates.n2n_predict_with_covariates(
    data=None,
    forecast_horizon=24,
    contamination=0.01,
    window_size=72,
    lags=24,
    train_ratio=0.8,
    latitude=51.5136,
    longitude=7.4653,
    timezone='UTC',
    country_code='DE',
    state='NW',
    estimator=None,
    include_weather_windows=False,
    include_holiday_features=False,
    include_poly_features=False,
    force_train=True,
    model_dir=None,
    verbose=True,
    show_progress=False,
)

End-to-end recursive forecasting with exogenous covariates.

This function implements a complete forecasting pipeline that:

1. Loads and validates target data
2. Detects and removes outliers
3. Imputes missing values with weighted gaps
4. Creates exogenous features (weather, holidays, calendar, day/night)
5. Performs feature engineering (cyclical encoding, interactions)
6. Merges target and exogenous data
7. Splits into train/validation/test sets
8. Trains or loads recursive forecasters with sample weighting
9. Generates multi-step ahead predictions
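Step 5 mentions cyclical encoding. As a minimal sketch of the general technique (not the library's own implementation; the helper names `encode_cyclical` and `circle_distance` are hypothetical), a periodic feature such as hour of day can be mapped to a sine/cosine pair so that 23:00 and 00:00 stay close together:

```python
import math
from typing import Tuple


def encode_cyclical(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value onto the unit circle (hypothetical helper)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def circle_distance(a: float, b: float, period: float) -> float:
    """Euclidean distance between two encoded values."""
    sin_a, cos_a = encode_cyclical(a, period)
    sin_b, cos_b = encode_cyclical(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# Hours 23 and 0 are neighbours on the clock and remain close after encoding.
near = circle_distance(23, 0, period=24)
# Hours 0 and 12 are opposite on the clock and end up at the maximum distance.
far = circle_distance(0, 12, period=24)
```

With the raw integer hour, 23 and 0 would look 23 units apart to the model; the encoded pair avoids that discontinuity.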

Models are persisted to disk following scikit-learn conventions using joblib. By default, models are retrained (force_train=True). Set force_train=False to reuse existing cached models.
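The interaction between force_train and the on-disk cache can be sketched as a load-or-train pattern. This is an illustration under assumptions, not the library's code: `load_or_train` and `dummy_train` are hypothetical names, and the standard-library `pickle` module stands in for joblib so that the sketch runs without extra dependencies.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_train(model_path: Path, train: Callable[[], Any], force_train: bool = True) -> Any:
    """Return a cached model if allowed and present, otherwise train and save.

    Hypothetical helper; pickle is used here in place of joblib.
    """
    if not force_train and model_path.exists():
        with model_path.open("rb") as fh:
            return pickle.load(fh)
    model = train()
    # Create the target directory on demand before writing the model.
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)
    return model


# A dummy "model" and a call counter replace real training in this demo.
calls = {"n": 0}


def dummy_train() -> dict:
    calls["n"] += 1
    return {"coef": [1.0, 2.0]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "models" / "target.pkl"
    first = load_or_train(path, dummy_train, force_train=True)    # trains and saves
    second = load_or_train(path, dummy_train, force_train=False)  # loads from disk
    third = load_or_train(path, dummy_train, force_train=True)    # retrains and overwrites

train_calls = calls["n"]
```

In this sketch only the first and third calls invoke the training function; the second call reads the stored object back from disk.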

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | Optional[pd.DataFrame] | Optional DataFrame with target time series data. If None, fetches data automatically. | None |
| forecast_horizon | int | Number of time steps to forecast ahead. | 24 |
| contamination | float | Contamination parameter for outlier detection. | 0.01 |
| window_size | int | Rolling window size for gap detection. | 72 |
| lags | int | Number of lags for recursive forecaster. | 24 |
| train_ratio | float | Fraction of data for training. | 0.8 |
| latitude | float | Location latitude (default is Dortmund). | 51.5136 |
| longitude | float | Location longitude (default is Dortmund). | 7.4653 |
| timezone | str | Timezone for data. | 'UTC' |
| country_code | str | Country code for holidays. | 'DE' |
| state | str | State code for holidays. | 'NW' |
| estimator | Optional[object] | Base estimator for recursive forecaster. If None, uses LGBMRegressor. | None |
| include_weather_windows | bool | Include weather window features. | False |
| include_holiday_features | bool | Include holiday features. | False |
| include_poly_features | bool | Include polynomial interaction features. | False |
| force_train | bool | Force retraining of all models, ignoring cached models. | True |
| model_dir | Optional[Union[str, Path]] | Directory for saving/loading trained models. If None, uses the spotforecast2 cache directory (~/spotforecast2_cache by default, or SPOTFORECAST2_CACHE environment variable). | None |
| verbose | bool | Print progress messages. | True |
| show_progress | bool | Show progress bar during training. | False |
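The fallback order described for model_dir (explicit argument, then the SPOTFORECAST2_CACHE environment variable, then ~/spotforecast2_cache) could be resolved roughly as follows. The helper name `resolve_model_dir` is hypothetical and only mirrors the documented behaviour; the library's internal logic may differ in detail.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_model_dir(model_dir: Optional[Union[str, Path]] = None) -> Path:
    """Pick the model directory using the documented fallback order (hypothetical helper)."""
    if model_dir is not None:
        return Path(model_dir)
    env_dir = os.environ.get("SPOTFORECAST2_CACHE")
    if env_dir:
        return Path(env_dir)
    return Path.home() / "spotforecast2_cache"


# An explicit argument always wins over the environment and the default.
explicit = resolve_model_dir("./saved_models")
```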

Returns

Tuple[pd.DataFrame, Dict, Dict] containing:

| Name | Type | Description |
|------|------|-------------|
| predictions | pd.DataFrame | DataFrame with forecast values for each target variable. |
| metadata | Dict | Dictionary with forecast metadata (index, shapes, etc.). |
| forecasters | Dict | Dictionary of trained ForecasterRecursive objects keyed by target. |

Raises

| Type | Description |
|------|-------------|
| ValueError | If data validation fails or required data cannot be retrieved. |
| ImportError | If required dependencies are not installed. |
| OSError | If models cannot be saved to disk. |

Examples

Basic usage with automatic model caching:

>>> predictions, metadata, forecasters = n2n_predict_with_covariates(
...     forecast_horizon=24,
...     verbose=True
... )
>>> print(predictions.shape)
(24, 11)

Load cached models (if available):

>>> predictions, metadata, forecasters = n2n_predict_with_covariates(
...     forecast_horizon=24,
...     force_train=False,
...     model_dir="./saved_models"
... )

Force retraining and update cache:

>>> predictions, metadata, forecasters = n2n_predict_with_covariates(
...     forecast_horizon=24,
...     force_train=True,
...     model_dir="./saved_models"
... )

Custom location and features:

>>> predictions, metadata, forecasters = n2n_predict_with_covariates(
...     forecast_horizon=48,
...     latitude=52.5200,  # Berlin
...     longitude=13.4050,
...     lags=48,
...     include_poly_features=True,
...     force_train=False,
...     verbose=True
... )

Notes

  • The function uses cached weather data when available.
  • Missing values are filled forward and backward, and observations near the filled gaps are downweighted.
  • Sample weights are passed to the forecaster so that observations near missing data contribute less to training.
  • Train/validation splits are temporal (80/20 by default, controlled by train_ratio).
  • All features are cast to float32 for memory efficiency.
  • Trained models are saved to disk using joblib for fast reuse.
  • When force_train=False, existing models are loaded and prediction proceeds without retraining. This significantly speeds up prediction for repeated calls with the same configuration.
  • The model_dir directory is created automatically if it doesn’t exist.
  • By default, models are cached in ~/spotforecast2_cache, which can be customized via the SPOTFORECAST2_CACHE environment variable.
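To make the gap-based sample weighting mentioned in the notes more concrete, here is a minimal sketch of one possible scheme. It is an assumption for illustration only: the helper `gap_weights` is hypothetical, and the rule used (weight 0.0 when the trailing window contains a missing value, 1.0 otherwise) may not match the library's exact weighting.

```python
from typing import List, Optional, Sequence


def gap_weights(values: Sequence[Optional[float]], window_size: int) -> List[float]:
    """Weight 0.0 for observations whose trailing window contains a gap, else 1.0.

    Assumed scheme for illustration; the library's weighting may differ.
    """
    missing = [v is None for v in values]
    weights = []
    for i in range(len(values)):
        # Trailing window of up to window_size observations ending at index i.
        start = max(0, i - window_size + 1)
        weights.append(0.0 if any(missing[start:i + 1]) else 1.0)
    return weights


# One missing value at index 2; with window_size=3 it affects indices 2, 3 and 4.
series = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0]
weights = gap_weights(series, window_size=3)
```

Weights of this kind are typically handed to the estimator's fit step as per-sample weights, so that imputed values and their neighbours influence the model less than clean observations.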

Performance Notes

  • First run: Full training
  • Subsequent runs (force_train=False): Model loading only
  • Force retrain (force_train=True): Full training again