processing.n2n_predict_with_covariates.n2n_predict_with_covariates

processing.n2n_predict_with_covariates.n2n_predict_with_covariates(
    data=None,
    forecast_horizon=24,
    contamination=0.01,
    window_size=72,
    lags=24,
    train_ratio=0.8,
    latitude=51.5136,
    longitude=7.4653,
    timezone='UTC',
    country_code='DE',
    state='NW',
    estimator=None,
    include_weather_windows=False,
    include_holiday_features=False,
    include_poly_features=False,
    force_train=True,
    model_dir=None,
    verbose=True,
    show_progress=False,
)

End-to-end recursive forecasting with exogenous covariates.

This function implements a complete forecasting pipeline that:

1. Loads and validates target data
2. Detects and removes outliers
3. Imputes missing values with weighted gaps
4. Creates exogenous features (weather, holidays, calendar, day/night)
5. Performs feature engineering (cyclical encoding, interactions)
6. Merges target and exogenous data
7. Splits into train/validation/test sets
8. Trains or loads recursive forecasters with sample weighting
9. Generates multi-step ahead predictions
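Step 5 of the pipeline mentions cyclical encoding. The following is a minimal sketch of that general technique, not the library's actual implementation: a periodic value such as the hour of day is mapped to a sine/cosine pair so that hour 23 and hour 0 end up close together. The helper name `encode_cyclical` is hypothetical and uses only the standard library.

```python
import math
from typing import Tuple


def encode_cyclical(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


# Hour 23 and hour 0 are adjacent in time, so their encodings should be close,
# while hour 12 sits on the opposite side of the circle.
late = encode_cyclical(23, 24)
midnight = encode_cyclical(0, 24)
noon = encode_cyclical(12, 24)

distance_late_midnight = math.dist(late, midnight)
distance_noon_midnight = math.dist(noon, midnight)
print(distance_late_midnight < distance_noon_midnight)
```

A raw integer hour would place 23 and 0 at opposite ends of the feature range; the two-column encoding removes that artificial jump.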

Models are persisted to disk following scikit-learn conventions using joblib. By default, models are retrained (force_train=True). Set force_train=False to reuse existing cached models.
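The interaction between force_train and the on-disk cache can be sketched as a load-or-train helper. This is an illustration under stated assumptions rather than the function's real code: `load_or_train`, `fake_train`, and the file name are hypothetical, and the standard-library `pickle` module stands in for joblib so the snippet runs without extra dependencies.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_train(
    model_path: Path,
    train_fn: Callable[[], Any],
    force_train: bool = True,
) -> Any:
    """Return a cached model unless force_train is set or no cache exists."""
    if not force_train and model_path.exists():
        with model_path.open("rb") as fh:
            return pickle.load(fh)  # stand-in for joblib.load
    model = train_fn()
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)  # stand-in for joblib.dump
    return model


# Hypothetical "training" routine that records how often it runs.
calls = []


def fake_train() -> dict:
    calls.append(1)
    return {"coef": [0.5, 0.25]}


model_path = Path(tempfile.mkdtemp()) / "models" / "forecaster_target.pkl"
first = load_or_train(model_path, fake_train, force_train=True)    # trains and saves
second = load_or_train(model_path, fake_train, force_train=False)  # loads from disk
third = load_or_train(model_path, fake_train, force_train=True)    # retrains
print(len(calls))
```

Only the first and third calls invoke the training routine; the second call reads the persisted object back from disk.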

Parameters

Name	Type	Description	Default
data	Optional[pd.DataFrame]	Optional DataFrame with target time series data. If None, fetches data automatically. Default: None.	`None`
forecast_horizon	int	Number of time steps to forecast ahead. Default: 24.	`24`
contamination	float	Contamination parameter for outlier detection. Default: 0.01.	`0.01`
window_size	int	Rolling window size for gap detection. Default: 72.	`72`
lags	int	Number of lags for recursive forecaster. Default: 24.	`24`
train_ratio	float	Fraction of data for training. Default: 0.8.	`0.8`
latitude	float	Location latitude. Default: 51.5136 (Dortmund).	`51.5136`
longitude	float	Location longitude. Default: 7.4653 (Dortmund).	`7.4653`
timezone	str	Timezone for data. Default: “UTC”.	`'UTC'`
country_code	str	Country code for holidays. Default: “DE”.	`'DE'`
state	str	State code for holidays. Default: “NW”.	`'NW'`
estimator	Optional[object]	Base estimator for recursive forecaster. If None, uses LGBMRegressor. Default: None.	`None`
include_weather_windows	bool	Include weather window features. Default: False.	`False`
include_holiday_features	bool	Include holiday features. Default: False.	`False`
include_poly_features	bool	Include polynomial interaction features. Default: False.	`False`
force_train	bool	Force retraining of all models, ignoring cached models. Default: True.	`True`
model_dir	Optional[Union[str, Path]]	Directory for saving/loading trained models. If None, uses the spotforecast2 cache directory (~/spotforecast2_cache by default, or SPOTFORECAST2_CACHE environment variable). Default: None.	`None`
verbose	bool	Print progress messages. Default: True.	`True`
show_progress	bool	Show progress bar during training. Default: False.	`False`
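The model_dir fallback described in the table can be pictured as a small resolution helper. The sketch below assumes the precedence "explicit argument, then SPOTFORECAST2_CACHE, then ~/spotforecast2_cache"; the name `resolve_model_dir` is hypothetical and the library's internal logic may differ.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_model_dir(model_dir: Optional[Union[str, Path]] = None) -> Path:
    """Pick the model directory: explicit argument, then env var, then home default."""
    if model_dir is not None:
        path = Path(model_dir)
    else:
        env_value = os.environ.get("SPOTFORECAST2_CACHE")
        path = Path(env_value) if env_value else Path.home() / "spotforecast2_cache"
    # Create the directory if it does not exist yet.
    path.mkdir(parents=True, exist_ok=True)
    return path


# Demonstrate both branches inside a throwaway directory.
base = Path(tempfile.mkdtemp())
explicit = resolve_model_dir(base / "explicit")

os.environ["SPOTFORECAST2_CACHE"] = str(base / "from_env")
from_env = resolve_model_dir()

print(explicit.name, from_env.name)
```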

Returns

Name	Type	Description
	Tuple[pd.DataFrame, Dict, Dict]	Tuple containing: predictions (DataFrame with forecast values for each target variable); metadata (dictionary with forecast metadata such as index and shapes); forecasters (dictionary of trained ForecasterRecursive objects keyed by target).

Raises

Name	Type	Description
	ValueError	If data validation fails or required data cannot be retrieved.
	ImportError	If required dependencies are not installed.
	OSError	If models cannot be saved to disk.

Examples

import tempfile

from spotforecast2_safe.processing.n2n_predict_with_covariates import (
    n2n_predict_with_covariates,
)

predictions, metadata, forecasters = n2n_predict_with_covariates(
    forecast_horizon=2,
    lags=4,
    window_size=8,
    force_train=True,
    model_dir=tempfile.mkdtemp(),
    verbose=False,
)
print(predictions.shape)

(2, 11)

import tempfile

from spotforecast2_safe.processing.n2n_predict_with_covariates import (
    n2n_predict_with_covariates,
)

predictions, metadata, forecasters = n2n_predict_with_covariates(
    forecast_horizon=3,
    lags=4,
    window_size=8,
    latitude=52.5200,
    longitude=13.4050,
    force_train=True,
    model_dir=tempfile.mkdtemp(),
    verbose=False,
)
print(predictions.shape)
print(metadata["forecast_horizon"])

(3, 11)
3

Notes

The function uses cached weather data when available.
Missing values are filled by forward/backward fill, and observations near the resulting gaps are downweighted.
Sample weights are passed to the forecaster to penalize observations near missing data.
Train/validation splits are temporal (80/20 by default, controlled by train_ratio).
All features are cast to float32 for memory efficiency.
Trained models are saved to disk using joblib for fast reuse.
When force_train=False, existing models are loaded and prediction proceeds without retraining. This significantly speeds up prediction for repeated calls with the same configuration.
The model_dir directory is created automatically if it doesn’t exist.
By default, models are cached in ~/spotforecast2_cache, which can be customized via the SPOTFORECAST2_CACHE environment variable.
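The gap handling described in the notes above (forward/backward fill plus downweighting near gaps) can be sketched in plain Python. The weighting rule here is an assumption chosen for illustration: an observation gets weight 0.0 if any value in its trailing window of `window_size` steps was imputed, and 1.0 otherwise. The helper `fill_and_weight` is hypothetical, and the library's actual rule may differ.

```python
from typing import List, Optional, Tuple


def fill_and_weight(
    values: List[Optional[float]], window_size: int
) -> Tuple[List[Optional[float]], List[float]]:
    """Forward/backward fill gaps and down-weight observations near them."""
    missing = [v is None for v in values]

    # Forward fill, then backward fill whatever is still missing at the start.
    filled = list(values)
    for i in range(1, len(filled)):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]

    # Assumed rule: weight 0.0 if any value in the trailing window was imputed.
    weights = []
    for i in range(len(values)):
        start = max(0, i - window_size + 1)
        weights.append(0.0 if any(missing[start : i + 1]) else 1.0)
    return filled, weights


series = [None, 2.0, 3.0, None, None, 6.0, 7.0, 8.0]
filled, weights = fill_and_weight(series, window_size=2)
print(filled)   # [2.0, 2.0, 3.0, 3.0, 3.0, 6.0, 7.0, 8.0]
print(weights)  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

Weights of this kind are typically passed to the estimator's fit step so that imputed stretches, and the lagged values derived from them, contribute little or nothing to the loss.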

Performance Notes

First run: Full training
Subsequent runs (force_train=False): Model loading only
Force retrain (force_train=True): Full training again