End-to-end recursive forecasting with exogenous covariates.
This function implements a complete forecasting pipeline that:
1. Loads and validates target data
2. Detects and removes outliers
3. Imputes missing values with weighted gaps
4. Creates exogenous features (weather, holidays, calendar, day/night)
5. Performs feature engineering (cyclical encoding, interactions)
6. Merges target and exogenous data
7. Splits into train/validation/test sets
8. Trains or loads recursive forecasters with sample weighting
9. Generates multi-step ahead predictions
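The cyclical encoding mentioned in step 5 can be illustrated with a minimal sketch. The helper name `cyclical_encode` is hypothetical and is not taken from the library; it only shows the general sine/cosine technique, under the assumption that calendar features such as the hour of day are encoded this way.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value (e.g. hour 0-23) to a point on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


# Hours 23 and 0 are adjacent in time, so their encodings should be close,
# whereas the raw integers 23 and 0 are far apart.
h0 = cyclical_encode(0, 24)
h12 = cyclical_encode(12, 24)
h23 = cyclical_encode(23, 24)

near = math.dist(h23, h0)   # small: neighbouring hours
far = math.dist(h12, h0)    # largest possible: opposite sides of the day
```

Using both the sine and the cosine component keeps every hour distinguishable; a single sine value alone would map two different hours to the same number.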
Models are persisted to disk following scikit-learn conventions using joblib. By default, models are retrained (force_train=True). Set force_train=False to reuse existing cached models.
model_dir: Directory for saving and loading trained models. If None, the spotforecast2 cache directory is used (~/spotforecast2_cache by default, or the path given by the SPOTFORECAST2_CACHE environment variable). Default: None.
The function uses cached weather data when available.
Missing values are filled by forward/backward fill, and observations near the filled gaps are downweighted.
Sample weights are passed to the forecaster so that observations near missing data have less influence on training.
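The gap handling described above can be sketched in plain Python. The function names, the `window` parameter, and the weight values are assumptions made for illustration; the source does not state the exact weighting scheme the pipeline uses.

```python
def gap_weights(values, window=1, low_weight=0.0):
    """Return one weight per observation.

    Observations within `window` steps of a missing value (None) get
    `low_weight`; all others get 1.0. Both settings are illustrative.
    """
    n = len(values)
    weights = [1.0] * n
    for i, value in enumerate(values):
        if value is None:
            for j in range(max(0, i - window), min(n, i + window + 1)):
                weights[j] = low_weight
    return weights


def ffill_bfill(values):
    """Forward-fill missing values, then backward-fill any leading gaps."""
    filled = list(values)
    last = None
    for i, value in enumerate(filled):
        if value is None:
            filled[i] = last
        else:
            last = value
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


series = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0]
weights = gap_weights(series, window=1)  # computed before filling
filled = ffill_bfill(series)
```

The weights are computed before imputation because the gap positions are no longer visible once the series has been filled. In a scikit-learn style workflow such weights would typically be handed to the estimator through a `sample_weight` argument at fit time.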
Train/validation splits are temporal (80/20 by default).
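A temporal split keeps the rows in time order and cuts once, so that every validation observation is later than every training observation. The sketch below assumes the rows are already sorted by time; `temporal_split` and `train_fraction` are hypothetical names, not the library's API.

```python
def temporal_split(rows, train_fraction=0.8):
    """Split time-ordered rows into (train, validation) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


timestamps = list(range(10))  # stand-in for ten time-ordered observations
train, val = temporal_split(timestamps)
```

A random shuffle would leak future information into the training set, which is why the split is a single cut rather than a random sample.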
All features are cast to float32 for memory efficiency.
Trained models are saved to disk using joblib for fast reuse.
When force_train=False, existing models are loaded and prediction proceeds without retraining. This significantly speeds up prediction for repeated calls with the same configuration.
The model_dir directory is created automatically if it doesn’t exist.
By default, models are cached in ~/spotforecast2_cache, which can be customized via the SPOTFORECAST2_CACHE environment variable.
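The directory rules described above could be resolved roughly as follows. This is a sketch of the documented behaviour, not the library's own code; `resolve_model_dir` is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def resolve_model_dir(model_dir=None):
    """Pick the model directory and create it if it does not exist.

    Order of precedence (as documented): explicit argument, then the
    SPOTFORECAST2_CACHE environment variable, then ~/spotforecast2_cache.
    """
    if model_dir is None:
        model_dir = os.environ.get("SPOTFORECAST2_CACHE", "~/spotforecast2_cache")
    path = Path(model_dir).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration inside a temporary directory so nothing is written to $HOME.
with tempfile.TemporaryDirectory() as tmp:
    os.environ["SPOTFORECAST2_CACHE"] = os.path.join(tmp, "cache")
    from_env = resolve_model_dir()
    from_env_exists = from_env.is_dir()
    explicit = resolve_model_dir(os.path.join(tmp, "explicit"))
    explicit_exists = explicit.is_dir()
    del os.environ["SPOTFORECAST2_CACHE"]
```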
Performance Notes
First run: Full training
Subsequent runs (force_train=False): Model loading only
Force retrain (force_train=True): Full training again
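The three cases above follow from a simple load-or-train pattern, sketched here. The pipeline itself persists models with joblib; this sketch swaps in the standard-library `pickle` module as a stand-in so that it runs without extra dependencies, and `load_or_train` is a hypothetical helper, not the library's function.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_train(model_path, train_fn, force_train=True):
    """Return (model, was_trained).

    Trains and saves unless force_train is False and a cached file exists.
    """
    model_path = Path(model_path)
    if not force_train and model_path.exists():
        with model_path.open("rb") as fh:
            return pickle.load(fh), False
    model = train_fn()
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)
    return model, True


def train_placeholder():
    """Stand-in for an expensive training step."""
    return {"coef": [0.5, 1.5]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "models" / "demo.pkl"
    # First run: no cached file yet, so training happens even with force_train=False.
    _, first = load_or_train(path, train_placeholder, force_train=False)
    # Subsequent run: the cached file is loaded instead of retraining.
    model, second = load_or_train(path, train_placeholder, force_train=False)
    # Forced retrain: the cache is ignored and overwritten.
    _, third = load_or_train(path, train_placeholder, force_train=True)
```

Only the middle call skips training, which is where the time saving for repeated calls comes from.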