processing.n2n_predict.n2n_predict
processing.n2n_predict.n2n_predict(
data=None,
columns=None,
forecast_horizon=24,
contamination=0.01,
window_size=72,
force_train=True,
model_dir=None,
verbose=True,
show_progress=True,
)
End-to-end baseline forecasting using the equivalent date method.
This function implements a complete forecasting pipeline that:
1. Loads and validates target data
2. Detects and removes outliers
3. Imputes missing values
4. Splits into train/validation/test sets
5. Trains or loads equivalent date forecasters
6. Generates multi-step ahead predictions
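As a rough illustration of the equivalent date idea used by the forecasters in step 5, the sketch below predicts each future step with the value observed one seasonal period earlier. It is a plain-Python approximation of the general technique, not the ForecasterEquivalentDate implementation; the function name `equivalent_date_forecast` and the `period` argument are assumptions made for this example and are not parameters of n2n_predict.

```python
from typing import List, Sequence


def equivalent_date_forecast(
    history: Sequence[float], horizon: int, period: int
) -> List[float]:
    """Forecast each step with the value observed one `period` earlier.

    Illustrative only: `period` (e.g. 24 for hourly data with a daily
    cycle) is an assumed name, not a parameter of n2n_predict.
    """
    if period <= 0 or len(history) < period:
        raise ValueError("need at least one full period of history")
    last_cycle = list(history[-period:])
    # Horizons longer than one period repeat the last observed cycle.
    return [last_cycle[step % period] for step in range(horizon)]


# Three synthetic days of hourly values that repeat every 24 steps.
hourly = [float(hour % 24) for hour in range(72)]
forecast = equivalent_date_forecast(hourly, horizon=24, period=24)
```

Because the baseline only copies past values, it needs no fitting beyond storing recent history, which is why it is a common reference point for more complex models.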
Models are persisted to disk following scikit-learn conventions using joblib. By default, models are retrained (force_train=True). Set force_train=False to reuse existing cached models.
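The train-or-load behaviour controlled by force_train might look roughly like the sketch below. The helper `train_or_load`, the `<target>.joblib` file naming, and the dictionary standing in for a trained forecaster are all hypothetical; the actual on-disk layout used by n2n_predict is not documented here. The sketch uses joblib when it is installed and falls back to pickle only so that it runs without extra dependencies.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, List

try:
    # joblib is the persistence library named in this documentation.
    from joblib import dump as _dump, load as _load
except ImportError:
    # Fallback for this sketch only, so it runs without joblib installed.
    def _dump(obj: Any, path: Path) -> None:
        with open(path, "wb") as fh:
            pickle.dump(obj, fh)

    def _load(path: Path) -> Any:
        with open(path, "rb") as fh:
            return pickle.load(fh)


def train_or_load(
    target: str, fit: Callable[[], Any], model_dir: Path, force_train: bool = True
) -> Any:
    """Return a model for `target`, reusing a cached file when allowed."""
    model_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    path = model_dir / f"{target}.joblib"  # hypothetical file naming
    if not force_train and path.exists():
        return _load(path)  # cached model: skip training
    model = fit()
    _dump(model, path)  # refresh the cache after training
    return model


calls: List[str] = []


def fit_power() -> dict:
    calls.append("fit")
    return {"offset": 24}  # stand-in for a trained forecaster


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "forecasters"
    first = train_or_load("power", fit_power, cache, force_train=True)
    second = train_or_load("power", fit_power, cache, force_train=False)
    third = train_or_load("power", fit_power, cache, force_train=True)
```

In this sketch the second call loads the cached file instead of fitting, while the third call retrains and overwrites it, mirroring the two force_train settings described above.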
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| data | Optional[pd.DataFrame] | DataFrame with the target time series data. If None, data is fetched automatically. | None |
| columns | Optional[List[str]] | Target columns to forecast. If None, all available columns are used. | None |
| forecast_horizon | int | Number of time steps to forecast ahead. | 24 |
| contamination | float | Contamination parameter for outlier detection. | 0.01 |
| window_size | int | Rolling window size for gap detection. | 72 |
| force_train | bool | Force retraining of all models, ignoring cached models. | True |
| model_dir | Optional[Union[str, Path]] | Directory for saving and loading trained models. If None, uses the cache directory from get_cache_home() (~/spotforecast2_cache/forecasters). | None |
| verbose | bool | Print progress messages. | True |
| show_progress | bool | Show a progress bar during training and prediction. | True |
Returns
A Tuple[pd.DataFrame, Dict] containing:

| Name | Type | Description |
|---|---|---|
| predictions | pd.DataFrame | DataFrame with forecast values for each target variable. |
| forecasters | Dict | Dictionary of trained ForecasterEquivalentDate objects keyed by target. |
Raises
| Type | Description |
|---|---|
| ValueError | If data validation fails or required data cannot be retrieved. |
| ImportError | If required dependencies are not installed. |
| OSError | If models cannot be saved to disk. |
Examples
Basic usage with automatic model caching:
>>> predictions, forecasters = n2n_predict(
... forecast_horizon=24,
... verbose=True
... )
>>> print(predictions.shape)
(24, 11)
Load cached models (if available):
>>> predictions, forecasters = n2n_predict(
... forecast_horizon=24,
... force_train=False,
... model_dir="./saved_models",
... verbose=True
... )
Force retraining and update cache:
>>> predictions, forecasters = n2n_predict(
... forecast_horizon=24,
... force_train=True,
... model_dir="./saved_models",
... verbose=True
... )
With specific target columns:
>>> predictions, forecasters = n2n_predict(
... columns=["power", "energy"],
... forecast_horizon=48,
... force_train=False,
... verbose=True
... )
Notes
- Trained models are saved to disk using joblib for fast reuse.
- When force_train=False, existing models are loaded and prediction proceeds without retraining. This significantly speeds up prediction for repeated calls with the same configuration.
- The model_dir directory is created automatically if it doesn’t exist.
- Default model_dir uses get_cache_home() which respects the SPOTFORECAST2_CACHE environment variable.
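The directory resolution described in these notes could be sketched as follows. The helper `resolve_model_dir` is hypothetical, and the exact behaviour of get_cache_home() is an assumption: here SPOTFORECAST2_CACHE is taken to replace the cache home (~/spotforecast2_cache by default), with a `forecasters` subdirectory appended in both cases.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_model_dir(model_dir: Optional[Union[str, Path]] = None) -> Path:
    """Pick the directory used for saved forecasters (hypothetical helper)."""
    if model_dir is not None:
        return Path(model_dir)  # an explicit argument always wins
    # Assumption: SPOTFORECAST2_CACHE replaces the cache home, and a
    # "forecasters" subdirectory is appended in both cases.
    cache_home = os.environ.get("SPOTFORECAST2_CACHE")
    base = Path(cache_home) if cache_home else Path.home() / "spotforecast2_cache"
    return base / "forecasters"


explicit = resolve_model_dir("./saved_models")
```

The helper only computes a path; creating the directory is left to the code that writes the models.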
Performance Notes
- First run: Full training (~2-5 minutes depending on data size)
- Subsequent runs (force_train=False): Model loading only (~1-2 seconds)
- Force retrain (force_train=True): Full training again (~2-5 minutes)