task_multi: Config-Driven Multi-Target Forecasting with MultiTask

What spotforecast2_safe.multitask provides, how ConfigMulti drives it, and a complete runnable example.

spotforecast2_safe.multitask is the config-driven orchestrator for multi-target time-series forecasting. It owns the complete pipeline — data preparation, outlier handling, imputation, exogenous features, training, prediction, persistence — and is driven by a single ConfigMulti object. The unrestricted sibling package spotforecast2 inherits this pipeline and adds only what is deliberately excluded here: hyperparameter tuning (Optuna, SpotOptim) and interactive plotting.

Before version 16.0.0 this pipeline existed twice: once in the sibling package and once, in procedural form, behind the n-to-1 task’s 18 keyword arguments and a hard-coded weight list. Both paths are now one implementation in this package, and the dependency between the siblings is strictly one-way: spotforecast2 imports from spotforecast2-safe, never the reverse.

Vocabulary

Definition 1 (MultiTask) The task dispatcher of the multitask package. A MultiTask instance is constructed from a ConfigMulti and a DataFrame, prepared with the four pipeline stages (prepare_data, detect_outliers, impute, build_exogenous_features), and executed with run(task=...), where task selects one of the available task modes.

Definition 2 (N-to-1 aggregation) The reduction of per-target forecasts to a single series as a weighted sum, with weights taken from ConfigMulti.agg_weights in target order. Equal weights are used when agg_weights is None.
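
The weighted sum in Definition 2 can be sketched in a few lines of pandas. This is a minimal illustration, not the package's implementation: the helper name aggregate_n_to_1 and the toy series are assumptions made for this sketch. The agg_weights=None fallback is deliberately left out, because the definition does not say whether the equal weights are normalised.

```python
import pandas as pd


def aggregate_n_to_1(preds: dict[str, pd.Series], weights: list[float]) -> pd.Series:
    """Weighted sum of per-target forecasts; weights are given in target order.

    Hypothetical helper for illustration only.
    """
    if len(weights) != len(preds):
        raise ValueError("need exactly one weight per target")
    total = None
    for weight, series in zip(weights, preds.values()):
        term = weight * series
        total = term if total is None else total + term
    return total


idx = pd.date_range("2023-01-29", periods=3, freq="h", tz="UTC")
preds = {
    "a": pd.Series([100.0, 101.0, 102.0], index=idx),
    "b": pd.Series([200.0, 198.0, 196.0], index=idx),
}
combined = aggregate_n_to_1(preds, weights=[1.0, -1.0])  # a - b
print(combined.tolist())  # → [-100.0, -97.0, -94.0]
```

The dict keeps insertion order, so "target order" here simply means the order in which the targets were added.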

Task modes

The safe package ships four task modes:

Mode      What it does
lazy      Fit one ForecasterRecursive per target with LightGBM defaults, applying cached tuning results when present.
defaults  Same fit, but ignoring any tuning cache; a fully deterministic baseline.
predict   Load previously saved models and predict without retraining.
clean     Remove the pipeline’s cache directory (models, tuning results, logs).

The tuning modes optuna and spotoptim exist only in spotforecast2; requesting them here raises an explicit ValueError (see Fail-safe behaviour).

A complete worked example

Example 1 (Synthetic data and a minimal configuration) The setup consists of two hourly target series over four weeks, a named DatetimeIndex matching ConfigMulti.index_name (default "DateTime"), and a configuration with the expensive options disabled so that the example runs offline in seconds.

import tempfile
import warnings

import numpy as np
import pandas as pd

from spotforecast2_safe.configurator.config_multi import ConfigMulti

warnings.filterwarnings("ignore")

rng = np.random.default_rng(0)
n = 24 * 28  # 4 weeks, hourly
idx = pd.date_range("2023-01-01", periods=n, freq="h", tz="UTC")
idx.name = "DateTime"
df = pd.DataFrame(
    {
        "a": 100 + 10 * np.sin(np.arange(n) * 2 * np.pi / 24) + rng.normal(0, 2, n),
        "b": 200 + 20 * np.cos(np.arange(n) * 2 * np.pi / 24) + rng.normal(0, 4, n),
    },
    index=idx,
)

cache = tempfile.mkdtemp()
cfg = ConfigMulti(
    predict_size=6,                 # forecast horizon: 6 hours
    agg_weights=[1.0, -1.0],        # n-to-1 combination: a - b
    use_exogenous_features=False,   # offline example: no weather/calendar
    use_outlier_detection=False,
    auto_save_models=True,          # persist models for the predict mode below
    number_folds=2,
    random_state=42,
    verbose=False,
)
df.tail(3)
                                   a           b
DateTime
2023-01-28 21:00:00+00:00  93.591621  219.025208
2023-01-28 22:00:00+00:00  94.698887  220.480717
2023-01-28 23:00:00+00:00  97.692551  224.620083

Example 2 (Running the pipeline) Each of the four stage methods returns the task, so the calls chain; run then fits the models and predicts. The aggregated package returned by run carries the combined forecast under "future_pred"; the per-target packages live in mt.results, keyed by task mode and then by target name.

from spotforecast2_safe.multitask import MultiTask

mt = MultiTask(cfg, dataframe=df, cache_home=cache)
result = (
    mt.prepare_data()
      .detect_outliers()
      .impute()
      .build_exogenous_features()
      .run(task="defaults")
)
result["future_pred"]
2023-01-29 00:00:00+00:00   -117.612059
2023-01-29 01:00:00+00:00   -120.726869
2023-01-29 02:00:00+00:00   -117.511861
2023-01-29 03:00:00+00:00   -107.689754
2023-01-29 04:00:00+00:00   -103.317560
2023-01-29 05:00:00+00:00    -95.314405
Freq: h, dtype: float64
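
The chaining in Example 2 works because every stage hands back the task object itself. The following sketch shows that pattern in isolation. StagedTask is a hypothetical stand-in written for this illustration; it borrows the stage names from above but does none of the real work.

```python
class StagedTask:
    """Hypothetical stand-in for a staged pipeline; not the package's class."""

    def __init__(self) -> None:
        self.completed: list[str] = []

    def _stage(self, name: str) -> "StagedTask":
        self.completed.append(name)
        return self  # returning self is what makes the calls chainable

    def prepare_data(self) -> "StagedTask":
        return self._stage("prepare_data")

    def detect_outliers(self) -> "StagedTask":
        return self._stage("detect_outliers")

    def impute(self) -> "StagedTask":
        return self._stage("impute")

    def build_exogenous_features(self) -> "StagedTask":
        return self._stage("build_exogenous_features")

    def run(self, task: str) -> dict:
        # The terminal call returns a result, not the task, so the chain ends here.
        return {"task": task, "stages": list(self.completed)}


result = (
    StagedTask()
    .prepare_data()
    .detect_outliers()
    .impute()
    .build_exogenous_features()
    .run(task="defaults")
)
print(result["stages"])
```

Because run returns a result rather than the task, the task object has to be bound to a name first (as mt is in Example 2) if its per-target results are needed afterwards.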

Example 3 (The aggregation is exactly the configured weighted sum) Definition 2 can be verified directly against the per-target forecasts:

pred_a = mt.results["defaults"]["a"]["future_pred"]
pred_b = mt.results["defaults"]["b"]["future_pred"]
manual = 1.0 * pred_a + (-1.0) * pred_b

print("max |aggregated - manual| =", float((result["future_pred"] - manual).abs().max()))
max |aggregated - manual| = 0.0

Example 4 (Train once, predict many times) With auto_save_models=True the fitted forecasters were persisted under cache_home. A later predict run loads them instead of retraining — the production pattern for scheduled forecasts:

mt2 = MultiTask(cfg, dataframe=df, cache_home=cache)
mt2.prepare_data().detect_outliers().impute().build_exogenous_features()
reloaded = mt2.run(task="predict")
reloaded["future_pred"]
2023-01-29 00:00:00+00:00   -117.612059
2023-01-29 01:00:00+00:00   -120.726869
2023-01-29 02:00:00+00:00   -117.511861
2023-01-29 03:00:00+00:00   -107.689754
2023-01-29 04:00:00+00:00   -103.317560
2023-01-29 05:00:00+00:00    -95.314405
Freq: h, dtype: float64
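
The train-once, predict-many pattern of Example 4 can be reduced to a toy that runs with the standard library alone. Everything here is an assumption made for illustration: the mean "model", the JSON file, and the file name target_a.json say nothing about how the package actually serialises its forecasters.

```python
import json
import tempfile
from pathlib import Path


def fit_mean_model(history: list[float]) -> dict:
    """Toy training step: the fitted state is just the mean of the history."""
    return {"mean": sum(history) / len(history)}


def train_and_save(history: list[float], path: Path) -> None:
    """Expensive step, run once: fit and persist the fitted state."""
    path.write_text(json.dumps(fit_mean_model(history)))


def load_and_predict(path: Path, steps: int) -> list[float]:
    """Cheap step, run on every schedule tick: load the state, no refit."""
    state = json.loads(path.read_text())
    return [state["mean"]] * steps


cache_dir = Path(tempfile.mkdtemp())
model_path = cache_dir / "target_a.json"  # hypothetical file name

train_and_save([1.0, 2.0, 3.0], model_path)
print(load_and_predict(model_path, steps=3))  # → [2.0, 2.0, 2.0]
print(load_and_predict(model_path, steps=2))  # → [2.0, 2.0]
```

The second call reuses the stored state without touching the training data, which is the property a scheduled forecast relies on.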

Example 5 (Determinism) Same input, same configuration, bit-identical output — a hard requirement of this package, enforced by the test suite and demonstrable here with a fresh instance in a fresh cache directory:

mt3 = MultiTask(cfg, dataframe=df.copy(), cache_home=tempfile.mkdtemp())
rerun = (
    mt3.prepare_data()
       .detect_outliers()
       .impute()
       .build_exogenous_features()
       .run(task="defaults")
)
pd.testing.assert_series_equal(result["future_pred"], rerun["future_pred"], check_exact=True)
print("bit-identical:", result["future_pred"].equals(rerun["future_pred"]))
bit-identical: True
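
A common ingredient of this kind of reproducibility is an explicitly seeded random generator. The sketch below shows only that ingredient, using NumPy and pandas; the function simulate is hypothetical, and whether the package pins further sources of nondeterminism beyond random_state is not covered here.

```python
import numpy as np
import pandas as pd


def simulate(seed: int) -> pd.Series:
    """Draw a noisy series from a generator that is seeded explicitly."""
    rng = np.random.default_rng(seed)
    return pd.Series(100 + rng.normal(0, 2, 24))


first = simulate(0)
second = simulate(0)   # same seed: every value must match exactly
other = simulate(1)    # different seed: a different draw

# check_exact=True compares values without any floating-point tolerance.
pd.testing.assert_series_equal(first, second, check_exact=True)
print(first.equals(second), first.equals(other))  # → True False
```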

Fail-safe behaviour

Invalid requests raise immediately instead of degrading silently. Requesting a tuning mode in the safe package raises a ValueError whose message names the package that provides it:

try:
    mt.run(task="spotoptim")
except ValueError as err:
    print(err)
Task 'spotoptim' requires auto-tuning, which is not available in spotforecast2-safe. Use the spotforecast2 package, or task='lazy'/'defaults'.

The same policy applies to unexpected keyword arguments (TypeError instead of silent dropping) and to plotting: MultiTask.plot_with_outliers() raises NotImplementedError because no plotting library is permitted in this package.
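
The fail-fast policy described above can be sketched as a small validating dispatcher. This is an illustration under stated assumptions, not the package's code: the function run, the two sets, the message wording, and the choice of ValueError for a completely unknown mode are all hypothetical.

```python
SAFE_TASKS = {"lazy", "defaults", "predict", "clean"}
TUNING_TASKS = {"optuna", "spotoptim"}  # provided by the sibling package only


def run(task: str, **kwargs) -> str:
    """Validate a request up front; hypothetical stand-in for a task dispatcher."""
    if kwargs:
        # Reject unknown options loudly instead of dropping them silently.
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    if task in TUNING_TASKS:
        raise ValueError(f"task {task!r} requires tuning, which is not available here")
    if task not in SAFE_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(SAFE_TASKS)}")
    return task  # a real dispatcher would execute the task at this point


for call in (
    lambda: run("defaults"),
    lambda: run("spotoptim"),
    lambda: run("lazy", show=True),
):
    try:
        print("ok:", call())
    except (ValueError, TypeError) as err:
        print(type(err).__name__, "->", err)
```

Validating before any work starts means a misspelled option fails in milliseconds rather than after a long training run.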

The n-to-1 task entry point

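run_pipeline from task_safe_n_to_1_with_covariates_and_dataframe wraps exactly this pipeline with task="lazy": a single call takes the config and the DataFrame and returns the combined forecast. No tuning cache exists in this example, so the values match those of Example 2: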

from spotforecast2_safe.tasks.task_safe_n_to_1_with_covariates_and_dataframe import (
    run_pipeline,
)

forecast = run_pipeline(config=cfg, dataframe=df, cache_home=cache)
forecast.head(3)
forecast
2023-01-29 00:00:00+00:00 -117.612059
2023-01-29 01:00:00+00:00 -120.726869
2023-01-29 02:00:00+00:00 -117.511861

The matching console script accepts the same knobs as flags:

uv run spotforecast-safe-n2o1-cov-df --forecast_horizon 24 --lags 24 \
    --include_holiday_features true

Scaling up from the toy example

For a real run, switch the feature machinery on instead of off: use_exogenous_features=True with include_holiday_features, include_holiday_adjacency_features (bridge days), and include_weather_windows adds calendar, holiday, day/night, weather, and polynomial-interaction covariates before training. Weather features require network access; on_weather_failure keeps its fail-safe default "raise" unless you explicitly opt into "skip".
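
To give a feel for what calendar covariates look like, here is a minimal pandas sketch that derives three of them from a DatetimeIndex. It is illustrative only: the function calendar_features and the column names are assumptions, and the feature set the package actually builds is richer (holidays, bridge days, day/night, weather, interactions).

```python
import pandas as pd


def calendar_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Derive simple calendar covariates from timestamps (hypothetical helper)."""
    return pd.DataFrame(
        {
            "hour": index.hour,
            "dayofweek": index.dayofweek,  # Monday=0 ... Sunday=6
            "is_weekend": (index.dayofweek >= 5).astype(int),
        },
        index=index,
    )


# Four hours spanning the Saturday/Sunday boundary at the end of the toy data.
idx = pd.date_range("2023-01-28 22:00", periods=4, freq="h", tz="UTC")
feats = calendar_features(idx)
print(feats)
```

Features of this kind depend only on the timestamps, so they are known for the forecast horizon in advance and need no network access, unlike weather data.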

Upgrade path: the same config in spotforecast2

The unrestricted sibling subclasses this pipeline and re-adds tuning and plotting. The configuration object travels unchanged:

# spotforecast2 (not installable here — one-way dependency)
from spotforecast2.multitask import MultiTask

mt = MultiTask(cfg, dataframe=df, task="spotoptim")
mt.prepare_data().detect_outliers().impute().build_exogenous_features()
mt.run(show=True)   # hyperparameter search + interactive figures
Note

The dependency between the packages is strictly one-way: spotforecast2 imports from spotforecast2-safe, never the reverse. That is why the cell above is a listing rather than executed code — this documentation builds in an environment where spotforecast2 is, by design, absent.

Where to go next